Monitoring and Troubleshooting IB Devices
The Oracle Solaris 11.4 release provides commands and utilities that enable you to
manage the IB fabric more effectively. These commands are included
in the system/io/infiniband/open-fabrics package.
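As a quick pre-check, the following sketch reports whether that package is installed. It assumes the standard IPS `pkg list` command, which exits with a non-zero status when the named package is not installed. The function name `check_ofed_pkg` is hypothetical and used only for illustration.

```shell
# Hypothetical helper: report whether the open-fabrics package is installed.
# Assumes IPS "pkg list PKG" exits non-zero when PKG is not installed.
check_ofed_pkg() {
    pkg_name="system/io/infiniband/open-fabrics"
    if pkg list "$pkg_name" >/dev/null 2>&1; then
        echo "installed: $pkg_name"
    else
        echo "missing: $pkg_name (install with: pkg install $pkg_name)"
        return 1
    fi
}
```

Call `check_ofed_pkg` from a shell on the host before running any of the commands listed in this section.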
The commands and utilities described in the following table enable you to list and query IB devices.
Table 1 General IB Monitoring Commands
| Command | Description |
|---|---|
| ibv_asyncwatch | Monitors InfiniBand asynchronous events |
| ibv_devices or ibv_devinfo | Lists InfiniBand devices or device information |
| ibv_rc_pingpong, ibv_srq_pingpong, or ibv_ud_pingpong | Tests node-to-node connectivity by using an RC connection, SRQs, or a UD connection |
| mckey | Tests the RDMA CM multicast setup and simple data transfer |
| rping | Tests the RDMA CM connection and attempts an RDMA ping-pong test |
| ucmatose | Tests the RDMA CM connection and attempts a simple ping-pong test |
| udaddy | Tests the RDMA CM datagram setup and attempts a simple ping-pong test |
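A typical first step with the commands in Table 1 is to list the local devices and then run a two-node connectivity test. The sketch below is illustrative only: `list_ib_devices` is a hypothetical helper name, and `server-host` is a placeholder for the name of the node that runs the server side of the test.

```shell
# Hypothetical helper: run the two device-listing commands if they are present.
list_ib_devices() {
    for tool in ibv_devices ibv_devinfo; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "skip: $tool not found in PATH"
            continue
        fi
        echo "== $tool =="
        "$tool"
    done
}

# Two-node RC connectivity test, run by hand on each node:
#   server$ ibv_rc_pingpong                 # waits for a client
#   client$ ibv_rc_pingpong server-host     # server-host is a placeholder
```

If `list_ib_devices` shows no devices, the pingpong tests cannot succeed, so check the device listing first.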
The following table describes commands used for general IB performance testing.
Table 2 General IB Performance Testing Commands
| Command | Description |
|---|---|
| rdma_bw or rdma_lat | Tests RDMA write transactions for streaming bandwidth or latency |
| ib_read_bw or ib_read_lat | Tests RDMA read transactions for bandwidth or latency |
| ib_send_bw or ib_send_lat | Tests RDMA send transactions for bandwidth or latency |
| ib_write_bw or ib_write_bw_postlist | Tests RDMA write bandwidth by posting one I/O request at a time (ib_write_bw) or by posting a list of I/O requests (ib_write_bw_postlist) |
| ib_write_lat | Tests RDMA write transactions for latency |
| ib_clock_test | Tests the accuracy of the system clock |
| qperf | Measures socket and RDMA performance |
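The bandwidth and latency tests in Table 2 run as a server and client pair. The following sketch assumes the usual convention for these tools: the server side is started with no host argument, and the client side is given the server's name. `run_write_bw` is a hypothetical wrapper, not a shipped command. Check the ib_write_bw man page on your system for the options that apply.

```shell
# Hypothetical wrapper around ib_write_bw.
#   run_write_bw            -> server side (waits for a client)
#   run_write_bw PEER       -> client side, connects to PEER
run_write_bw() {
    peer="$1"
    if ! command -v ib_write_bw >/dev/null 2>&1; then
        echo "skip: ib_write_bw not found in PATH"
        return 0
    fi
    if [ -z "$peer" ]; then
        ib_write_bw            # server role
    else
        ib_write_bw "$peer"    # client role
    fi
}
```

Start `run_write_bw` on one node first, then run `run_write_bw` with that node's name on the second node.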
The following table describes RDS monitoring and testing tools.
Table 3 RDS Monitoring and Testing Tools
| Command | Description |
|---|---|
| rds-info | Displays RDS kernel module information |
| rds-ping | Determines whether the remote node over RDS is reachable |
| rds-stress | Sends a message between processes over RDS sockets |
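A simple RDS reachability check combines the first two tools in Table 3. This is a sketch under stated assumptions: `rds_check_peer` is a hypothetical helper, and the `-c` (probe count) option of rds-ping is assumed to be available. Verify it against the rds-ping man page before relying on it.

```shell
# Hypothetical helper: show RDS state, then probe one remote address.
# Assumption: rds-ping accepts "-c COUNT" to limit the number of probes.
rds_check_peer() {
    peer="$1"
    if [ -z "$peer" ]; then
        echo "usage: rds_check_peer REMOTE_ADDRESS" >&2
        return 2
    fi
    for tool in rds-info rds-ping; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "skip: $tool not found in PATH"
            return 0
        fi
    done
    rds-info                 # RDS kernel module information
    rds-ping -c 3 "$peer"    # three probes to the remote node
}
```

The argument is the IP address of the remote node's IB interface, for example `rds_check_peer 192.0.2.10` (a documentation-only placeholder address).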
Note that RDSv3 does not support unconfiguring HCAs. If the RDSv3 driver is installed on the system when you perform a dynamic reconfiguration (DR) operation, unconfiguring the HCA fails with an error message such as the following example for ib::rdsv3,0:
# cfgadm -c unconfigure ib::rdsv3,0
This operation will suspend activity on the IB device
Continue (yes/no)? yes
cfgadm: Hardware specific failure: unconfigure operation
failed ap_id: /devices/ib:fabric::rdsv3,0
# cfgadm -c unconfigure PCI-EM0
cfgadm: Component system is busy, try again: unconfigure failed
Workaround:
Remove the RDSv3 driver and reboot the system before performing the HCA DR operation.
# rem_drv rdsv3
Device busy
Cannot unload module: rdsv3
Will be unloaded upon reboot.
# init 6
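Because the rdsv3 module stays loaded until the reboot completes, it can help to confirm that it is gone before retrying the DR operation. The sketch below assumes that `modinfo` with no arguments lists all loaded kernel modules, as it does on Oracle Solaris. `hca_dr_precheck` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only when no rdsv3 module is loaded.
# Assumption: "modinfo" with no arguments lists loaded kernel modules.
hca_dr_precheck() {
    if modinfo 2>/dev/null | grep -w rdsv3 >/dev/null; then
        echo "rdsv3 is loaded: run rem_drv rdsv3 and reboot before unconfiguring the HCA"
        return 1
    fi
    echo "rdsv3 is not loaded: HCA unconfigure can proceed"
}
```

Run `hca_dr_precheck` after the reboot, and start the `cfgadm -c unconfigure` operation only when it succeeds.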
The following table describes fabric diagnostic tools.
Table 4 Fabric Diagnostic Tools
| Command | Description |
|---|---|
| ibdiagnet | Performs a diagnostic check of the entire fabric |
| ibaddr | Queries an InfiniBand address or addresses |
| ibnetdiscover | Discovers remote InfiniBand topology |
| ibping | Validates connectivity between IB nodes |
| ibportstate | Queries the physical port state and link speed of an IB port |
| ibroute | Displays InfiniBand switch forwarding tables |
| ibstat or ibsysstat | Queries the status of one or more InfiniBand devices, or the status of a system at an IB address |
| ibtracert | Traces an IB path |
| perfquery or saquery | Queries IB port counters or IB subnet administration attributes |
| sminfo | Queries the IB SMInfo attribute |
| smpquery or smpdump | Queries or dumps IB subnet management attributes |
| ibcheckerrors or ibcheckerrs | Validates the IB subnet or an IB port (or node) and reports errors |
| ibchecknet, ibchecknode, or ibcheckport | Validates the IB subnet, node, or port and reports errors |
| ibcheckportstate, ibcheckportwidth, ibcheckstate, or ibcheckwidth | Validates IB ports that are linked up but not active, ports for 1x (2.0 Gbps) link width, ports in the IB subnet that are linked up but not active, or 1x links in the IB subnet |
| ibclearcounters or ibclearerrors | Clears port counters or error counters in the IB subnet |
| ibdatacounters or ibdatacounts | Queries for data counters in the IB subnet or IB port data counters |
| ibdiscover.pl | Annotates and compares IB topology |
| ibhosts | Displays IB host nodes in the IB topology |
| iblinkinfo.pl or iblinkinfo | Displays link information for all links in the fabric |
| ibnodes | Displays IB nodes in the topology |
| ibprintca.pl | Displays either the specified CA or the list of CAs from the ibnetdiscover output |
| ibprintrt.pl | Displays either the specified router or a list of routers from the ibnetdiscover output |
| ibprintswitch.pl | Displays either the specified switch or a list of switches from the ibnetdiscover output |
| ibqueryerrors.pl | Queries and reports non-zero IB port counters |
| ibrouters | Displays IB router nodes in the topology |
| ibstatus | Queries the basic status of IB devices |
| ibswitches | Displays IB switch nodes in the topology |
| ibswportwatch.pl | Polls the counters on the specified switch or port and reports the rate of change information |
| set_nodedesc.sh | Sets or displays the node description string for IB Host Channel Adapters (HCAs) |
| dump2psl.pl | Dumps the PSL file based on the opensm output file that is used for credit loop checking |
| dump2slvl.pl | Dumps the SLVL file based on the opensm output file that is used for credit loop checking |
| ibis | An extended TCL shell for IB management in-band services |
Note -
The fabric diagnostic tools mentioned in the table are not supported from virtual functions (VFs).
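When troubleshooting, it is often useful to capture the output of several read-only fabric queries and compare it with a later capture. The following sketch saves the default output of a few Table 4 commands to a directory. `fabric_snapshot` and the default directory name `ib-snapshot` are hypothetical, and the choice of commands is only an example. In line with the preceding note, run it from a physical function, not from a VF.

```shell
# Hypothetical helper: save the output of a few read-only fabric queries
# into a directory so that two snapshots can be compared with diff.
fabric_snapshot() {
    out_dir="${1:-ib-snapshot}"
    mkdir -p "$out_dir" || return 1
    for tool in ibstat sminfo ibhosts ibswitches ibnetdiscover; do
        if command -v "$tool" >/dev/null 2>&1; then
            "$tool" > "$out_dir/$tool.txt" 2>&1
        else
            echo "skip: $tool not found in PATH"
        fi
    done
}
```

For example, run `fabric_snapshot before`, make the change under investigation, run `fabric_snapshot after`, and then compare the two directories with `diff -r before after`.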