Managing Devices in Oracle® Solaris 11.3

Updated: April 2018
 
 

Monitoring and Troubleshooting IB Devices

In the Oracle Solaris 11 release, new commands and utilities enable you to manage the IB fabric more effectively. These commands are included in the system/io/infiniband/open-fabrics package, and their man pages are installed automatically with that package. For example:

% man rping
Reformatting page.  Please Wait... done

librdmacm                                                RPING(1)

NAME
rping - RDMA CM connection and RDMA ping-pong test.

SYNOPSIS
rping -s [-v] [-V] [-d] [-P] [-a address] [-p port]
[-C message_count] [-S message_size]
rping -c [-v] [-V] [-d] -a address [-p port]
[-C message_count] [-S message_size]
.
.
.
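
Before relying on any of these utilities, it can help to confirm that they are actually on the PATH. The following is a minimal sketch, not part of the original guide: the function name check_ib_tools is hypothetical, and the tool names passed to it are taken from the tables in this section.

```shell
#!/bin/sh
# Report which of the named commands are available on PATH.
# Prints one line per command: "<name> found" or "<name> missing".
# check_ib_tools is a hypothetical helper name used for illustration.
check_ib_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool found"
        else
            echo "$tool missing"
        fi
    done
}

# Example: check a few of the tools named in this section.
check_ib_tools rping ibv_devices ibstat rds-info
```

If most of the tools are reported missing, the open-fabrics package is probably not installed on the system.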

The following new commands and utilities provide the ability to list and query IB devices, diagnose and troubleshoot IB fabric issues, and measure IB performance.

Table 7  General IB Monitoring Commands

| Command | Description |
|---|---|
| ibv_asyncwatch | Monitors InfiniBand asynchronous events |
| ibv_devices or ibv_devinfo | Lists InfiniBand devices or device information |
| ibv_rc_pingpong, ibv_srq_pingpong, or ibv_ud_pingpong | Tests node-to-node connectivity by using an RC connection, SRQs, or a UD connection |
| mckey | Tests RDMA CM multicast setup and simple data transfer |
| rping | Tests RDMA CM connection and attempts RDMA ping-pong |
| ucmatose | Tests RDMA CM connection and attempts simple ping-pong |
| udaddy | Tests RDMA CM datagram setup and attempts simple ping-pong |
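
As an illustration of how a connectivity test might be driven, the sketch below builds rping command lines from the options shown in the rping(1) synopsis earlier in this section (-s, -c, -v, -a, -C). The function name rping_cmd, the default message count of 10, and the address 192.0.2.10 are assumptions made for the example, not values from this guide. The function only prints the command so that it can be reviewed before it is run on each node.

```shell
#!/bin/sh
# Build an rping command line using only options from the rping(1) synopsis.
#   rping_cmd server [count]        -> server side (-s)
#   rping_cmd client addr [count]   -> client side (-c), -a addr is required
# rping_cmd is a hypothetical helper; count defaults to 10 (an assumption).
rping_cmd() {
    role=$1
    case $role in
        server)
            count=${2:-10}
            echo "rping -s -v -C $count"
            ;;
        client)
            addr=$2
            count=${3:-10}
            if [ -z "$addr" ]; then
                echo "client role needs an address for -a" >&2
                return 1
            fi
            echo "rping -c -v -a $addr -C $count"
            ;;
        *)
            echo "usage: rping_cmd server [count] | client addr [count]" >&2
            return 1
            ;;
    esac
}

# Example with a placeholder address (192.0.2.10 is hypothetical):
rping_cmd server 5
rping_cmd client 192.0.2.10 5
```

Start the server command on one node first, then run the client command on the other node with the server's address.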

Table 8  General IB Performance Testing Commands

| Command | Description |
|---|---|
| rdma_bw or rdma_lat | Tests RDMA write transactions for streaming bandwidth or latency |
| ib_read_bw or ib_read_lat | Tests RDMA read transactions for bandwidth or latency |
| ib_send_bw or ib_send_lat | Tests RDMA send transactions for bandwidth or latency |
| ib_write_bw or ib_write_bw_postlist | Tests RDMA write transactions for bandwidth, displaying one I/O request at a time, or for post-list bandwidth, displaying a list of I/O requests |
| ib_write_lat | Tests RDMA write transactions for latency |
| ib_clock_test | Tests accuracy of system clock |
| qperf | Measures socket and RDMA performance |

Table 9  RDS Monitoring and Testing Tools

| Command | Description |
|---|---|
| rds-info | Displays RDS kernel module information |
| rds-ping | Determines if a remote node over RDS is reachable |
| rds-stress | Sends messages between processes over RDS sockets |

Note that RDSv3 does not support unconfiguring HCAs. If the system has an RDSv3 driver installed at the time of a dynamic reconfiguration (DR) operation, unconfiguring the HCA fails with an error message such as the following example for ib::rdsv3,0.

# cfgadm -c unconfigure ib::rdsv3,0
This operation will suspend activity on the IB device
Continue (yes/no)? yes
cfgadm: Hardware specific failure: unconfigure operation 
failed ap_id: /devices/ib:fabric::rdsv3,0

# cfgadm -c unconfigure PCI-EM0
cfgadm: Component system is busy, try again: unconfigure failed

Workaround:

Remove the RDSv3 driver and reboot the system before performing the HCA DR operation.

# rem_drv rdsv3
Device busy
Cannot unload module: rdsv3
Will be unloaded upon reboot.

# init 6
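
To avoid running into this failure during a DR operation, you might first check whether the rdsv3 module is loaded. The sketch below is one way to do that under stated assumptions: the helper name rdsv3_loaded is hypothetical, and it assumes that modinfo prints the module name as a whitespace-delimited field on its own line.

```shell
#!/bin/sh
# Return 0 if a module named rdsv3 appears in modinfo-style output on stdin,
# and 1 otherwise.
# Assumption: the module name is a whitespace-delimited field in the listing.
# rdsv3_loaded is a hypothetical helper name used for illustration.
rdsv3_loaded() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "rdsv3") found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Usage on a Solaris host (shown as a comment, not run here):
#   if modinfo | rdsv3_loaded; then
#       echo "rdsv3 is loaded: remove the driver and reboot before the HCA DR operation"
#   fi
```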

Table 10  Fabric Diagnostic Tools

| Command | Description |
|---|---|
| ibdiagnet | Performs diagnostic check of the entire fabric |
| ibaddr | Queries InfiniBand address or addresses |
| ibnetdiscover | Discovers remote InfiniBand topology |
| ibping | Validates connectivity between IB nodes |
| ibportstate | Queries physical port state and link speed of an IB port |
| ibroute | Displays InfiniBand switch forwarding tables |
| ibstat or ibsysstat | Queries the status of an InfiniBand device or devices, or the status of a system on an IB address |
| ibtracert | Traces an IB path |
| perfquery or saquery | Queries IB port counters or IB subnet administration attributes |
| sminfo | Queries IB SMInfo attribute |
| smpquery or smpdump | Queries or dumps IB subnet management attributes |
| ibcheckerrors or ibcheckerrs | Validates IB subnet or IB port (or node) and reports errors |
| ibchecknet, ibchecknode, or ibcheckport | Validates IB subnet, node, or port and reports errors |
| ibcheckportstate, ibcheckportwidth, ibcheckstate, or ibcheckwidth | Validates an IB port that is link up but not active, an IB port for 1x (2.0 Gbps) link width, ports in the IB subnet that are link up but not active, or 1x links in the IB subnet |
| ibclearcounters or ibclearerrors | Clears port counters or error counters in IB subnet |
| ibdatacounters or ibdatacounts | Queries for data counters in IB subnet or IB port data counters |
| ibdiscover.pl | Annotates and compares IB topology |
| ibhosts | Displays IB host nodes in topology |
| iblinkinfo.pl or iblinkinfo | Displays link information for all links in the fabric |
| ibnodes | Displays IB nodes in topology |
| ibprintca.pl | Displays either the CA specified or the list of CAs from the ibnetdiscover output |
| ibprintrt.pl | Displays either only the router specified or a list of routers from the ibnetdiscover output |
| ibprintswitch.pl | Displays either the switch specified or a list of switches from the ibnetdiscover output |
| ibqueryerrors.pl | Queries and reports non-zero IB port counters |
| ibrouters | Displays IB router nodes in topology |
| ibstatus | Queries basic status of IB devices |
| ibswitches | Displays IB switch nodes in topology |
| ibswportwatch.pl | Polls the counters on the specified switch or port and reports rate of change information |
| set_nodedesc.sh | Sets or displays the node description string for IB host channel adapters (HCAs) |
| dump2psl.pl | Dumps PSL file based on opensm output file that is used for credit loop checking |
| dump2slvl.pl | Dumps SLVL file based on opensm output file that is used for credit loop checking |
| ibis | An extended TCL shell for IB management inband services |

Note -  The fabric diagnostic tools mentioned in the table are not supported from virtual functions (VFs).
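
Output from the diagnostic tools is often easier to act on once it is filtered. The following sketch reads ibstat output on standard input and lists ports whose state is not Active. It assumes the layout printed by the upstream OpenFabrics ibstat (a CA 'name' line, followed by indented Port N: and State: lines); the layout on a given Solaris release may differ, and the helper name inactive_ports is hypothetical.

```shell
#!/bin/sh
# List ports whose "State:" is not Active, from ibstat-style output on stdin.
# Assumed input layout (upstream OpenFabrics ibstat; may differ on Solaris):
#   CA 'name'
#       Port 1:
#           State: Active
# Output: one line per non-Active port, "<ca> port <n> <state>".
# inactive_ports is a hypothetical helper name used for illustration.
inactive_ports() {
    awk '
        # A CA header line has the quoted CA name as its second field.
        $1 == "CA" && $2 ~ /^\047.*\047$/ { ca = $2; gsub(/\047/, "", ca) }
        # A port header line looks like "Port 1:".
        $1 == "Port" && $2 ~ /^[0-9]+:$/ { port = $2; sub(/:$/, "", port) }
        # Report any logical port state other than Active.
        $1 == "State:" && $2 != "Active" { print ca, "port", port, $2 }
    '
}

# Usage on a host with the tools installed (shown as a comment, not run here):
#   ibstat | inactive_ports
```

An empty result means that every port reported by ibstat is in the Active state.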