
Managing Devices in Oracle® Solaris 11.4


Updated: November 2020

Monitoring and Troubleshooting IB Devices

The Oracle Solaris 11.4 release provides commands and utilities that enable you to manage the IB fabric more effectively. These commands are included in the system/io/infiniband/open-fabrics package.
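
Before you work through the tables below, you might want to confirm that the tools are actually installed. The following is a minimal shell sketch, not part of the package, that reports which of a sample of the commands described in this section are missing from PATH and suggests installing the package with pkg install. The function name check_ib_tools and the list of tools it checks are illustrative assumptions.

```shell
# Sketch: report which of a sample of IB commands are not on PATH.
# The tool list is an illustrative subset of the tables in this
# section; adjust it to the commands you actually use.
check_ib_tools() {
    missing=0
    for tool in ibv_devinfo ibv_devices ibstat ibnetdiscover perfquery qperf rds-ping; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    if [ "$missing" -ne 0 ]; then
        # This section states that these commands ship in the
        # open-fabrics package.
        echo "hint: pkg install system/io/infiniband/open-fabrics"
        return 1
    fi
    echo "all listed IB tools found"
}
```

The function returns a nonzero status when any listed tool is missing, so it can guard the rest of a diagnostic script.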

The commands and utilities described in the following table enable you to list and query IB devices.

Table 1  General IB Monitoring Commands
| Command | Description |
|---|---|
| ibv_asyncwatch | Monitors InfiniBand asynchronous events |
| ibv_devices or ibv_devinfo | Lists InfiniBand devices (ibv_devices) or displays device information (ibv_devinfo) |
| ibv_rc_pingpong, ibv_srq_pingpong, or ibv_ud_pingpong | Tests node-to-node connectivity by using a reliable connected (RC) connection, shared receive queues (SRQs), or an unreliable datagram (UD) connection |
| mckey | Tests the RDMA CM multicast setup and simple data transfer |
| rping | Tests the RDMA CM connection and attempts an RDMA ping-pong test |
| ucmatose | Tests the RDMA CM connection and attempts a simple ping-pong test |
| udaddy | Tests the RDMA CM datagram setup and attempts a simple ping-pong test |
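
As an illustration of working with ibv_devinfo output, the following hypothetical shell function filters that output and prints only the ports whose state is not PORT_ACTIVE. It is a sketch that assumes the usual ibv_devinfo layout, in which each device starts with an hca_id: line and each port has port: and state: lines; confirm the format on your release before relying on it.

```shell
# Sketch: read ibv_devinfo output on stdin and print
# "<hca> port <n>: <state>" for every port that is not PORT_ACTIVE.
# Assumes "hca_id:", "port:" and "state:" are the first field of
# their lines, as in typical ibv_devinfo output.
inactive_ib_ports() {
    awk '
        $1 == "hca_id:" { hca = $2 }    # remember the current device
        $1 == "port:"   { port = $2 }   # remember the current port
        $1 == "state:"  {
            if ($2 != "PORT_ACTIVE")
                printf "%s port %s: %s\n", hca, port, $2
        }
    '
}

# Usage: ibv_devinfo | inactive_ib_ports
```

An empty result means every port reported by ibv_devinfo is active.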

The following table describes commands used for general IB performance testing.

Table 2  General IB Performance Testing Commands
| Command | Description |
|---|---|
| rdma_bw or rdma_lat | Tests RDMA write transactions for streaming bandwidth or latency |
| ib_read_bw or ib_read_lat | Tests RDMA read transactions for bandwidth or latency |
| ib_send_bw or ib_send_lat | Tests RDMA send transactions for bandwidth or latency |
| ib_write_bw or ib_write_bw_postlist | Tests RDMA write bandwidth, either posting one I/O request at a time (ib_write_bw) or posting a list of I/O requests (ib_write_bw_postlist) |
| ib_write_lat | Tests RDMA write transactions for latency |
| ib_clock_test | Tests the accuracy of the system clock |
| qperf | Measures socket and RDMA performance |
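
qperf runs in server mode when started with no arguments; on the client, you pass the server's host name followed by one or more test names. The following sketch wraps the client side. The function name run_qperf, the DRY_RUN variable, and the default test list (rc_bw, rc_lat, tcp_bw, tcp_lat) are illustrative assumptions; check qperf(1) for the tests available on your release.

```shell
# Sketch: run qperf client tests against SERVER, which must already be
# running "qperf" with no arguments. DRY_RUN=1 prints the command
# instead of executing it.
run_qperf() {
    if [ $# -lt 1 ]; then
        echo "usage: run_qperf SERVER [TEST...]" >&2
        return 2
    fi
    server=$1
    shift
    # Default: RC bandwidth and latency, plus TCP for comparison.
    [ $# -gt 0 ] || set -- rc_bw rc_lat tcp_bw tcp_lat
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "qperf $server $*"
    else
        qperf "$server" "$@"
    fi
}
```

For example, after starting qperf on a hypothetical host node2, run run_qperf node2 on the client, or DRY_RUN=1 run_qperf node2 rc_lat to preview the command first. Comparing the RC and TCP results from the same run helps separate RDMA issues from general network issues.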

The following table describes RDS monitoring and testing tools.

Table 3  RDS Monitoring and Testing Tools
| Command | Description |
|---|---|
| rds-info | Displays RDS kernel module information |
| rds-ping | Determines whether the remote node over RDS is reachable |
| rds-stress | Sends a message between processes over RDS sockets |
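
To check several nodes at once, you can call rds-ping in a loop. The following sketch assumes that rds-ping accepts a -c count option and exits with a nonzero status when the peer does not answer; confirm both against rds-ping(1) on your system. The function name check_rds_peers is hypothetical.

```shell
# Sketch: ping each address over RDS and report reachability.
# Assumes "rds-ping -c COUNT ADDR" exits nonzero if ADDR does not reply.
check_rds_peers() {
    failed=0
    for peer in "$@"; do
        if rds-ping -c 3 "$peer" >/dev/null 2>&1; then
            echo "$peer: reachable"
        else
            echo "$peer: NOT reachable over RDS"
            failed=1
        fi
    done
    return $failed
}

# Usage: check_rds_peers 192.0.2.1 192.0.2.2
```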

Note that RDSv3 does not support unconfiguring HCAs. If an RDSv3 driver is installed on the system when you perform a dynamic reconfiguration (DR) operation, unconfiguring the HCA fails with error messages such as those in the following example for ib::rdsv3,0.

# cfgadm -c unconfigure ib::rdsv3,0
This operation will suspend activity on the IB device
Continue (yes/no)? yes
cfgadm: Hardware specific failure: unconfigure operation 
failed ap_id: /devices/ib:fabric::rdsv3,0

# cfgadm -c unconfigure PCI-EM0
cfgadm: Component system is busy, try again: unconfigure failed

Workaround:

Remove the RDSv3 driver and reboot the system before performing the HCA DR operation.

# rem_drv rdsv3
Device busy
Cannot unload module: rdsv3
Will be unloaded upon reboot.

# init 6
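
Before you start an HCA DR operation, you can check whether the rdsv3 driver is still installed. The following sketch relies on the fact that installed drivers have an entry in /etc/name_to_major. The function name rdsv3_installed is hypothetical, and the file path is a parameter only so that the check can be tried against a copy of the file.

```shell
# Sketch: succeed if an rdsv3 entry exists in the driver major-number
# map (default /etc/name_to_major), that is, if the driver is installed.
rdsv3_installed() {
    map=${1:-/etc/name_to_major}
    grep -q '^rdsv3[[:space:]]' "$map" 2>/dev/null
}

# Example use before HCA DR (commands taken from the workaround above):
#   if rdsv3_installed; then
#       rem_drv rdsv3   # may report "Device busy"; unloads on reboot
#       init 6
#   fi
```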

The following table describes fabric diagnostic tools.

Table 4  Fabric Diagnostic Tools
| Command | Description |
|---|---|
| ibdiagnet | Performs a diagnostic check of the entire fabric |
| ibaddr | Queries an InfiniBand address or addresses |
| ibnetdiscover | Discovers remote InfiniBand topology |
| ibping | Validates connectivity between IB nodes |
| ibportstate | Queries the physical port state and link speed of an IB port |
| ibroute | Displays InfiniBand switch forwarding tables |
| ibstat or ibsysstat | Queries the status of one or more InfiniBand devices (ibstat) or the status of a system at an IB address (ibsysstat) |
| ibtracert | Traces an IB path |
| perfquery or saquery | Queries IB port counters (perfquery) or IB subnet administration attributes (saquery) |
| sminfo | Queries the IB SMInfo attribute |
| smpquery or smpdump | Queries or dumps IB subnet management attributes |
| ibcheckerrors or ibcheckerrs | Validates the IB subnet (ibcheckerrors) or an IB port or node (ibcheckerrs) and reports errors |
| ibchecknet, ibchecknode, or ibcheckport | Validates the IB subnet, node, or port and reports errors |
| ibcheckportstate, ibcheckportwidth, ibcheckstate, or ibcheckwidth | Validates IB ports that are linked up but not active (ibcheckportstate), ports with a 1x (2.0 Gbps) link width (ibcheckportwidth), ports in the IB subnet that are linked up but not active (ibcheckstate), or 1x links in the IB subnet (ibcheckwidth) |
| ibclearcounters or ibclearerrors | Clears port counters or error counters in the IB subnet |
| ibdatacounters or ibdatacounts | Queries data counters in the IB subnet (ibdatacounters) or IB port data counters (ibdatacounts) |
| ibdiscover.pl | Annotates and compares IB topology |
| ibhosts | Displays IB host nodes in the IB topology |
| iblinkinfo.pl or iblinkinfo | Displays link information for all links in the fabric |
| ibnodes | Displays IB nodes in the topology |
| ibprintca.pl | Displays either the specified CA or a list of CAs from the ibnetdiscover output |
| ibprintrt.pl | Displays either the specified router or a list of routers from the ibnetdiscover output |
| ibprintswitch.pl | Displays either the specified switch or a list of switches from the ibnetdiscover output |
| ibqueryerrors.pl | Queries and reports nonzero IB port counters |
| ibrouters | Displays IB router nodes in the topology |
| ibstatus | Queries the basic status of IB devices |
| ibswitches | Displays IB switch nodes in the topology |
| ibswportwatch.pl | Polls the counters on the specified switch or port and reports rate-of-change information |
| set_nodedesc.sh | Sets or displays the node description string for IB Host Channel Adapters (HCAs) |
| dump2psl.pl | Dumps the PSL file, based on the opensm output file, that is used for credit loop checking |
| dump2slvl.pl | Dumps the SLVL file, based on the opensm output file, that is used for credit loop checking |
| ibis | An extended Tcl shell for IB management in-band services |
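
ibqueryerrors.pl and ibcheckerrors already report error counters across the fabric. For a quick look at a single port, the following sketch filters perfquery output and prints only error or discard counters that are nonzero. It assumes perfquery's usual Name:....value output layout and that perfquery, run without arguments, queries the local port; the function name nonzero_error_counters and the counter-name pattern it matches are assumptions.

```shell
# Sketch: print "Counter=value" for nonzero counters whose names
# contain "Err" or "Discard" in perfquery-style "Name:....value" output.
nonzero_error_counters() {
    awk -F: '
        /^#/ { next }                      # skip the "# Port counters" header
        NF >= 2 {
            name = $1
            val = $2
            gsub(/\./, "", val)            # strip the dot padding
            if (name ~ /Err|Discard/ && val + 0 != 0)
                printf "%s=%s\n", name, val
        }
    '
}

# Usage: perfquery | nonzero_error_counters
```

Because counters accumulate until cleared, you might run ibclearcounters first and then check again after a test run so that only recent errors appear.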

Note - The fabric diagnostic tools listed in Table 4 are not supported when run from virtual functions (VFs).