ibd - InfiniBand IPoIB device driver
The ibd driver implements the IETF IP over Infiniband protocol and provides IPoIB service for all IBA ports present in the system.
The ibd driver is a multi-threaded, loadable, clonable, STREAMS hardware driver supporting the connection-less Data Link Provider Interface, dlpi(7P). The ibd driver provides basic support for both the IBA Unreliable Datagram Queue Pair hardware and the IBA Reliable Connected Queue Pair hardware. Functions include QP initialization, frame transmit and receive, multicast and promiscuous mode support, and statistics reporting.
By default, each ibd instance uses datagram mode unless enable_rc is set to 1 for that instance in the .conf file. The setting is made per instance: the Nth value of enable_rc controls the Nth instance of ibd. Any value other than 1, or the absence of a .conf file, is equivalent to specifying datagram mode.
Because ibd over connected mode attempts to use a large MTU (65520 bytes), your application should adapt to the large MTU to get better performance. For example, you should adopt a large TCP window size.
Use the cloning, character-special device /dev/ibd to access all ibd devices installed within the system. The ibd driver depends on GLD, a loadable kernel module that provides the ibd driver with the DLPI and STREAMS functionality required of a LAN driver. Except as noted in the Application Programming Interface section of this manual page, see gld(7D) for more details on the primitives supported by the driver. The GLD module is located at /kernel/misc/sparcv9/gld on 64-bit systems and at /kernel/misc/gld on 32-bit systems.
The ibd driver expects certain configuration of the IBA fabric prior to operation (which also implies that the SM must be active and managing the fabric). Specifically, the IBA multicast group representing the IPv4 limited broadcast address 255.255.255.255 (also defined as broadcast-GID in IETF documents) should be created prior to initializing the device. The IBA properties of this group (including mtu, qkey and sl) are used by the driver to create any other IBA multicast group as instructed by higher-level (IP) software. The driver probes for the existence of this broadcast-GID during attach(9E).
The values returned by the driver in the DL_INFO_ACK primitive in response to your DL_INFO_REQ are:
Maximum SDU is the MTU associated with the broadcast-GID group, less the 4 byte IPoIB header.
Minimum SDU is 0.
dlsap address length is 22.
MAC type is DL_IB.
The sap length value is -2, meaning the physical address component is followed immediately by a 2-byte sap component within the DLSAP address.
Broadcast address value is the MAC address consisting of the 4 bytes of QPN 00:FF:FF:FF prepended to the IBA multicast address of the broadcast-GID.
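The values listed above can be illustrated with a minimal sketch. It is not driver code: the helper names are hypothetical, and it only restates the arithmetic and byte layout given in the list (a 20-byte physical address made of a 4-byte QPN field and a 16-byte GID, followed by a 2-byte sap).

```python
from typing import Tuple

# Constants restated from the DL_INFO_ACK description; names are hypothetical.
IPOIB_HEADER_LEN = 4                   # IPoIB header, in bytes
IPOIB_MAC_LEN = 20                     # 4-byte QPN field + 16-byte GID
SAP_LEN = 2                            # sap follows the physical address
DLSAP_LEN = IPOIB_MAC_LEN + SAP_LEN    # 22

BROADCAST_QPN = bytes([0x00, 0xFF, 0xFF, 0xFF])


def max_sdu(broadcast_group_mtu: int) -> int:
    """Maximum SDU: the broadcast-GID group's MTU less the IPoIB header."""
    return broadcast_group_mtu - IPOIB_HEADER_LEN


def broadcast_mac(broadcast_gid: bytes) -> bytes:
    """Prepend the QPN bytes 00:FF:FF:FF to the 16-byte broadcast-GID."""
    if len(broadcast_gid) != 16:
        raise ValueError("a GID is 16 bytes long")
    return BROADCAST_QPN + broadcast_gid


def split_dlsap(dlsap: bytes) -> Tuple[bytes, bytes]:
    """Split a 22-byte DLSAP address into (physical address, sap bytes)."""
    if len(dlsap) != DLSAP_LEN:
        raise ValueError("a DLSAP address is 22 bytes long")
    return dlsap[:IPOIB_MAC_LEN], dlsap[IPOIB_MAC_LEN:]
```

For example, a broadcast group MTU of 2048 gives max_sdu(2048) == 2044. Any GID passed to broadcast_mac() here is an illustrative input; the actual broadcast-GID depends on the partition and fabric configuration.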
Due to the nature of link address definition for IPoIB, the DL_SET_PHYS_ADDR_REQ DLPI primitive is not supported.
In the transmit case for streams that have been put in raw mode via the DLIOCRAW ioctl, the DLPI application must prepend the 20 byte IPoIB destination address to the data it wants to transmit over-the-wire. In the receive case, applications receive the IP/ARP datagram along with the IETF defined 4 byte header.
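The raw-mode framing described above might be sketched as follows. This is a hedged illustration rather than the driver's implementation: the function names are hypothetical, and the layout assumed for the 4-byte header (a 16-bit type field followed by 16 reserved bits, in network byte order) reflects the IETF IPoIB encapsulation as commonly documented, not anything stated on this page.

```python
import struct
from typing import Tuple

IPOIB_ADDR_LEN = 20     # destination address prepended on transmit
IPOIB_HEADER_LEN = 4    # IETF-defined header seen on receive


def build_raw_tx(dest_addr: bytes, data: bytes) -> bytes:
    """Prepend the 20-byte IPoIB destination address to outgoing data."""
    if len(dest_addr) != IPOIB_ADDR_LEN:
        raise ValueError("IPoIB destination address must be 20 bytes")
    return dest_addr + data


def parse_raw_rx(frame: bytes) -> Tuple[int, bytes]:
    """Split a received frame into (type field, IP/ARP datagram).

    Assumes the header is a 16-bit type plus 16 reserved bits,
    both in network byte order.
    """
    if len(frame) < IPOIB_HEADER_LEN:
        raise ValueError("frame shorter than the 4-byte IPoIB header")
    pkt_type, _reserved = struct.unpack("!HH", frame[:IPOIB_HEADER_LEN])
    return pkt_type, frame[IPOIB_HEADER_LEN:]
```

A caller would write the result of build_raw_tx() to the raw-mode stream, and pass each message read from the stream to parse_raw_rx() to separate the header from the datagram.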
This section describes warning messages that might be generated by the driver. Although the format of these messages might change in future versions, the same general information will be provided.
While joining IBA multicast groups corresponding to IP multicast groups, either as part of the multicast promiscuous operations required by IP multicast routers or as part of running snoop(1M), joins to some multicast groups can fail due to inherent resource constraints in the IBA components. In such cases, warning messages similar to the following appear in the system log, indicating the interface on which the failure occurred:
NOTICE: ibd0: Could not get list of IBA multicast groups
NOTICE: ibd0: IBA promiscuous mode missed multicast group
NOTICE: ibd0: IBA promiscuous mode missed new multicast gid
Also, if the IBA SM indicates that multicast trap support is suspended or unavailable, the system log contains a message similar to:
NOTICE: ibd0: IBA multicast support degraded due to unavailability of multicast traps
When the SM indicates trap support is restored:
NOTICE: ibd0: IBA multicast support restored due to availability of multicast traps
Additionally, if the IBA link transitions to an unavailable state (that is, the IBA link state becomes Down, Initialize or Armed) and then becomes active again, the driver tries to rejoin previously joined groups if required. Failure to rejoin multicast groups triggers messages like:
NOTICE: ibd0: Failure on port up to rejoin multicast gid
If the corresponding HCA port is in the unavailable state defined above when initializing an ibd interface using ifconfig(1M), a message is emitted by the driver:
NOTICE: ibd0: Port is not active
Further, as described above, if the broadcast-GID is not found, or the associated MTU is higher than what the HCA port can support, the following messages are printed to the system log:
NOTICE: ibd0: IPoIB broadcast group absent
NOTICE: ibd0: IPoIB broadcast group MTU 4096 greater than port's maximum MTU 2048
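The two conditions behind these messages can be modeled with a short sketch. It is an illustrative model only, with a hypothetical function name and arguments, and it makes no claim about how the driver performs the check internally.

```python
from typing import Optional


def check_broadcast_group(ifname: str,
                          group_mtu: Optional[int],
                          port_max_mtu: int) -> Optional[str]:
    """Return the expected NOTICE text, or None if the group is usable.

    group_mtu is None when the broadcast-GID was not found.
    """
    if group_mtu is None:
        return f"NOTICE: {ifname}: IPoIB broadcast group absent"
    if group_mtu > port_max_mtu:
        return (f"NOTICE: {ifname}: IPoIB broadcast group MTU {group_mtu} "
                f"greater than port's maximum MTU {port_max_mtu}")
    return None
```

A group MTU equal to the port's maximum is treated as acceptable here, since the message is described only for a group MTU that is higher than what the port supports.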
Whenever any of these problems is reported while running ifconfig(1M), check that the IBA cabling is intact, that an SM is running on the fabric, and that the broadcast-GID with appropriate properties has been created in the IBA partition.
The MTU of Reliable Connected mode can be larger than the MTU of Unreliable Datagram mode.
When Reliable Connected mode is enabled, ibd still uses Unreliable Datagram mode to transmit and receive multicast packets. If the payload size (excluding 4 byte IPoIB header) of a multicast packet is larger than the IP link MTU specified by the broadcast group, ibd drops it. A message appears in the system log when drops occur:
NOTICE: ibd0: Reliable Connected mode is on. Multicast packet length (<packet length> > <IP_LINK_MTU>) is too long to send
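The size rule for multicast packets in Reliable Connected mode can be expressed as a small check. This is a sketch with hypothetical names; it assumes the length passed in includes the 4-byte IPoIB header, which is subtracted before the comparison.

```python
IPOIB_HEADER_LEN = 4  # bytes excluded from the payload size


def multicast_fits(packet_len_with_header: int, ip_link_mtu: int) -> bool:
    """True if the multicast payload fits the broadcast group's IP link MTU.

    A packet for which this returns False would be dropped and logged.
    """
    payload_len = packet_len_with_header - IPOIB_HEADER_LEN
    return payload_len <= ip_link_mtu
```

With an IP link MTU of 2044, a 2048-byte packet (2044 bytes of payload) fits, while a 2049-byte packet does not. Unicast traffic over a connected-mode path is not subject to this check.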
If only one side has enabled Reliable Connected mode, communication falls back to datagram mode. The connected mode instance uses Path MTU discovery to automatically adjust the MTU of unicast packets if an MTU difference exists. Until Path MTU discovery reduces the MTU for a specific destination, several packets whose size exceeds the MTU of Unreliable Datagram mode are dropped.
The IPoIB service comes preconfigured on all HCA ports in the system. To turn the service off, or back on after turning it off, refer to documentation in cfgadm_ib(1M).
Example 1 An Example Driver .conf File
# 1: unicast packets will be sent over Reliable Connected Mode
# 0: unicast packets will be sent over Unreliable Datagram Mode
#
# Each element in the list below maps to the corresponding ibd
# instance; the first element is for ibd instance 0, the second
# element is for instance 1 and so on.
#
enable_rc=1,1,0,0;
This example driver .conf file enables Connected Mode for ibd instances 0 and 1. Instances 2 and 3 use datagram mode.
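The per-instance rule for enable_rc can be checked with a small sketch. The parser below is a simplified assumption for illustration: it handles only a single-line enable_rc=...; entry with # comments and does not implement the full driver .conf grammar. The function names are hypothetical.

```python
from typing import List


def parse_enable_rc(conf_text: str) -> List[int]:
    """Return the enable_rc values from .conf text, or [] if absent."""
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()   # drop comments
        name, sep, value = line.partition("=")
        if sep and name.strip() == "enable_rc":
            value = value.strip().rstrip(";")
            return [int(v) for v in value.split(",") if v.strip()]
    return []


def uses_connected_mode(values: List[int], instance: int) -> bool:
    """Only an explicit 1 for this instance selects connected mode.

    Any other value, or a missing entry, means datagram mode.
    """
    return 0 <= instance < len(values) and values[instance] == 1
```

Applied to the example file, instances 0 and 1 select connected mode, and instances 2 and 3, along with any instance beyond the end of the list, select datagram mode.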
Special character device
Configuration file to start IPoIB service
Configuration file of IPoIB driver
64-bit SPARC device driver
64-bit x86 device driver
32-bit x86 device driver
IBD is a GLD-based driver and provides the statistics described by gld(7D). The unknowns statistic (valid received packets not accepted by any stream) increases when IBD transmits broadcast IP packets. This happens because the InfiniBand hardware copies the transmitted broadcast packets and loops them back to the source. These packets are discarded by GLD and are recorded as unknowns.