System Administration Guide: IP Services

Probe-Based Failure Detection

The in.mpathd daemon performs probe-based failure detection on each interface in the IPMP group that has a test address. Probe-based failure detection involves the sending and receiving of ICMP probe messages that use test addresses. These messages go out over the interface to one or more target systems on the same IP link. For an introduction to test addresses, refer to Test Addresses. For information on configuring test addresses, refer to How to Configure an IPMP Group With Multiple Interfaces.

The in.mpathd daemon determines which target systems to probe dynamically. Routers that are connected to the IP link are automatically selected as targets for probing. If no routers exist on the link, in.mpathd sends probes to neighbor hosts on the link. A multicast packet that is sent to the all hosts multicast address, 224.0.0.1 in IPv4 and ff02::1 in IPv6, determines which hosts to use as target systems. The first few hosts that respond to the echo packets are chosen as targets for probing. If in.mpathd cannot find routers or hosts that responded to the ICMP echo packets, in.mpathd cannot detect probe-based failures.

You can use host routes to explicitly configure a list of target systems to be used by in.mpathd. For instructions, refer to Configuring Target Systems.

To ensure that each interface in the IPMP group functions properly, in.mpathd probes all the targets separately through all the interfaces in the IPMP group. If no replies are made in response to five consecutive probes, in.mpathd considers the interface to have failed. The probing rate depends on the failure detection time (FDT). The default value for failure detection time is 10 seconds. However, you can tune the failure detection time in the /etc/default/mpathd file. For instructions, go to How to Configure the /etc/default/mpathd File.

For a repair detection time of 10 seconds, the probing rate is approximately one probe every two seconds. The minimum repair detection time is twice the failure detection time, 20 seconds by default, because replies to 10 consecutive probes must be received. The failure and repair detection times apply only to probe-based failure detection.

Note –

In an IPMP group that is composed of VLANs, link-based failure detection is implemented per physical-link and thus affects all VLANs on that link. Probe-based failure detection is performed per VLAN-link. For example, bge0/bge1 and bge1000/bge1001 are configured together in a group. If the cable for bge0 is unplugged, then link-based failure detection will report both bge0 and bge1000 as having instantly failed. However, if all of the probe targets on bge0 become unreachable, only bge0 will be reported as failed because bge1000 has its own probe targets on its own VLAN.