Probe-Based Failure Detection

Probe-based failure detection consists of using ICMP probes to check whether an interface has failed. The implementation of this failure detection method depends on whether test addresses are used.

Probe-Based Failure Detection Using Test Addresses

This failure detection method involves sending and receiving ICMP probe messages that use test addresses. These messages, also called probe traffic or test traffic, are sent over the interface to one or more target systems on the same local network. The in.mpathd daemon probes all of the targets separately through all the interfaces that have been configured for probe-based failure detection. If five consecutive probes on a given interface receive no reply, the in.mpathd daemon considers the interface to have failed. The probing rate depends on the failure detection time (FDT). The default FDT is 10 seconds, but you can tune the value in the IPMP configuration file. For instructions, see How to Configure the Behavior of the IPMP Daemon.
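The five-probe rule can be sketched in a few lines of Python. This is an illustrative model only, not the in.mpathd implementation: the class InterfaceProbeState and its method are hypothetical names, and the sketch models neither probe timing nor repair detection.

```python
from dataclasses import dataclass

# The threshold of five consecutive probes comes from the text above.
# Everything else here is a hypothetical model, not in.mpathd code.
MISSED_PROBE_THRESHOLD = 5


@dataclass
class InterfaceProbeState:
    """Tracks consecutive unanswered probes for one interface."""

    name: str
    consecutive_misses: int = 0
    failed: bool = False

    def record_probe(self, reply_received: bool) -> bool:
        """Record one probe result and return True if the interface is failed."""
        if reply_received:
            # Any reply breaks the run of consecutive misses.
            # Clearing an existing failure (repair) is deliberately not modeled.
            self.consecutive_misses = 0
        else:
            self.consecutive_misses += 1
            if self.consecutive_misses >= MISSED_PROBE_THRESHOLD:
                self.failed = True
        return self.failed
```

In this model, four missed probes followed by one reply reset the counter, so the interface is marked as failed only after an unbroken run of five misses.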

To optimize probe-based failure detection, you must set multiple target systems to receive the probes from the in.mpathd daemon. By having multiple target systems, you can better determine the nature of a reported failure. For example, the absence of a response from the only defined target system can indicate a failure either in the target system or in one of the IPMP group's interfaces. By contrast, if only one system among several target systems does not respond to a probe, then the failure is likely in the target system rather than in the IPMP group itself.
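The reasoning about single and multiple targets can be expressed as a small decision function. The following sketch is hypothetical: the function name and return labels are invented for illustration, and the "interface" verdict when every one of several targets is silent is an assumption drawn from the paragraph above rather than documented daemon behavior.

```python
def locate_failure(replies: dict[str, bool]) -> str:
    """Interpret probe results for one interface, keyed by target address.

    Returns one of: "unknown", "healthy", "ambiguous", "target", "interface".
    """
    if not replies:
        return "unknown"  # no targets, so nothing can be concluded
    silent = [target for target, replied in replies.items() if not replied]
    if not silent:
        return "healthy"  # every target answered
    if len(replies) == 1:
        # Only one target is defined and it is silent: the fault could be
        # in the target or in the interface.
        return "ambiguous"
    if len(silent) < len(replies):
        # Some targets still answer through this interface, so the
        # silent ones are the likelier culprits.
        return "target"
    # Assumption: all of several targets are silent at once, which points
    # at the interface rather than at every target simultaneously.
    return "interface"
```

For example, locate_failure({"192.168.10.1": True, "192.168.10.2": False}) returns "target", whereas a single silent target returns "ambiguous".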

The in.mpathd daemon determines which target systems to probe dynamically. First, the daemon searches the routing table for target systems on the same subnet as the test addresses that are associated with the IPMP group's interfaces. If such targets are found, then the daemon uses them as targets for probing. If no target systems are found on the same subnet, then the daemon sends multicast packets to probe neighbor hosts on the link. The multicast packet is sent to the All Hosts multicast address, 224.0.0.1 in IPv4 and ff02::1 in IPv6, to determine which hosts to use as target systems. The first five hosts that respond to the echo packets are chosen as targets for probing. If the daemon cannot find routers or hosts that responded to the multicast probes, then the daemon cannot detect probe-based failures. In this case, the ipmpstat -i command reports the probe state as unknown.
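The selection order described above (routing table first, multicast responders second) can be modeled as follows. This is a minimal sketch under stated assumptions: choose_targets and its parameters are hypothetical, the routing table and the multicast replies are passed in as plain lists instead of being read from the system, and the limit of five is applied only to multicast responders because the text states it only for that case.

```python
import ipaddress

MAX_MULTICAST_TARGETS = 5  # "the first five hosts that respond"


def choose_targets(test_address, prefix_len, routing_table_hosts, multicast_responders):
    """Return the probe targets for one test address.

    routing_table_hosts: candidate addresses taken from the routing table.
    multicast_responders: hosts that answered the all-hosts probe, in
        arrival order.
    An empty result corresponds to the "unknown" probe state.
    """
    subnet = ipaddress.ip_interface(f"{test_address}/{prefix_len}").network
    on_subnet = [
        host for host in routing_table_hosts
        if ipaddress.ip_address(host) in subnet
    ]
    if on_subnet:
        return on_subnet
    # Fallback: no routing-table target shares the subnet, so use the
    # earliest multicast responders.
    return list(multicast_responders)[:MAX_MULTICAST_TARGETS]
```

With a test address of 192.168.10.5/24, a routing-table entry of 192.168.10.1 is selected directly. If the routing table holds only off-subnet hosts, the first five multicast responders are used instead.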

You can use host routes to explicitly configure a list of target systems to be used by the in.mpathd daemon. For instructions, see Configuring Probe-Based Failure Detection.

Probe-Based Failure Detection Without Using Test Addresses

With no test addresses, this method is implemented by using two types of probes:

ICMP probes

ICMP probes are sent by the active interfaces in the IPMP group to probe targets that are defined in the routing table. An active interface is an underlying interface that can receive inbound IP packets that are addressed to the interface's link layer (L2) address. The ICMP probe uses the data address as the probe's source address. If the ICMP probe reaches its target and gets a response from the target, then the active interface is operational.

Transitive probes

Transitive probes are sent by the alternate interfaces in the IPMP group to probe the active interface. An alternate interface is an underlying interface that does not actively receive any inbound IP packets.

For example, consider an IPMP group that consists of four underlying interfaces and one data address. In this configuration, outbound packets can use all of the underlying interfaces. However, inbound packets can only be received by the interface to which the data address is bound. The remaining three underlying interfaces that cannot receive inbound packets are the alternate interfaces.

If an alternate interface can successfully send a probe to an active interface and receive a response, then the active interface is functional, and by inference, so is the alternate interface that sent the probe.
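The inference rules for ICMP and transitive probes can be summarized in a short sketch. The function below is hypothetical and models only what the text states: a successful probe establishes that an interface is functional. The text does not say what an unanswered transitive probe implies, so the sketch records that case as None (not established) rather than as a failure.

```python
from typing import Optional


def infer_interface_health(
    active: str,
    icmp_reply: bool,
    transitive_replies: dict[str, bool],
) -> dict[str, Optional[bool]]:
    """Map each interface name to True (functional) or None (not established).

    active: name of the active interface.
    icmp_reply: whether the active interface's ICMP probe was answered.
    transitive_replies: for each alternate interface, whether its
        transitive probe to the active interface was answered.
    """
    health: dict[str, Optional[bool]] = {active: True if icmp_reply else None}
    for alternate, replied in transitive_replies.items():
        if replied:
            # A completed round trip shows that both ends work.
            health[alternate] = True
            health[active] = True
        else:
            health[alternate] = None
    return health
```

Using the four-interface example, if the alternate interface net1 gets a reply from the active interface net0 while net2 and net3 do not, the sketch reports net0 and net1 as functional and leaves net2 and net3 undetermined. The interface names are placeholders.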

Note - In Oracle Solaris, probe-based failure detection operates with test addresses by default. To select probe-based failure detection without test addresses, you must manually enable transitive probing. For instructions, see Selecting a Failure Detection Method.

Group Failure

A group failure occurs when all of the interfaces in an IPMP group appear to fail at the same time. In this case, no underlying interface is usable. Also, when all of the target systems fail at the same time and probe-based failure detection is enabled, the in.mpathd daemon flushes all of its current target systems and probes for new target systems.
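The two conditions in the preceding paragraph can be sketched as simple predicates. The function names are hypothetical, and discover stands in for whatever mechanism finds new targets; the sketch makes no claim about how in.mpathd implements either step.

```python
from typing import Callable, Iterable


def is_group_failure(interface_failed: dict[str, bool]) -> bool:
    """True when the group has interfaces and every one appears failed."""
    return bool(interface_failed) and all(interface_failed.values())


def refresh_targets(
    target_alive: dict[str, bool],
    discover: Callable[[], Iterable[str]],
) -> list[str]:
    """Return the target list to use for the next round of probes.

    If every current target has stopped responding, the whole list is
    flushed and replaced by newly discovered targets. Otherwise the
    current list is kept unchanged.
    """
    if target_alive and not any(target_alive.values()):
        return list(discover())
    return list(target_alive)
```

Note that a single unresponsive target does not trigger a flush in this model. Only the simultaneous loss of all targets does.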

In an IPMP group that has no test addresses, a single interface that can probe the active interface is designated as a prober. This designated interface has both the FAILED flag and PROBER flag set. The data address is bound to this interface, which enables the interface to continue probing the target to detect recovery.