Types of Failure Detection in IPMP (System Administration Guide: Network Interfaces and Network Virtualization)

System Administration Guide: Network Interfaces and Network Virtualization

Types of Failure Detection in IPMP

The in.mpathd daemon handles the following types of failure detection:

Link-based failure detection, if supported by the NIC driver
Probe-based failure detection, when test addresses are configured
Detection of interfaces that were missing at boot time

Link-Based Failure Detection

Link-based failure detection is always enabled, provided that the interface supports this type of failure detection.

To determine whether a third-party interface supports link-based failure detection, use the ipmpstat -i command. If the output for a given interface includes an unknown status for its LINK column, then that interface does not support link-based failure detection. Refer to the manufacturer's documentation for more specific information about the device.

These network drivers that support link-based failure detection monitor the interface's link state and notify the networking subsystem when that link state changes. When notified of a change, the networking subsystem either sets or clears the RUNNING flag for that interface, as appropriate. If the in.mpathd daemon detects that the interface's RUNNING flag has been cleared, the daemon immediately fails the interface.

Probe-Based Failure Detection

The multipathing daemon performs probe-based failure detection on each interface in the IPMP group that has a test address. Probe-based failure detection involves sending and receiving ICMP probe messages that use test addresses. These messages, also called probe traffic or test traffic, go out over the interface to one or more target systems on the same local network. The daemon probes all the targets separately through all the interfaces that have been configured for probe-based failure detection. If no replies are made in response to five consecutive probes on a given interface, in.mpathd considers the interface to have failed. The probing rate depends on the failure detection time (FDT). The default value for failure detection time is 10 seconds. However, you can tune the failure detection time in the IPMP configuration file. For instructions, go to How to Configure the Behavior of the IPMP Daemon. To optimize probe-based failure detection, you must set multiple target systems to receive the probes from the multipathing daemon. By having multiple target systems, you can better determine the nature of a reported failure. For example, the absence of a response from the only defined target system can indicate a failure either in the target system or in one of the IPMP group's interfaces. By contrast, if only one system among several target systems does not respond to a probe, then the failure is likely in the target system rather than in the IPMP group itself.

Repair detection time is twice the failure detection time. The default time for failure detection is 10 seconds. Accordingly, the default time for repair detection is 20 seconds. After a failed interface has been repaired and the interface's RUNNING flag is once more detected, in.mpathd clears the interface's FAILED flag. The repaired interface is redeployed depending on the number of active interfaces that the administrator has originally set.

The in.mpathd daemon determines which target systems to probe dynamically. First the daemon searches the routing table for target systems that are on the same subnet as the test addresses that are associated with the IPMP group's interfaces. If such targets are found, then the daemon uses them as targets for probing. If no target systems are found on the same subnet, then in.mpathd sends multicast packets to probe neighbor hosts on the link. The multicast packet is sent to the all hosts multicast address, 224.0.0.1 in IPv4 and ff02::1 in IPv6, to determine which hosts to use as target systems. The first five hosts that respond to the echo packets are chosen as targets for probing. If in.mpathd cannot find routers or hosts that responded to the multicast probes, then ICMP echo packets, in.mpathd cannot detect probe-based failures. In this case, the ipmpstat -i utility will report the probe state as unknown.

You can use host routes to explicitly configure a list of target systems to be used by in.mpathd. For instructions, refer to Configuring for Probe-Based Failure Detection.

NICs That Are Missing at Boot

NICs that are not present at system boot represent a special instance of failure detection. At boot time, the startup scripts track any interfaces with /etc/hostname.interface files. Any data addresses in such an interface's /etc/hostname.interface file are automatically configured on the corresponding IPMP interface for the group. However, if the interfaces themselves cannot be plumbed because they are missing, then error messages similar to the following are displayed:

moving addresses from missing IPv4 interfaces: hme0 (moved to ipmp0)
moving addresses from missing IPv6 interfaces: hme0 (moved to ipmp0)

Note –

In this instance of failure detection, only data addresses that are explicitly specified in the missing interface's /etc/hostname.interface file are moved to the IPMP interface.

If an interface with the same name as another interface that was missing at system boot is reattached using DR, the Reconfiguration Coordination Manager (RCM) automatically plumbs the interface. Then, RCM configures the interface according to the contents of the interface's /etc/hostname.interface file. However, data addresses, which are addresses without the NOFAILOVER flag, that are in the /etc/hostname.interface file are ignored. This mechanism adheres to the rule that data addresses should be in the /etc/hostname.ipmp-interface file, and only test addresses should be in the underlying interface's /etc/hostname.interface file. Issuing the ifconfig group command causes that interface to again become part of the group. Thus, the final network configuration is identical to the configuration that would have been made if the system had been booted with the interface present.

For more information about missing interfaces, see About Missing Interfaces at System Boot.

Failure Detection and the Anonymous Group Feature

IPMP supports failure detection in an anonymous group. By default, IPMP monitors the status only of interfaces that belong to IPMP groups. However, the IPMP daemon can be configured to also track the status of interfaces that do not belong to any IPMP group. Thus, these interfaces are considered to be part of an “anonymous group.” When you issue the command ipmpstat -g, the anonymous group will be displayed as double-dashes (--). In anonymous groups, the interfaces would have their data addresses function also as test addresses. Because these interfaces do not belong to a named IPMP group, then these addresses are visible to applications. To enable tracking of interfaces that are not part of an IPMP group, see How to Configure the Behavior of the IPMP Daemon.