The in.mpathd daemon can detect interface failure and repair by two methods. In the first method, the daemon sends and receives ICMP echo probes through the interface. In the second method, the daemon monitors the RUNNING flag on the interface. On some models of network interface cards, the RUNNING flag reflects the link state. On those cards, a link failure is detected much sooner than it would be by probing alone. An interface is considered to have failed if either of the two methods indicates failure. An interface is considered repaired only if both methods indicate that the interface is repaired.
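The asymmetry between the failure rule and the repair rule can be written out as a short Python sketch. This is an illustration only: the function and parameter names are hypothetical and are not part of in.mpathd.

```python
def interface_failed(probes_answered: bool, running_flag_set: bool) -> bool:
    """Failure rule: either method reporting a problem is enough."""
    return (not probes_answered) or (not running_flag_set)


def interface_repaired(probes_answered: bool, running_flag_set: bool) -> bool:
    """Repair rule: both methods must report a healthy interface."""
    return probes_answered and running_flag_set
```

Under this model, an interface whose link is up but whose probes go unanswered counts as failed, and it does not count as repaired until the probes are answered again.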
The in.mpathd daemon sends ICMP echo probes through every interface to the targets that are connected to the link. The daemon detects failure and repair only on interfaces that belong to a multipathing group. After you add an interface to a multipathing group, assign a test address. Then, the daemon sends probes to detect failures on all the interfaces of the multipathing group. How to Configure a Multipathing Interface Group With Two Interfaces describes the steps that you perform to configure test addresses and groups.
Because in.mpathd determines which targets to probe dynamically, you cannot configure the targets. Routers that are connected to the link are chosen as targets for probing. If no routers exist on the link, arbitrary hosts on the link are chosen. A multicast packet that is sent to the “all hosts” multicast address, 224.0.0.1 in IPv4 and ff02::1 in IPv6, determines the arbitrary hosts. The first few hosts that respond to the echo packets are chosen as targets for probing. If in.mpathd cannot find routers or hosts that respond to ICMP echo packets, in.mpathd cannot detect failures.
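The selection order described above can be sketched in Python. The sketch is a simplified model, not the daemon's implementation: the function name, the sample addresses, and the max_hosts limit are hypothetical, and the text does not say how many hosts "the first few" actually is.

```python
from typing import List, Sequence


def choose_probe_targets(
    routers: Sequence[str],
    responding_hosts: Sequence[str],
    max_hosts: int = 3,  # hypothetical stand-in for "the first few hosts"
) -> List[str]:
    """Return the addresses to probe, preferring routers on the link."""
    if routers:
        return list(routers)
    # No routers: use hosts in the order they answered the multicast echo.
    return list(responding_hosts[:max_hosts])


# No routers on the link, two hosts answered the "all hosts" echo request.
targets = choose_probe_targets([], ["192.168.85.30", "192.168.85.31"])
```

An empty result corresponds to the case in which in.mpathd cannot detect failures.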
To ensure that each NIC in the group functions properly, in.mpathd probes all the targets separately through all the interfaces in the multipathing group. If five consecutive probes receive no reply, in.mpathd considers the interface to have failed. The probing rate depends on the failure detection time (FDT). The default value for failure detection time is 10 seconds. The in.mpathd(1M) man page describes how to change the failure detection time. For a failure detection time of 10 seconds, the probing rate is approximately one probe every two seconds. Failback occurs after a repair is detected. Actual detection of an interface failure can take from 20 seconds to a few minutes, depending on the system and network load.
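The relationship between the failure detection time and the probing rate can be checked with a little arithmetic. The Python sketch below assumes that the probe interval is simply the failure detection time divided by the five probes that must go unanswered. The text confirms this only for the 10 second default, so treat other values as an approximation. The function name is hypothetical.

```python
PROBES_BEFORE_FAILURE = 5  # from the text: five consecutive unanswered probes


def approx_probe_interval(fdt_seconds: float) -> float:
    """Approximate seconds between probes (assumes linear scaling with FDT)."""
    return fdt_seconds / PROBES_BEFORE_FAILURE


print(approx_probe_interval(10))  # 2.0, matching one probe every two seconds
```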
The failure detection time applies only to the ICMP echo probe method of detecting failures. If link failure clears the RUNNING flag for an interface, the in.mpathd daemon responds immediately to the change in the flag status.
After a failure is detected, failover of all network access occurs from the failed interface to another functional interface in the group. If you have configured a standby interface, in.mpathd chooses the standby interface for failover of IP addresses, broadcasts, and multicast memberships. If you have not configured a standby interface, in.mpathd chooses the interface with the fewest IP addresses.
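The choice of failover target can be modeled as follows. This Python sketch is a simplification under stated assumptions: the Interface class and the function name are hypothetical, and the tie-breaking behavior (the first standby, or the first interface among those with equally few addresses) is an assumption, because the text does not describe it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interface:
    name: str
    address_count: int
    standby: bool = False
    failed: bool = False


def choose_failover_target(group: List[Interface], failed_name: str) -> Optional[Interface]:
    """Pick the interface that takes over from the failed one."""
    candidates = [i for i in group if i.name != failed_name and not i.failed]
    if not candidates:
        return None  # no functional interface is left in the group
    standbys = [i for i in candidates if i.standby]
    if standbys:
        return standbys[0]  # assumption: first configured standby wins
    # No standby: prefer the interface hosting the fewest IP addresses.
    return min(candidates, key=lambda i: i.address_count)


group = [Interface("hme0", 3), Interface("hme1", 2), Interface("hme2", 1)]
target = choose_failover_target(group, "hme0")  # hme2 hosts the fewest addresses
```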
Physical interfaces in the same group that are not present at system boot represent a special instance of failure detection. The startup script /etc/init.d/network detects these types of failure. Error messages that are similar to the following messages are displayed:
moving addresses from failed IPv4 interfaces: hme0 (moved to hme1)
moving addresses from failed IPv6 interfaces: hme0 (moved to hme1)
In this special instance of failure detection, only static IP addresses are moved to a different physical interface. The addresses must be specified in the host name file. The physical interface must be in the same multipathing group.
This type of failure can be automatically repaired by a failback. The RCM DR Post-attach feature for IP network multipathing automates the DR attachment of a NIC. When a NIC is DR attached, the interface is plumbed and configured. If the interface was removed prior to a reboot, the IP multipathing Reboot-safe feature recovers the IP address. The IP address is transferred to the replaced NIC. The replaced NIC is added to the original IP multipathing interface group. See How to Recover a Physical Interface That Was Not Present at System Boot.