Failure Detection in DLMP Aggregation

Failure detection in DLMP aggregation is a method for detecting the failure of aggregated ports. A port is considered to have failed when it cannot send or receive traffic. A port might fail for any of the following reasons:

Damage or cut in the cable
Switch port goes down
Failure in upstream network path

DLMP aggregation performs failure detection on the aggregated ports to ensure that the network remains continuously available to send and receive traffic. When a port fails, the clients associated with that port are failed over to an active port. Failed aggregated ports remain unusable until they are repaired. The remaining active ports continue to function, and any other existing ports are deployed as needed. After the failed port recovers, clients from other active ports can be associated with it again.
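The failover behavior described above can be illustrated with a minimal Python sketch. It is not the DLMP implementation: the function name, the dictionary layout, and the choice of the least-loaded active port are all assumptions made for illustration, because this section does not state how the new port is chosen.

```python
def fail_over(assignments, failed_port, active_ports):
    """Return a new client -> port mapping with failed_port emptied.

    assignments maps a client name (for example, a VNIC) to a port name.
    Picking the least-loaded active port is an assumption of this sketch,
    not documented DLMP behavior.
    """
    candidates = [p for p in active_ports if p != failed_port]
    if not candidates:
        raise RuntimeError("no active port left to take over clients")

    result = dict(assignments)
    for client, port in assignments.items():
        if port != failed_port:
            continue
        # Count the clients each candidate port currently carries.
        load = {p: 0 for p in candidates}
        for assigned in result.values():
            if assigned in load:
                load[assigned] += 1
        # Ties are broken by the order of active_ports.
        result[client] = min(candidates, key=lambda p: load[p])
    return result


clients = {"vnic1": "net0", "vnic2": "net1", "vnic3": "net0"}
print(fail_over(clients, "net0", ["net1", "net2"]))
```

In this example, both clients of the failed port net0 end up on net1 or net2, and the client that was already on net1 is left untouched.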

DLMP aggregation supports both link-based and probe-based failure detection.

Link-Based Failure Detection

Link-based failure detection detects a failure when the cable is cut or when the switch port is down. It can therefore detect only failures caused by the loss of the direct connection between the datalink and the first-hop switch. Link-based failure detection is enabled by default when a DLMP aggregation is created.

Probe-Based Failure Detection

Probe-based failure detection detects failures between an end host and the configured targets. This feature overcomes the limitation of link-based failure detection, which cannot detect failures beyond the first-hop switch. Probe-based failure detection is useful when a default router is down or when the network becomes unreachable. The DLMP aggregation detects failures by sending and receiving probe packets.

To enable probe-based failure detection in DLMP aggregation, you must configure the probe-ip property.

Note - In a DLMP aggregation, if no probe-ip is configured, then probe-based failure detection is disabled and only link-based failure detection is used.

When you create the first DLMP aggregation, the service svc:/network/dlmp:default is automatically enabled. This service starts the in.dlmpd daemon, which performs probe-based failure detection in DLMP aggregations. The service is disabled when no DLMP aggregation exists on the system. For more information, see Configuring Probe-Based Failure Detection for DLMP Aggregation.

Probe-based failure detection uses a combination of two types of probes: Internet Control Message Protocol (ICMP) probes, which operate at Layer 3 (L3), and transitive probes, which operate at Layer 2 (L2). The two types work together to determine the health of the aggregated physical datalinks.

ICMP Probing

You can configure a comma-separated list of source IP addresses and an optional target IP address or host name. The target IP address must be on the same subnet as the specified source IP address. You can specify the source IP address in four different forms. For more information, see How to Configure Probe-Based Failure Detection for DLMP.

ICMP probing uses the configured source IP addresses of the probe-ip property only if the IP addresses are associated with clients such as VNICs. A port is associated with a client such as a VNIC only when the port receives inbound traffic and transmits outbound traffic for that client. At any particular time, the inbound or the outbound traffic for a client always goes through only one underlying port of the DLMP aggregation. The configured IP addresses of the probe-ip property are used to monitor the health of the ports only if the IP addresses are associated with that port.

For each configured source IP address, the in.dlmpd daemon periodically sends unicast ICMP packets to the configured targets. If target IP addresses are not configured, in.dlmpd searches the routing table for routes on the same subnet as the specified source IP address and uses the next hop specified in those routes as the target IP address.
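The target selection rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the explicit prefix length argument, and the list of (destination, next hop) tuples that stands in for the routing table are hypothetical, and the sketch treats every target as an IP address even though a host name can also be configured.

```python
import ipaddress


def select_probe_targets(source_ip, prefix_len, configured_targets, routes):
    """Return the probe targets for one configured source IP address.

    routes is a list of (destination, next_hop) tuples that stands in
    for the routing table; this layout is hypothetical.
    """
    # Explicitly configured targets always take precedence.
    if configured_targets:
        return list(configured_targets)

    # Otherwise, fall back to next hops on the source address's subnet.
    subnet = ipaddress.ip_interface(f"{source_ip}/{prefix_len}").network
    targets = []
    for _destination, next_hop in routes:
        if ipaddress.ip_address(next_hop) in subnet and next_hop not in targets:
            targets.append(next_hop)
    return targets


routing_table = [
    ("default", "192.168.1.1"),
    ("10.0.0.0/8", "192.168.1.254"),
    ("172.16.0.0/12", "10.1.1.1"),  # next hop is on a different subnet
]
print(select_probe_targets("192.168.1.10", 24, [], routing_table))
```

With no configured targets, only the two next hops on the 192.168.1.0/24 subnet are selected. When a target is configured, the routing table is not consulted.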

ICMP probe traffic is sent out only through the port associated with that IP client. The port is marked as ICMP failed if all the targets for that particular port become unreachable. The port is marked as ICMP active if at least one of the targets is reachable from that port through an ICMP probe.

Transitive Probing

Transitive probing is performed when the health state of all the network ports cannot be determined by ICMP probing. This situation occurs when any port is not associated with a source IP address that is configured for the probe-ip property. For example, transitive probing is performed when a port is not associated with an IP client or when the number of IP addresses configured for the probe-ip property is less than the total number of aggregated ports. Probe packets are sent periodically from the ports that are not associated with any IP client to their peer ports. If a port is able to reach any ICMP active port, then that port is considered L2 active.
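The way ICMP probing and transitive probing combine to classify ports can be sketched as follows. This is a simplified illustration, not the in.dlmpd logic: the function name and input layout are assumptions, and the "L2 failed" label is hypothetical because this section names only the ICMP failed, ICMP active, and L2 active states.

```python
def classify_ports(icmp_results, reachable_peers):
    """Classify each aggregated port from its probe results.

    icmp_results:    port -> list of booleans, one per ICMP target,
                     for ports that are associated with an IP client.
    reachable_peers: port -> set of peer ports that answered transitive
                     probes, for ports without an IP client.
    """
    states = {}

    # ICMP probing: a port is active if at least one target is reachable.
    for port, results in icmp_results.items():
        states[port] = "ICMP active" if any(results) else "ICMP failed"

    icmp_active = {p for p, s in states.items() if s == "ICMP active"}

    # Transitive probing: a port is L2 active if it reaches any
    # ICMP active port. "L2 failed" is a label invented for this sketch.
    for port, peers in reachable_peers.items():
        states[port] = "L2 active" if peers & icmp_active else "L2 failed"

    return states


print(classify_ports(
    {"net0": [False, True], "net1": [False, False]},
    {"net2": {"net0"}, "net3": {"net1"}},
))
```

Here net2 is considered L2 active because it reaches net0, which is ICMP active. Port net3 reaches only net1, which is ICMP failed, so its health cannot be confirmed.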

Transitive probes use proprietary Oracle Solaris protocol packets that are transmitted over the network. For more information, see Appendix B, Packet Format of Transitive Probes.

Probe-based failure detection is performed in the global zone when VNICs over an aggregation are created in the global zone and are assigned to non-global zones. However, probe traffic can be segregated from non-global zone traffic by using VLANs. For example, when the probe traffic runs on one VLAN in the global zone, the non-global zone traffic can run on a different VLAN.