Deployment Considerations - Cisco Switches

When deploying Coherence with Cisco switches, be aware of the following:

Buffer Space and Packet Pauses

Under heavy UDP packet load, some Cisco switches may run out of buffer space and exhibit frequent multi-second communication pauses. These pauses can be identified by a series of Coherence log messages that reference communication delays with multiple nodes and that cannot be attributed to local or remote garbage collections (GCs), for example:

Experienced a 4172 ms communication delay (probable remote GC) with Member(Id=7, Timestamp=2006-10-20 12:15:47.511, Address=192.168.0.10:8089, MachineId=13838); 320 packets rescheduled, PauseRate=0.31, Threshold=512
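The pattern described above (delays reported against several different members) can be checked mechanically. The following is a minimal sketch in Python, assuming the log lines use the exact wording shown above; the function names, the 1000 ms threshold, and the second sample line (Member Id=9) are hypothetical and only for illustration. It is a heuristic: it does not rule out GC pauses, which still have to be excluded using GC logs.

```python
import re
from collections import defaultdict

# Matches the delay message format shown above. Other Coherence versions
# may word the message differently (assumption).
DELAY_RE = re.compile(
    r"Experienced a (?P<ms>\d+) ms communication delay"
    r".*?Member\(Id=(?P<member>\d+),"
)


def delays_by_member(lines, min_ms=1000):
    """Map member id -> list of reported delays (ms) that are >= min_ms."""
    delays = defaultdict(list)
    for line in lines:
        m = DELAY_RE.search(line)
        if m and int(m.group("ms")) >= min_ms:
            delays[int(m.group("member"))].append(int(m.group("ms")))
    return dict(delays)


def looks_like_switch_pause(lines, min_members=2, min_ms=1000):
    """True when long delays were reported against several distinct members."""
    return len(delays_by_member(lines, min_ms)) >= min_members


SAMPLE = [
    # Line taken from the example above.
    "Experienced a 4172 ms communication delay (probable remote GC) with "
    "Member(Id=7, Timestamp=2006-10-20 12:15:47.511, "
    "Address=192.168.0.10:8089, MachineId=13838); 320 packets rescheduled, "
    "PauseRate=0.31, Threshold=512",
    # Hypothetical second line, invented for this sketch.
    "Experienced a 3890 ms communication delay (probable remote GC) with "
    "Member(Id=9, Timestamp=2006-10-20 12:15:49.102, "
    "Address=192.168.0.11:8089, MachineId=13839); 298 packets rescheduled, "
    "PauseRate=0.29, Threshold=512",
]

if __name__ == "__main__":
    print(delays_by_member(SAMPLE))
    print(looks_like_switch_pause(SAMPLE))
```

In practice the lines would be read from the Coherence log file of each node, and a positive result would be a prompt to look at the switch, not a diagnosis by itself.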

The Cisco 6500 series supports configuring the amount of buffer space available to each Ethernet port or ASIC. In high-load applications it may be necessary to increase the default buffer space. This can be accomplished by executing:

fabric buffer-reserve high

See Cisco's documentation for additional details on this setting.

Multicast Connectivity on Large Networks

Cisco's default switch configuration does not support proper routing of multicast packets between switches due to the use of IGMP snooping. See Cisco's documentation regarding the issue and solutions.

Multicast Outages

Some Cisco switches have shown difficulty in maintaining multicast group membership, resulting in existing multicast group members being silently removed from the multicast group. This causes a partial communication disconnect for the associated Coherence node(s), which are then forced to leave and rejoin the cluster. This type of outage can most often be identified by the following Coherence log message, which indicates that a partial communication problem has been detected:

A potential network configuration problem has been detected. A packet has failed to be delivered (or acknowledged) after 60 seconds, although other packets were acknowledged by the same cluster member (Member(Id=3, Timestamp=Sat Dec 01 12:02:54 EST 2006, Address=192.168.1.100, Port=8088, MachineId=48991)) to this member (Member(Id=1, Timestamp=Sat Dec 01 11:51:11 EST 2006, Address=192.168.1.101, Port=8088, MachineId=49002)) as recently as 5 seconds ago.

To confirm the issue, run the Multicast Test using the same multicast address and port as the running cluster. If the issue affects a multicast test node, its logs will show that at some point it suddenly stops receiving multicast test messages from other nodes. The following test logs show the issue:

Test Node 192.168.1.100:

Mon Nov 13 16:44:22 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:23 GMT 2006: Received test packet 76 from ip=/192.168.1.101, group=/224.3.2.0:32367, ttl=4.
Mon Nov 13 16:44:23 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:23 GMT 2006: Sent packet 85.
Mon Nov 13 16:44:23 GMT 2006: Received test packet 85 from self.
Mon Nov 13 16:44:24 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:25 GMT 2006: Received test packet 77 from ip=/192.168.1.101, group=/224.3.2.0:32367, ttl=4.
Mon Nov 13 16:44:25 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:25 GMT 2006: Sent packet 86.
Mon Nov 13 16:44:25 GMT 2006: Received test packet 86 from self.
Mon Nov 13 16:44:26 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:27 GMT 2006: Received test packet 78 from ip=/192.168.1.101, group=/224.3.2.0:32367, ttl=4.
Mon Nov 13 16:44:27 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:27 GMT 2006: Sent packet 87.
Mon Nov 13 16:44:27 GMT 2006: Received test packet 87 from self.
Mon Nov 13 16:44:28 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:29 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:29 GMT 2006: Sent packet 88.
Mon Nov 13 16:44:29 GMT 2006: Received test packet 88 from self.
Mon Nov 13 16:44:30 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:31 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:31 GMT 2006: Sent packet 89.
Mon Nov 13 16:44:31 GMT 2006: Received test packet 89 from self.
Mon Nov 13 16:44:32 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:33 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???

Test Node 192.168.1.101:

Mon Nov 13 16:44:22 GMT 2006: Sent packet 76.
Mon Nov 13 16:44:22 GMT 2006: Received test packet 76 from self.
Mon Nov 13 16:44:22 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:22 GMT 2006: Received test packet 85 from ip=/192.168.1.100, group=/224.3.2.0:32367, ttl=4.
Mon Nov 13 16:44:23 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:24 GMT 2006: Sent packet 77.
Mon Nov 13 16:44:24 GMT 2006: Received test packet 77 from self.
Mon Nov 13 16:44:24 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:24 GMT 2006: Received test packet 86 from ip=/192.168.1.100, group=/224.3.2.0:32367, ttl=4.
Mon Nov 13 16:44:25 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:26 GMT 2006: Sent packet 78.
Mon Nov 13 16:44:26 GMT 2006: Received test packet 78 from self.
Mon Nov 13 16:44:26 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:26 GMT 2006: Received test packet 87 from ip=/192.168.1.100, group=/224.3.2.0:32367, ttl=4.
Mon Nov 13 16:44:27 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:28 GMT 2006: Sent packet 79.
Mon Nov 13 16:44:28 GMT 2006: Received test packet 79 from self.
Mon Nov 13 16:44:28 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:28 GMT 2006: Received test packet 88 from ip=/192.168.1.100, group=/224.3.2.0:32367, ttl=4.
Mon Nov 13 16:44:29 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:30 GMT 2006: Sent packet 80.
Mon Nov 13 16:44:30 GMT 2006: Received test packet 80 from self.
Mon Nov 13 16:44:30 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:30 GMT 2006: Received test packet 89 from ip=/192.168.1.100, group=/224.3.2.0:32367, ttl=4.
Mon Nov 13 16:44:31 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:32 GMT 2006: Sent packet 81.
Mon Nov 13 16:44:32 GMT 2006: Received test packet 81 from self.
Mon Nov 13 16:44:32 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:32 GMT 2006: Received test packet 90 from ip=/192.168.1.100, group=/224.3.2.0:32367, ttl=4.
Mon Nov 13 16:44:33 GMT 2006: Received 83 bytes from a Coherence cluster node at 192.168.1.100: ???
Mon Nov 13 16:44:34 GMT 2006: Sent packet 82.

Note that at 16:44:27 the first test node stops receiving multicast packets from other machines. The OS continues to properly forward multicast traffic from other processes on the same machine, but the test packets (79 and higher) from the second test node are not received. Also note that both the test packets and the cluster's multicast traffic generated by the first node do continue to be delivered to the second node. This indicates that the first node was silently removed from the multicast group.
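The reading of the logs described above can be automated when many test nodes are involved. The sketch below, in Python, assumes the Multicast Test output uses exactly the line formats shown in the logs above; the function name and return shape are hypothetical. It reports the last test packet received from a remote node and how many packets the local node has sent since then. A growing send count with no further remote packets matches the symptom described here.

```python
import re

# Line formats as they appear in the test logs above (assumed stable).
REMOTE_RE = re.compile(r"Received test packet (\d+) from ip=/([\d.]+)")
SENT_RE = re.compile(r"Sent packet (\d+)\.")


def remote_silence(lines):
    """Return (last_remote, sent_after).

    last_remote is (timestamp, packet number, sender ip) for the last test
    packet received from another node, or None if there was none.
    sent_after is the number of packets this node sent after that point.
    """
    last_remote = None
    sent_after = 0
    for line in lines:
        # The timestamp ends at the first ": " (times use ':' without a space).
        timestamp, sep, message = line.partition(": ")
        if not sep:
            continue
        m = REMOTE_RE.search(message)
        if m:
            last_remote = (timestamp, int(m.group(1)), m.group(2))
            sent_after = 0  # restart the count after each remote packet
        elif SENT_RE.search(message):
            sent_after += 1
    return last_remote, sent_after


# Excerpt from the first test node's log above.
SAMPLE = [
    "Mon Nov 13 16:44:25 GMT 2006: Received test packet 77 from ip=/192.168.1.101, group=/224.3.2.0:32367, ttl=4.",
    "Mon Nov 13 16:44:25 GMT 2006: Sent packet 86.",
    "Mon Nov 13 16:44:27 GMT 2006: Received test packet 78 from ip=/192.168.1.101, group=/224.3.2.0:32367, ttl=4.",
    "Mon Nov 13 16:44:27 GMT 2006: Sent packet 87.",
    "Mon Nov 13 16:44:27 GMT 2006: Received test packet 87 from self.",
    "Mon Nov 13 16:44:29 GMT 2006: Sent packet 88.",
    "Mon Nov 13 16:44:31 GMT 2006: Sent packet 89.",
]

if __name__ == "__main__":
    print(remote_silence(SAMPLE))
```

How many unanswered sends count as an outage is a judgment call; the test nodes above send a packet roughly every two seconds, so a handful of sends without a remote packet already spans several seconds.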

If you encounter this multicast issue, contact Cisco technical support, or consider changing your configuration to unicast-only via the Coherence Well Known Addresses feature.