7 Configuring Coherence

This chapter describes how to configure IP addresses of Processing Servers and Signaling Servers to allow them to communicate with each other.

About Coherence

Processing Servers and Signaling Servers use Oracle Coherence to distribute events across domain boundaries.

To communicate with each other, servers need to be aware of each other's IP addresses. You can specify IP addresses of the servers using one of the following methods:

IP multicast. In this case, you need to specify a single IP address. All servers use this address to send and receive multicast messages.

In addition to the multicast IP address, you can specify the maximum number of hops that an IP packet may traverse. This parameter is known as Time-To-Live (TTL).

The domain creation script supports only IP multicast. To use IP unicast, change the configuration after you create the domain.

See "Setting Up IP Multicast" for more information.

IP unicast. In this case, you need to specify the IP address of each server in all domains.

See "Setting Up IP Unicast" for more information.

These two methods are mutually exclusive: you must use the same method in all your domains. The method that you should use depends on the configuration of your network, including its topology, the number of servers, and any restrictions imposed by your firewalls or routers.
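As an illustration of the rule that every domain uses the same method, the following minimal Python sketch models the two methods as separate types and checks a set of domains for consistency. The names `MulticastSettings`, `UnicastSettings`, `use_well_known_address`, and `check_same_method` are hypothetical and are not part of Service Broker or Coherence. The only product detail the sketch relies on is the useWellKnownAddress setting in the General tab (FALSE for multicast, TRUE for unicast); the addresses and ports in any example are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union


@dataclass
class MulticastSettings:
    """Hypothetical model of the Multicast tab: one shared group address."""
    address: str
    port: int
    ttl: int


@dataclass
class UnicastSettings:
    """Hypothetical model of the Unicast tab: server name -> (IP address, port)."""
    servers: Dict[str, Tuple[str, int]] = field(default_factory=dict)


Settings = Union[MulticastSettings, UnicastSettings]


def use_well_known_address(settings: Settings) -> str:
    """Value to select in the General tab's useWellKnownAddress list."""
    return "TRUE" if isinstance(settings, UnicastSettings) else "FALSE"


def check_same_method(domains: Dict[str, Settings]) -> None:
    """Raise ValueError if the domains mix IP multicast and IP unicast."""
    values = {name: use_well_known_address(s) for name, s in domains.items()}
    if len(set(values.values())) > 1:
        raise ValueError(f"domains mix multicast and unicast: {values}")
```

A real check would read these values from each domain's configuration; how you obtain them is outside the scope of this sketch.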

Setting Up IP Multicast

Because the domain creation script configures IP multicast, you need to specify IP multicast configuration parameters only when you want to modify the settings you defined during domain creation, or when you want to switch back from IP unicast.

To set up IP multicast:

In the domain navigation pane, expand the OCSB node.
Expand the Domain Management node.
Select Coherence.
In the General tab, in the useWellKnownAddress list, select FALSE.
Click Apply.
Click the Multicast tab.
In the MultiCastAddress subtab, fill out the fields as follows:
- In the address field, enter the multicast IP address. You can enter any value in the range from 224.0.0.0 to 239.255.255.255.
- In the port field, enter the multicast port.
Click Apply.
Click the TTL subtab.
In the TTL field, specify the TTL. You can enter any value from 1 to 255.
Click Apply.
Stop all servers in all domains and then start them again. See "Starting and Stopping Processing and Signaling Servers" for more information.
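The ranges given in the multicast procedure can be checked before you click Apply. This Python sketch uses the standard `ipaddress` module, whose `is_multicast` property is true exactly for 224.0.0.0/4, that is, 224.0.0.0 to 239.255.255.255. `validate_multicast` is a hypothetical helper, and the port range 1 to 65535 is an assumption based on general UDP port numbering, because the procedure does not state one.

```python
import ipaddress
from typing import List


def validate_multicast(address: str, port: int, ttl: int) -> List[str]:
    """Return a list of problems with the multicast settings (empty if none)."""
    problems: List[str] = []
    try:
        # 224.0.0.0/4 covers exactly 224.0.0.0 through 239.255.255.255.
        if not ipaddress.IPv4Address(address).is_multicast:
            problems.append(f"{address} is outside 224.0.0.0 to 239.255.255.255")
    except ipaddress.AddressValueError:
        problems.append(f"{address!r} is not a valid IPv4 address")
    # Assumption: any valid UDP port number; the procedure gives no range.
    if not 1 <= port <= 65535:
        problems.append(f"port {port} is outside 1 to 65535")
    # TTL range as stated in the TTL subtab step.
    if not 1 <= ttl <= 255:
        problems.append(f"TTL {ttl} is outside 1 to 255")
    return problems
```

An empty list means the values fall within the documented ranges; it does not confirm that your routers actually forward that multicast group.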

Setting Up IP Unicast

To set up IP unicast:

In the domain navigation pane, expand the OCSB node.
Expand the Domain Management node.
Select Coherence.
In the General tab, in the useWellKnownAddress list, select TRUE.
Click Apply.
Click the Unicast tab.
In the ServerName subtab, click New.
In the New: /Unicast/ServerName window, in the ServerName field, enter a descriptive name for a Processing Server or Signaling Server. You use this name when you specify the IP address of the server.
Click Apply.
Click the ServerAddress subtab.
In the Parent list, select the server whose IP address you want to define. This list contains the servers that you previously defined in the ServerName subtab.
Click New.
In the New: /Unicast/ServerAddress window, fill out the fields as follows:
- In the address field, enter the IP address of the server.
- In the port field, enter the port of the server.
Click Apply.
Repeat the ServerName and ServerAddress steps for each Processing Server and Signaling Server in all domains.
Stop all servers in all domains and then start them again. See "Starting and Stopping Processing and Signaling Servers" for more information.
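Because every Processing Server and Signaling Server in all domains needs its own entry, a quick cross-check of your list can catch omissions before you restart the servers. The following Python sketch is hypothetical: `check_unicast_entries` and the server names are illustrative, and the duplicate check rests on the general assumption that two servers cannot listen on the same IP address and port.

```python
import ipaddress
from typing import Dict, Iterable, List, Tuple


def check_unicast_entries(expected_servers: Iterable[str],
                          entries: Dict[str, Tuple[str, int]]) -> List[str]:
    """Compare ServerName/ServerAddress entries with the servers in all domains.

    expected_servers: names of every Processing Server and Signaling Server.
    entries: server name -> (IP address, port), as entered in the Unicast tab.
    """
    problems: List[str] = []
    for name in expected_servers:
        if name not in entries:
            problems.append(f"no address defined for {name}")
    seen: Dict[Tuple[str, int], str] = {}
    for name, (address, port) in entries.items():
        try:
            ipaddress.ip_address(address)  # accepts IPv4 or IPv6 literals
        except ValueError:
            problems.append(f"{name}: {address!r} is not a valid IP address")
        key = (address, port)
        if key in seen:
            problems.append(f"{name} and {seen[key]} share {address}:{port}")
        else:
            seen[key] = name
    return problems
```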

Configuring Cluster Node Death Detection Properties

Service Broker detects a cluster node death condition when there is a sustained node failure, network outage, or connection failure. A node whose death is detected is considered lost from the cluster, and a redundant server automatically takes over with no loss of service.

The mechanism for node death detection is based on two Coherence properties specified in Table 7-1.

Coherence uses both the timeout and the retry count to decide whether a node is dead. With the default configuration, Coherence declares a node dead when two consecutive heartbeat requests time out. With the timeout property at its default of five seconds, a node is therefore considered disconnected from the main cluster after ten seconds (2 × 5 seconds).
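The arithmetic in the preceding paragraph can be written out as follows. This sketch assumes, as the paragraph describes, that the detection time is simply the ping timeout multiplied by the number of attempts; actual detection in a running cluster can differ somewhat. The function names are hypothetical, and the defaults are the Service Broker defaults from Table 7-1.

```python
def node_death_detection_seconds(ping_timeout_s: float = 5.0,
                                 ping_retries: int = 2) -> float:
    """Approximate time before an unresponsive node is declared dead.

    Assumes a node is declared dead after ping_retries consecutive
    heartbeat timeouts of ping_timeout_s seconds each.
    """
    if ping_timeout_s <= 0 or ping_retries < 1:
        raise ValueError("timeout must be positive and retries at least 1")
    return ping_timeout_s * ping_retries


def timeout_for_target(target_s: float, ping_retries: int = 2) -> float:
    """Ping timeout that gives roughly target_s seconds before failover."""
    if target_s <= 0 or ping_retries < 1:
        raise ValueError("target must be positive and retries at least 1")
    return target_s / ping_retries
```

For example, to trigger failover after about six seconds with three attempts, `timeout_for_target(6, 3)` gives a timeout of two seconds. Shorter intervals fail over faster but make it more likely that a brief network delay is mistaken for a node death.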

Because a ten-second interval before server failover might not be appropriate for some deployments, you can change the property values in Table 7-1.

Table 7-1 Cluster Node Death Detection Properties

Property Name	Coherence Option	Service Broker Default
tangosol.coherence.ipmonitor.pingtimeout	tcp-ring-listener/ip-timeout	5 seconds
tangosol.coherence.ipmonitor.pingretries	tcp-ring-listener/ip-attempts	2

For a description of the Coherence options, see http://docs.oracle.com/cd/E15357_01/coh.360/e15723/appendix_operational.htm#BABBGCHI
Automatic Server Shutdown

The following scenario causes a managed server to shut down automatically:

A network failure splits the Coherence cluster. While the cluster is split, each part acts as an independent cluster.
The network problem is resolved, and some nodes can rejoin the main cluster. Nodes whose state has become inconsistent with the main cluster cannot rejoin.

So that you do not need to monitor the system to detect this situation, the following happens: when a Coherence cluster service restart occurs, the managed servers of the main cluster keep running, and the JVMs of the other managed servers shut down automatically.

If a node shuts down automatically, you must restart its managed server unless you have implemented some form of process supervision that restarts it.
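Process supervision can be as simple as a loop that starts the managed server again whenever it exits. The following Python sketch is one possible approach, not a Service Broker feature: `supervise` is a hypothetical helper, `start_server` is assumed to block until the managed server JVM exits and to return its exit code, and the restart limit and delay are arbitrary values you would tune.

```python
import time
from typing import Callable, List


def supervise(start_server: Callable[[], int],
              max_restarts: int = 5,
              delay_s: float = 10.0,
              should_restart: Callable[[int], bool] = lambda code: True,
              sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Run start_server, restarting it after each exit, up to max_restarts times.

    Returns the exit codes observed, in order.
    """
    exit_codes: List[int] = []
    restarts = 0
    while True:
        code = start_server()  # assumed to block until the JVM exits
        exit_codes.append(code)
        if restarts >= max_restarts or not should_restart(code):
            return exit_codes
        restarts += 1
        # Pause so that a server that keeps failing is not restarted in a tight loop.
        sleep(delay_s)
```

In practice, `start_server` might wrap `subprocess.run([...]).returncode` around the command you use to start the managed server; that command is deployment-specific and is not shown here. Pass a `should_restart` predicate if some exit codes, such as those from a deliberate shutdown, should not trigger a restart.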

Data Cache Restart

After the network recovers and managed servers start to rejoin the cluster, nodes that rejoin as non-senior members have their caches restarted, and the data in those caches is lost.

In the following scenario, ongoing calls and the data cache are preserved:

You have deployed a processing domain and a signaling domain on machine_1, and another processing domain and signaling domain on machine_2. If a network connection problem splits the machines into two clusters, both clusters continue to handle traffic. Because Coherence keeps a copy of each cache entry, either the primary or the backup, on each machine, both sides continue to process new sessions and ongoing calls successfully.
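The reason both machines can keep serving calls can be illustrated with a simplified model in which each cache partition has a primary copy on one machine and a backup copy on the other. This Python sketch is an assumption-based illustration only: `place_partitions` and `partitions_on` are hypothetical, and the alternating placement is not Coherence's actual partition-distribution algorithm.

```python
from typing import Dict, Set, Tuple

MACHINES = ("machine_1", "machine_2")


def place_partitions(count: int) -> Dict[int, Tuple[str, str]]:
    """Simplified placement: partition -> (primary machine, backup machine).

    Primaries alternate between the two machines, and each backup goes to
    the machine that does not hold the primary.
    """
    return {p: (MACHINES[p % 2], MACHINES[(p + 1) % 2]) for p in range(count)}


def partitions_on(machine: str, placement: Dict[int, Tuple[str, str]]) -> Set[int]:
    """Partitions for which a machine holds either the primary or the backup copy."""
    return {p for p, (primary, backup) in placement.items()
            if machine in (primary, backup)}
```

With this placement, each machine still holds a copy of every partition after the network splits, which matches the behavior described for this scenario.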

To return the cluster to a normal state, you must restart one of the two sides. If possible, configure the associated load balancers to route new sessions to the side that will not be restarted. Then, if you know when the network will be restored, shut down the side to be restarted shortly before the network is restored.

During the network outage, the two sides can continue to run. When the network has been restored, start the managed servers that were shut down.