This chapter describes how to configure Oracle Communications WebRTC Session Controller to improve failover performance when a server becomes physically disconnected from the network.
To achieve a highly available production system, the WebRTC Session Controller uses the Oracle Coherence distributed cache service to retrieve and write call-state data. The cache service consists of a number of partitions that are spread across the servers running in the cluster. Each partition has a primary copy of call-state storage assigned to one server in the cluster, and a backup copy assigned to another server in the cluster. As a result, the call state required to process a request may reside on a remote server, possibly on a different machine.
The WebRTC Session Controller architecture depends on the Coherence cache service to detect when a server has failed or become disconnected. When an engine cannot read or write call-state data because a server is unavailable, the Coherence cache service detects the failure, reassigns the lost server's partitions to another server in the cluster, and ensures that a new backup copy is made available on a different server, if one is running.
The Coherence cache service uses its own cluster communication protocol, known as Tangosol Cluster Management Protocol (TCMP), to invoke remote servers, detect server failure, and achieve high availability. This protocol uses an optimized algorithm to quickly detect that a server has become physically disconnected from the network. This algorithm, and the configuration options that modify its behavior, are described in detail in the Oracle Coherence documentation. See the following documentation for more information on Coherence and its distributed cache service:
"Introduction to Coherence Clusters" in Developing Applications with Oracle Coherence
"Understanding Distributed Caches" in Developing Applications with Oracle Coherence
See "Configuring Coherence" and "SIP Coherence Configuration Reference (coherence.xml)" for additional information on configuring Coherence for the WebRTC Session Controller.
The WebRTC Session Controller relies largely on Oracle Coherence to detect and handle a split-brain condition. A split-brain condition can occur, for example, when connectivity is restored between two or more parts of a cluster that had been isolated from each other. When the WebRTC Session Controller detects such a condition, it attempts to recover by shutting down part of the cluster; the affected servers are then expected to restart and join the surviving cluster as new members.
When Coherence detects a split-brain condition, its behavior is controlled primarily through the options related to death detection in the cluster-related configuration.
You can use the following three mechanisms to modify Coherence configuration options:
The default Coherence cluster configuration file
The system properties
The tangosol-coherence-override.xml file
WARNING:
Stop all servers in the domain before you change the Coherence configuration. The configuration must also be identical on all servers in the domain; otherwise, unexpected behavior can result.
The default Coherence cluster configuration file, Custom-Default.xml, resides in the following location:
$DOMAIN_HOME/config/coherence/Coherence-Default/
where $DOMAIN_HOME is the root directory for the domain.
Table 9-1 describes the default configuration options that you can specify.
Table 9-1 Coherence Cluster Configuration File Options
Option | Element Name | System Property Name | Default Value |
---|---|---|---|
TCP-ring IP-timeout | <tcp-ring-listener><pingtimeout> | tangosol.coherence.ipmonitor.pingtimeout | 5 |
TCP-ring IP-attempts | <tcp-ring-listener><pingattempts> | tangosol.coherence.ipmonitor.pingtattempts | 2 |
Service Guardian Timeout | <service-guardian><timeout-milliseconds> | tangosol.coherence.guard.timeout | 305000 |
Packet Delivery Timeout | <packet-delivery><timeout-milliseconds> | tangosol.coherence.packet.timeout | 300000 |
You can override these default configuration options either by setting the corresponding system properties or by creating an override configuration file named tangosol-coherence-override.xml and adding it to the system CLASSPATH variable on all servers.
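As a rough illustration of the override-file approach, the following sketch shows what a tangosol-coherence-override.xml file might look like when it changes only the Service Guardian Timeout and Packet Delivery Timeout options from Table 9-1. It assumes the element nesting of the standard Coherence operational configuration, in which packet-delivery sits inside packet-publisher under cluster-config. The values 125000 and 120000 are placeholders chosen only for illustration, not tuning recommendations. Verify the element names and suitable values against the Coherence documentation before using anything like this.

```xml
<?xml version="1.0"?>
<!-- Sketch of a tangosol-coherence-override.xml file.
     Assumption: standard Coherence operational configuration layout.
     The timeout values are illustrative placeholders only. -->
<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
           xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config coherence-operational-config.xsd">
  <cluster-config>
    <packet-publisher>
      <packet-delivery>
        <!-- Packet Delivery Timeout (default in Table 9-1: 300000) -->
        <timeout-milliseconds>120000</timeout-milliseconds>
      </packet-delivery>
    </packet-publisher>
    <service-guardian>
      <!-- Service Guardian Timeout (default in Table 9-1: 305000) -->
      <timeout-milliseconds>125000</timeout-milliseconds>
    </service-guardian>
  </cluster-config>
</coherence>
```

The same two overrides could instead be expressed with the system properties from Table 9-1, for example by passing -Dtangosol.coherence.guard.timeout=125000 and -Dtangosol.coherence.packet.timeout=120000 to each server JVM. Whichever mechanism is used, the settings must be identical on all servers in the domain.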
See the following Coherence documentation for information on which configuration options you can override and how to use the override configuration file:
"Configuring a Coherence Cluster" in Administering Clusters for Oracle WebLogic Server
"Death Detection Recommendations" in Administering Oracle Coherence
"Configuring Death Detection" in Developing Applications with Oracle Coherence
"Understanding the XML Override Feature" in Developing Applications with Oracle Coherence
"Coherence Operational Configuration Reference" in Developing Applications with Oracle Coherence