Oracle® Communications WebRTC Session Controller System Administrator's Guide Release 7.0 E40973-01
This chapter describes how to use the Oracle Communications WebRTC Session Controller "echo server" process to improve SIP data tier failover performance when a server becomes physically disconnected from the network.
In a production system, engine tier servers continually access SIP data tier replicas to retrieve and write call state data. The WebRTC Session Controller architecture depends on engine tier nodes to detect when a SIP data tier server has failed or become disconnected. When an engine cannot access or write call state data because a replica is unavailable, the engine connects to another replica in the same partition and reports the offline server. The replica updates the current view of the SIP data tier to account for the offline server, and other engines are then notified of the updated view as they access and retrieve call state data.
By default, an engine tier server uses its Remote Method Invocation (RMI) connection to the replica to determine if the replica has failed or become disconnected. The algorithms used to determine a failure of an RMI connection are reliable, but ultimately they depend on TCP retransmission timers to diagnose a disconnection (for example, if the network cable to the replica is removed). Because the TCP retransmission timer generally lasts a full minute or longer, WebRTC Session Controller provides an alternate method of detecting failures that can diagnose a disconnected replica within a few seconds.
WlssEchoServer is a separate process that you can run on the same server hardware as a SIP data tier replica. WlssEchoServer provides a simple UDP echo service that engine tier nodes use to determine when a SIP data tier server goes offline. The algorithm for detecting failures with WlssEchoServer is as follows:
For all normal traffic, engine tier servers communicate with SIP data tier replicas using TCP. TCP is used as the basic transport between the engine tier and SIP data tier regardless of whether WlssEchoServer is used.
Engine tier servers send a periodic heartbeat message to each configured WlssEchoServer over UDP. During normal operation, WlssEchoServer responds to the heartbeats so that the connection between the engine node and replica is verified.
If the SIP data tier stack fails completely, or the network cable is disconnected, the heartbeat messages are not returned to the engine node. In this case, the engine node can mark the replica as offline without waiting for the normal TCP connection timeout.
After identifying the offline server, the engine node reports the failure to an available SIP data tier replica, and the SIP data tier view is updated as described in the previous section.
Also, if a SIP data tier server notices that its local WlssEchoServer process has died, it automatically shuts down. This behavior ensures even quicker failover because it avoids the time it takes engine nodes to notice and report the failure as described in "Overview of Failover Detection".
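The detection rule in the steps above can be sketched as a small shell function. This is only an illustration of the bookkeeping, not WebRTC Session Controller code: the function name replica_state, the ok and miss tokens, and the rule that a replica counts as offline once the number of consecutive misses reaches the threshold are all assumptions made for this example.

```shell
#!/bin/sh
# Illustration only: models how an engine node might track heartbeat replies.
# Usage: replica_state THRESHOLD RESULT...
#   THRESHOLD  consecutive missed heartbeats that mark the replica offline
#              (assumed semantics for this sketch)
#   RESULT     "ok" if the echo came back; anything else counts as a miss
replica_state() {
    threshold=$1
    shift
    missed=0
    for result in "$@"; do
        if [ "$result" = "ok" ]; then
            missed=0                # any reply resets the consecutive-miss counter
        else
            missed=$((missed + 1))
        fi
        if [ "$missed" -ge "$threshold" ]; then
            echo "offline"          # decided without waiting for a TCP timeout
            return 0
        fi
    done
    echo "online"
}

replica_state 3 ok miss miss ok miss    # prints "online"
replica_state 3 ok miss miss miss       # prints "offline"
```

Resetting the counter on every reply means that isolated packet loss, which is expected with UDP, does not by itself take a replica out of the view.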
You can configure the heartbeat mechanism on engine tier servers to increase the performance of failover detection as necessary. You can also configure the listen port and log file that WlssEchoServer uses on SIP data tier servers.
If any engine tier server cannot communicate with a particular replica, the engine accesses another available replica in the SIP data tier to report the offline server. The replica updates its view of the affected partition to remove the offline server. The updated view is then distributed to all engine tier servers that later access the partition. Propagating the view in this manner helps to ensure that engine servers do not attempt to access the offline replica.
The replica that updates the view also issues a one-time request to the offline replica to ask it to shut down. The request is an attempt to shut down replica servers that are still running but cannot be accessed by one or more engine servers because of a network outage. If an active replica can reach the replica marked as "offline," the offline replica shuts down.
Note:
Using WlssEchoServer is not required in all WebRTC Session Controller installations. Enable the echo server only when your system requires detection of a network or replica failure faster than the configured TCP timeout interval.
Observe the following requirements and restrictions when using WlssEchoServer to detect replica failures:
If you use the heartbeat mechanism to detect failures, you must ensure that the WlssEchoServer process is always running on each replica server. If the WlssEchoServer process fails or is stopped, the replica will be treated as being "offline" even if the server process is unaffected.
By default, WlssEchoServer listens on all IP addresses available on the server. To restrict it to a single address, use the -Dwlss.ha.echoserver.ipaddress option.
WlssEchoServer requires a dedicated port number to listen for heartbeat messages.
WlssEchoServer is a Java program that you can start directly from a shell or command prompt. The basic syntax for starting WlssEchoServer is:
java -classpath WLSS_HOME/server/lib/wlssechosvr.jar options com.bea.wcp.util.WlssEchoServer
Where WLSS_HOME is the path to the WebLogic Server SIP directory and options may include one or more of the options described in Table 10-1.
Table 10-1 WlssEchoServer Options
Option | Description |
---|---|
-Dwlss.ha.echoserver.ipaddress | Specifies the IP address on which the WlssEchoServer instance listens for heartbeat messages. If you do not specify an IP address, the instance listens on any available IP address (0.0.0.0). |
-Dwlss.ha.echoserver.port | Specifies the port number used to listen for heartbeat messages. Ensure that the port number you specify is not used by any other process on the server. By default WlssEchoServer uses port 6734. |
-Dwlss.ha.echoserver.logfile | Specifies the log file location and name. By default, WebLogic writes log messages to ./echo_servertime.log where time is the time expressed in milliseconds. |
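To show how the three options in Table 10-1 combine, here is a minimal sketch that prints a complete start command. The helper name echo_server_cmd, the /opt/wlss installation path, and the log file path are hypothetical values chosen for the example. The address and port are sample values as well.

```shell
#!/bin/sh
# Sketch: print a WlssEchoServer start command that sets all three
# Table 10-1 options. Every argument value used below is an example only.
# Usage: echo_server_cmd WLSS_HOME IP_ADDRESS PORT LOGFILE
echo_server_cmd() {
    printf 'java -classpath %s/server/lib/wlssechosvr.jar' "$1"
    printf ' -Dwlss.ha.echoserver.ipaddress=%s' "$2"
    printf ' -Dwlss.ha.echoserver.port=%s' "$3"
    printf ' -Dwlss.ha.echoserver.logfile=%s' "$4"
    printf ' com.bea.wcp.util.WlssEchoServer\n'
}

# Review the printed command before running it on a replica host.
echo_server_cmd /opt/wlss 192.168.1.4 6734 /var/log/wlss/echo_server.log
```

Printing the command rather than running it lets you confirm the address, port, and log location before the process is started.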
Oracle recommends that you include the command to start WlssEchoServer in the same script you use to start each WebRTC Session Controller SIP data tier instance. If you use the startManagedWebLogic.sh script to start an engine or SIP data tier server instance, add a command to start WlssEchoServer before the final command used to start the server. For example, change the lines:
"$JAVA_HOME/bin/java" ${JAVA_VM} ${MEM_ARGS} ${JAVA_OPTIONS} \
    -Dweblogic.Name=${SERVER_NAME} \
    -Dweblogic.management.username=${WLS_USER} \
    -Dweblogic.management.password=${WLS_PW} \
    -Dweblogic.management.server=${ADMIN_URL} \
    -Djava.security.policy="${WL_HOME}/server/lib/weblogic.policy" \
    weblogic.Server
to read:
"$JAVA_HOME/bin/java" -classpath WLSS_HOME/server/lib/wlssechosvr.jar \
    -Dwlss.ha.echoserver.ipaddress=192.168.1.4 \
    -Dwlss.ha.echoserver.port=6734 com.bea.wcp.util.WlssEchoServer &
"$JAVA_HOME/bin/java" ${JAVA_VM} ${MEM_ARGS} ${JAVA_OPTIONS} \
    -Dweblogic.Name=${SERVER_NAME} \
    -Dweblogic.management.username=${WLS_USER} \
    -Dweblogic.management.password=${WLS_PW} \
    -Dweblogic.management.server=${ADMIN_URL} \
    -Djava.security.policy="${WL_HOME}/server/lib/weblogic.policy" \
    weblogic.Server
To enable the WlssEchoServer heartbeat mechanism, you must include the -Dreplica.host.monitor.enabled JVM argument in the command you use to start all engine and SIP data tier servers. Oracle recommends adding this option directly to the script used to start Managed Servers in your system. For example, in the startManagedWebLogic.sh script, change the line:
# JAVA_OPTIONS="-Dweblogic.attribute=value -Djava.attribute=value"
to read:
JAVA_OPTIONS="-Dreplica.host.monitor.enabled=true"
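The assignment above sets JAVA_OPTIONS to a single value. If your start script already defines other JVM options, you may prefer to append the new option instead. The following is a minimal sketch of one way to do that. The add_java_option helper is hypothetical and not part of startManagedWebLogic.sh, and -Xms512m stands in for whatever options your script already sets.

```shell
#!/bin/sh
# Hypothetical helper: append a JVM option to an option string unless it is
# already present. Not part of startManagedWebLogic.sh.
# Usage: add_java_option "CURRENT_OPTIONS" "NEW_OPTION"
add_java_option() {
    case " $1 " in
        *" $2 "*) printf '%s\n' "$1" ;;       # already present: leave unchanged
        "  ")     printf '%s\n' "$2" ;;       # no existing options
        *)        printf '%s\n' "$1 $2" ;;    # append to existing options
    esac
}

JAVA_OPTIONS="-Xms512m"   # example of a pre-existing setting
JAVA_OPTIONS=$(add_java_option "$JAVA_OPTIONS" "-Dreplica.host.monitor.enabled=true")
echo "$JAVA_OPTIONS"      # prints: -Xms512m -Dreplica.host.monitor.enabled=true
```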
Several additional JVM options configure the functioning of the heartbeat mechanism. Table 10-2 describes the options used to configure failure detection.
Table 10-2 WlssEchoServer Options
Option | Description |
---|---|
-Dreplica.host.monitor.enabled | This system property is required on both engine and SIP data tier servers to enable the heartbeat mechanism. |
-Dwlss.ha.heartbeat.interval | Specifies the number of milliseconds between heartbeat messages. By default heartbeats are sent every 1,000 milliseconds. |
-Dwlss.ha.heartbeat.count | Specifies the number of consecutive, missed heartbeats that are permitted before a replica is determined to be offline. By default, a replica is marked offline if the |
-Dwlss.ha.heartbeat.SoTimeout | Specifies the UDP socket timeout value. |
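As a rough guide when tuning these options, the time an engine needs to declare a replica offline is on the order of the heartbeat interval multiplied by the number of missed heartbeats tolerated. The sketch below does that arithmetic and prints a matching set of JVM options. It is an estimate under stated assumptions: it ignores the UDP socket timeout and any scheduling delay, the function names are hypothetical, and the values 500 and 3 are examples rather than documented defaults.

```shell
#!/bin/sh
# Rough estimate only: interval (ms) x tolerated consecutive misses.
# Ignores the UDP socket timeout and any scheduling delay.
# Usage: detection_window_ms INTERVAL_MS COUNT
detection_window_ms() {
    echo $(( $1 * $2 ))
}

# Print engine-side JVM options for a given interval and count.
# Usage: heartbeat_options INTERVAL_MS COUNT
heartbeat_options() {
    printf '%s %s %s\n' \
        "-Dreplica.host.monitor.enabled=true" \
        "-Dwlss.ha.heartbeat.interval=$1" \
        "-Dwlss.ha.heartbeat.count=$2"
}

detection_window_ms 500 3     # prints 1500
heartbeat_options 500 3
```

Shorter intervals and lower counts detect failures sooner, but they also make it more likely that ordinary UDP packet loss is mistaken for a failed replica.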