Sun GlassFish Communications Server 2.0 High Availability Administration Guide

How SIP Session Replication Works

The SIP Session Replication scheme enables Communications Server to provide high availability and failover.

Consider a cluster with four instances. Each instance maintains an active cache and a replica cache of SIP Dialog Structures. Each Dialog Structure is replicated based on its BEKey, and its replica is placed according to the converged load balancer's routing algorithm.
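
The following sketch illustrates, in simplified Java, how a BEKey might determine both the active instance and the replica partner for a Dialog Structure. The class and method names are hypothetical and are not part of the Communications Server API; the converged load balancer uses its own routing implementation.

import java.util.List;

// Hypothetical illustration of BEKey-based placement; not the actual
// Communications Server implementation.
public class DialogPlacement {

    private final List<String> healthyInstances;

    public DialogPlacement(List<String> healthyInstances) {
        this.healthyInstances = healthyInstances;
    }

    // The active instance is chosen by hashing the BEKey over the
    // healthy instances in the cluster.
    public String activeInstance(String beKey) {
        int index = Math.floorMod(beKey.hashCode(), healthyInstances.size());
        return healthyInstances.get(index);
    }

    // The replica partner is derived from the same computation, so the
    // placement of a dialog and its replica is deterministic.
    public String replicaPartner(String beKey) {
        int index = Math.floorMod(beKey.hashCode() + 1, healthyInstances.size());
        return healthyInstances.get(index);
    }
}

In this sketch, both locations are derived deterministically from the key, so any node that knows the current list of healthy instances can compute the same placement without a directory lookup.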


Note –

A SIP Dialog Structure is a peer-to-peer SIP relationship between two user agents that persists for a specific time.


SIP Session Replication Scheme

In this scheme, consider a scenario in which instance1 fails. The converged load balancer routes the request for DialogStructure1 to instance3, and the request for DialogStructure2 to instance4.

When the request for DialogStructure1 arrives at instance3, the replica cache of instance3 already holds DialogStructure1. Therefore, DialogStructure1 is moved to the active cache of instance3 without requiring a network load of DialogStructure1. After DialogStructure1 is loaded into instance3's active cache, DialogStructure1 is replicated to its new partner as determined by the converged load balancer algorithm (say, instance2), as shown in the following figure.

SIP Session Replication When an Instance Fails

The same replication approach is used for DialogStructure2. instance2 is shown as the new replica partner for DialogStructure2, but it could be instance3 or any other instance, as computed by the converged load balancer algorithm.
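
A minimal sketch of the failover path described above, assuming hypothetical activeCache and replicaCache maps keyed by dialog ID. It only illustrates promoting a replica to the active cache and re-replicating it to a new partner; it is not the product's cache implementation.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of one instance's caches; all names are illustrative only.
public class InstanceCaches {

    interface Replicator {
        void replicate(String dialogId, SipDialogStructure dialog);
    }

    static class SipDialogStructure { /* dialog state omitted */ }

    private final Map<String, SipDialogStructure> activeCache = new ConcurrentHashMap<>();
    private final Map<String, SipDialogStructure> replicaCache = new ConcurrentHashMap<>();

    // Called when a request for a dialog arrives after its former owner fails.
    public SipDialogStructure activate(String dialogId, Replicator newPartner) {
        SipDialogStructure dialog = activeCache.get(dialogId);
        if (dialog == null) {
            // The replica cache already holds the dialog, so no network
            // load is needed to recover it.
            dialog = replicaCache.remove(dialogId);
            if (dialog != null) {
                activeCache.put(dialogId, dialog);
                // Re-replicate to the new partner chosen by the converged
                // load balancer's routing algorithm.
                newPartner.replicate(dialogId, dialog);
            }
        }
        return dialog;
    }
}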

When the failed instance, instance1, is restarted, requests for DialogStructure1 and DialogStructure2 are remapped to instance1, as shown in the following figure:

SIP Session Replication When an Instance Restarts

When instance1 receives the request for DialogStructure1, it moves DialogStructure1 from instance3's active cache to its own active cache. After DialogStructure1 is moved, instance1 replicates DialogStructure1 to its replica partner as determined by the converged load balancer routing algorithm (instance3, as shown in the figure). Similarly, when the request for DialogStructure2 is remapped to instance1, DialogStructure2 is moved from instance4 to instance1. DialogStructure2 is then replicated to instance4.
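
The restart path differs from the failover path in that the Dialog Structure is not in the restarted instance's replica cache; it must be fetched from the instance that currently holds it in its active cache. The sketch below models that migration with hypothetical RemoteInstance and Replicator interfaces; it is an illustration under those assumptions, not the product's internal API.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only; the remote-fetch and replication interfaces are assumed.
public class RestartMigration {

    interface RemoteInstance {
        // Removes the dialog from the remote instance's active cache
        // and returns it to the caller.
        SipDialogStructure takeFromActiveCache(String dialogId);
    }

    interface Replicator {
        void replicate(String dialogId, SipDialogStructure dialog);
    }

    static class SipDialogStructure { /* dialog state omitted */ }

    private final Map<String, SipDialogStructure> activeCache = new ConcurrentHashMap<>();

    // Called on the restarted instance when a remapped request arrives.
    public SipDialogStructure migrate(String dialogId,
                                      RemoteInstance previousOwner,
                                      Replicator replicaPartner) {
        // Move the dialog from the previous owner's active cache into
        // the local active cache.
        SipDialogStructure dialog = previousOwner.takeFromActiveCache(dialogId);
        if (dialog != null) {
            activeCache.put(dialogId, dialog);
            // Replicate to the replica partner chosen by the converged
            // load balancer's routing algorithm.
            replicaPartner.replicate(dialogId, dialog);
        }
        return dialog;
    }
}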


Note –

After instance1 is restarted, the converged load balancer does not route full traffic to it immediately. Instead, it increases traffic incrementally, based on the load-factor settings. After a specific interval (as determined by the load-factor setting), the restarted instance starts receiving full traffic.
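
As a rough illustration of this ramp-up, the sketch below computes an effective weight for a restarted instance that grows toward its configured load factor over a fixed interval. The formula and names are assumptions made for illustration only; they are not the converged load balancer's actual algorithm or configuration properties.

// Hypothetical ramp-up calculation; not the product's implementation.
public class TrafficRampUp {

    private final long restartTimeMillis;
    private final long rampUpIntervalMillis;
    private final int configuredLoadFactor;

    public TrafficRampUp(long restartTimeMillis, long rampUpIntervalMillis,
                         int configuredLoadFactor) {
        this.restartTimeMillis = restartTimeMillis;
        this.rampUpIntervalMillis = rampUpIntervalMillis;
        this.configuredLoadFactor = configuredLoadFactor;
    }

    // The effective weight grows linearly until the interval elapses,
    // after which the instance receives its full share of traffic.
    public int effectiveLoadFactor(long nowMillis) {
        long elapsed = Math.max(0, nowMillis - restartTimeMillis);
        if (elapsed >= rampUpIntervalMillis) {
            return configuredLoadFactor;
        }
        return (int) (configuredLoadFactor * elapsed / rampUpIntervalMillis);
    }
}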