The following sections describe how to replicate call state transactions across multiple, regional WebLogic SIP Server installations (“sites”):
The basic call state replication functionality available in the WebLogic SIP Server data tier provides excellent failover capabilities for a single site installation. However, the active replication performed within the data tier requires high network bandwidth in order to meet the latency performance needs of most production networks. This bandwidth requirement makes a single data tier cluster unsuitable for replicating data over large distances, such as from one regional data center to another.
WebLogic SIP Server’s geographic persistence feature enables you to replicate call state transactions across multiple WebLogic SIP Server installations (multiple Administrative domains or “sites”). A geographically-redundant configuration minimizes dropped calls in the event of a catastrophic failure of an entire site, for example due to an extended, regional power outage.
When using geographic persistence, a single replica in the primary site places modified call state data on a distributed JMS queue. By default, data is placed on the queue only at SIP dialog boundaries. (A custom API is provided for application developers who want to replicate data at a finer granularity, as described in Using Persistence Hints in SIP Applications.) In a secondary site, engine tier servers use a message listener to monitor the distributed queue, receive messages, and write the data to the site's own data tier cluster. If the secondary site uses an RDBMS to store long-lived call states (recommended), then all data writes from the distributed queue go directly to the RDBMS, rather than to the in-memory storage of the data tier.
A secondary WebLogic SIP Server domain that persists data from another domain may itself process SIP traffic, or it may exist solely as an active standby domain. In the most common configuration, two sites are configured to replicate each other’s call state data, with each site processing its own local SIP traffic. The administrator can then use either domain as the “secondary” site should one of the domains fail.
An alternate configuration utilizes a single domain that persists data from multiple, other sites, acting as the secondary for those sites. Although the secondary site in this configuration can also process its own, local SIP traffic, keep in mind that the resource requirements of the site may be considerable because of the need to persist active traffic from several other installations.
WebLogic SIP Server’s geographically-redundant persistence feature is most useful for sites that manage long-lived call state data in an RDBMS. Short-lived calls may be lost in the transition to a secondary site, because WebLogic SIP Server may choose to collect data for multiple call states before replicating between sites.
You must have a reliable, site-aware load balancing solution that can partition calls between geographic locations, as well as monitor the health of a given regional site. WebLogic SIP Server provides no automated functionality for detecting the failure of an entire domain, or for failing over to a secondary site. It is the responsibility of the Administrator to determine when a given site has “failed,” and to redirect that site’s calls to the correct secondary site. Furthermore, the site-aware load balancer must direct all messages for a given callId to a single home site (the “active” site). If, after a failover, the failed site is restored, the load balancer must continue directing calls to the active site and not partition calls between the two sites.
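The routing rules above can be illustrated with a small model. This is only a sketch of a policy an external load balancer might apply, not a WebLogic SIP Server API: the class `SiteRouter`, its methods, and the site names are all hypothetical, and the "failed" flag stands in for the Administrator's manual decision.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of a site-aware routing policy (illustration only). */
final class SiteRouter {
    private final Map<String, String> secondaryFor = new HashMap<>();      // site -> its secondary site
    private final Map<String, String> activeSiteForCall = new HashMap<>(); // callId -> active site
    private final Set<String> failedSites = new HashSet<>();

    void addSite(String site, String secondary) { secondaryFor.put(site, secondary); }

    /** Administrator action: declare a site failed so that its calls move to the secondary. */
    void markFailed(String site) { failedSites.add(site); }

    /** Restoring a site affects new calls only; calls already moved stay on their active site. */
    void markRestored(String site) { failedSites.remove(site); }

    /** Returns the single site that must receive every message for this callId. */
    String route(String callId, String preferredSite) {
        String active = activeSiteForCall.get(callId);
        if (active == null) {
            // New call: use the preferred regional site unless it has been declared failed.
            active = failedSites.contains(preferredSite) ? secondaryOf(preferredSite) : preferredSite;
        } else if (failedSites.contains(active)) {
            // Existing call whose active site failed: move it to that site's secondary.
            active = secondaryOf(active);
        }
        activeSiteForCall.put(callId, active);
        return active;
    }

    private String secondaryOf(String site) {
        String secondary = secondaryFor.get(site);
        if (secondary == null) {
            throw new IllegalStateException("No secondary configured for " + site);
        }
        return secondary;
    }
}
```

In this model a call that moved to the secondary site keeps going there even after the failed site is marked restored, which mirrors the requirement that calls are never partitioned between the two sites. The sketch does not handle the case where the secondary site has also failed.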
During a failover to a secondary site, some calls may be dropped. This can occur because WebLogic SIP Server generally queues call state data for site replication only at SIP dialog boundaries. Failures that occur before the data is written to the queue result in the loss of that data.
Also, WebLogic SIP Server replicates call state data across sites only when a SIP dialog boundary changes the call state. If a long-running call exists on the primary site before the secondary site is started, and the call state remains unmodified, that call’s data is not replicated to the secondary site. Should a failure occur before a long-running call state has been replicated, the call is lost during failover.
When planning for the capacity of a WebLogic SIP Server installation, keep in mind that, after a failover, a given site must be able to support all of the calls from the failed site as well as from its own geographic location. Essentially this means that all sites that are involved in a geographically-redundant configuration will operate at less than maximum capacity until a failover occurs.
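As a worked illustration of this sizing rule, the sketch below checks whether a site has enough headroom to absorb the load of the site it backs up. The class name, method names, and the use of "concurrent calls" as the capacity unit are assumptions made for the example; real capacity planning depends on measured throughput for your applications.

```java
/** Hypothetical capacity check for a geographically-redundant site (illustration only). */
final class FailoverCapacity {
    /** Highest steady-state load a site can carry and still absorb the backed-up site's load. */
    static long maxSteadyStateLoad(long siteCapacity, long backedUpSiteLoad) {
        return Math.max(0L, siteCapacity - backedUpSiteLoad);
    }

    /** True if the site can carry its own calls plus all calls from the failed site. */
    static boolean canAbsorbFailover(long siteCapacity, long ownLoad, long backedUpSiteLoad) {
        return ownLoad + backedUpSiteLoad <= siteCapacity;
    }
}
```

For example, with two sites that back each other up and an assumed capacity of 10,000 concurrent calls each, a site carrying 5,000 calls can absorb a peer that also carries 5,000, while a site carrying 6,000 cannot. This is why such sites run below their maximum capacity until a failover occurs.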
In order to use the WebLogic SIP Server geographic persistence features, you must perform certain configuration tasks on both the primary “home” site and on the secondary replication site. Table 5-1
Note: In most production deployments, two sites will perform replication services for each other, so you will generally configure each installation as both a primary and secondary site.
WebLogic SIP Server provides domain templates to automate the configuration of most of the resources described in Table 5-1. See Using the Configuration Wizard Templates for Geographic Persistence for information about using the templates.
If you have an existing WebLogic SIP Server domain and want to use geographic persistence, follow the instructions in Configuring Geographical Redundancy by Hand to create the resources.
WebLogic SIP Server provides two Configuration Wizard templates for using geographic persistence features:
WLSS_HOME/common/templates/domains/geo1domain.jar configures a primary site having a site ID of 1. The domain replicates data to the engine tier servers created in geo2domain.jar.
WLSS_HOME/common/templates/domains/geo2domain.jar configures a secondary site that replicates call state data from the domain created with geo1domain.jar. This installation has a site ID of 2.
The server port numbers in both domain templates are unique, so you can test geographic persistence features on a single machine if necessary. Follow the instructions in the sections that follow to install and configure each domain.
Follow these steps to create a new primary domain from the template:
cd ~/bea/sipserver30/common/bin
./config.sh
geo1domain.jar, and click OK.
The template creates a new domain with two engine tier servers in a cluster, two data tier servers in a cluster, and an Administration Server (AdminServer). The engine tier cluster includes the following resources and configuration:
wlss.callstate.datasource, required for storing long-lived call state data. If you want to use this functionality, edit the datasource to include your RDBMS connection information as described in Modify the JDBC Datasource Connection Information.
Follow these steps to use a template to create a secondary site for replicating call state data from the “geo1” domain:
cd ~/bea/sipserver30/common/bin
./config.sh
geo2domain.jar, and click OK.
The template creates a new domain with two engine tier servers in a cluster, two data tier servers in a cluster, and an Administration Server (AdminServer). The engine tier cluster includes the following resources and configuration:
wlss.callstate.datasource, required for storing long-lived call state data. If you want to use this functionality, edit the datasource to include your RDBMS connection information as described in Modify the JDBC Datasource Connection Information.
SystemModule-Callstate, that includes:
JMSServer-1 and JMSServer-2, are deployed to engine1-site2 and engine2-site2, respectively.
If you have an existing replicated WebLogic SIP Server installation, or pair of installations, you must create by hand the JMS and JDBC resources required for enabling geographical redundancy. You must also configure each site to perform replication. These basic steps for enabling geographical redundancy are:
The sections that follow describe each step in detail.
Follow the instructions in Storing Long-Lived Call State Data in an RDBMS to configure the JDBC resources required for storing long-lived call states in an RDBMS.
Both the primary and secondary sites must configure the correct persistence settings in order to enable replication for geographical redundancy. Follow these steps to configure persistence:
Any site that replicates call state data from another site must configure certain required JMS resources. The resources are not required for sites that do not replicate data from another site.
Follow these steps to configure JMS resources:
This section provides more detail on how multiple sites replicate call state data. Administrators can use this information to better understand the mechanics of geo-redundant replication and to better troubleshoot any problems that may occur in such a configuration. Note, however, that the internal workings of replication across WebLogic SIP Server installations are subject to change in future releases of the product.
When a call is initiated on a primary WebLogic SIP Server site, call setup and processing occurs normally. When a SIP dialog boundary is reached, the call is replicated (in-memory) to the site’s data tier, and becomes eligible for replication to a secondary site. WebLogic SIP Server may choose to aggregate multiple call states for replication in order to optimize network usage.
A single replica in the data tier then places the call state data to be replicated on a JMS queue configured on the replica site. Data is transmitted to one of the available engines (specified in the geo-remote-t3-url element in sipserver.xml) in a round-robin fashion. Engines at the secondary site monitor their local queue for new messages.
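The round-robin selection described above can be pictured with the following sketch. It is not the product's internal implementation, which is not documented here; the class `RoundRobinEngines` and the engine addresses used with it are hypothetical, and the sketch does not model skipping engines that are unavailable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin selector over a fixed list of remote engine addresses. */
final class RoundRobinEngines {
    private final List<String> engines;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinEngines(List<String> engines) {
        if (engines.isEmpty()) {
            throw new IllegalArgumentException("at least one engine is required");
        }
        this.engines = new ArrayList<>(engines); // defensive copy
    }

    /** Returns the next engine in rotation; safe to call from multiple threads. */
    String nextEngine() {
        // floorMod keeps the index non-negative even if the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), engines.size());
        return engines.get(index);
    }
}
```

With two assumed addresses such as t3://engine1-site2:7001 and t3://engine2-site2:7002, successive calls to `nextEngine()` alternate between them, so replication messages are spread evenly across the secondary site's engines.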
Upon receiving a message, an engine on the secondary site persists the call state data and assigns it the site ID value of the primary site. The site ID distinguishes replicated call state data on the secondary site from any other call state data actively managed by the secondary site. Timers in replicated call state data remain dormant on the secondary site, so that timer processing does not become a bottleneck to performance.
To perform a failover, the Administrator must change a global load balancer policy to begin routing calls from the primary, failed site to the secondary site. After this process is completed, the secondary site begins processing requests for the backed-up call state data. When a request is made for data that has been replicated from the failed site, the engine retrieves the data and activates the call state, taking ownership for the call. The activation process involves:
By default, call states are activated only for individual calls, and only after those calls are requested on the backup site. SipServerRuntimeMBean includes a method, activateBackup(byte site), that can be used to force a site to take over all call state data that it has replicated from another site. The Administrator can execute this method using a WLST configuration script. Alternatively, an application deployed on the server can detect when a request for replicated site data occurs, and then execute the method. Listing 5-1 shows sample code from a JSP that activates a secondary site, changing ownership of all call state data replicated from site 1. Similar code could be used within a deployed Servlet. Note that either a JSP or Servlet must run as a privileged user in order to execute the activateBackup method.
To detect whether a particular request involves call state data replicated from another site, Servlets can use the WlssSipApplicationSession.getGeoSiteId() method to examine the site ID associated with a call. Any non-zero value for the site ID indicates that the Servlet is working with call state data that was replicated from another site.
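<%
  // Site ID of the primary site whose replicated call states should be activated.
  byte site = 1;
  InitialContext ctx = new InitialContext();
  // Look up the runtime MBean server through JNDI.
  MBeanServer server = (MBeanServer) ctx.lookup("java:comp/env/jmx/runtime");
  // Find the SipServerRuntime MBean registered on this server.
  Set set = server.queryMBeans(new ObjectName("*:*,Type=SipServerRuntime"), null);
  if (set.size() == 0) {
    throw new IllegalStateException("No MBeans Found!!!");
  }
  ObjectInstance oi = (ObjectInstance) set.iterator().next();
  SipServerRuntimeMBean bean = (SipServerRuntimeMBean)
      MBeanServerInvocationHandler.newProxyInstance(server,
          oi.getObjectName());
  // Take ownership of all call state data replicated from the given site.
  bean.activateBackup(site);
%>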
Note that after a failover, the load balancer must route all calls having the same callId to the newly-activated site. Even if the original, failed site is restored to service, the load balancer must not partition calls between the two geographical sites.
You may also choose to stop replicating call states to a remote site in order to perform maintenance on the remote site or to change the backup site entirely. Replication can be stopped by setting the Site Handling attribute to “none” on the primary site as described in Configuring Persistence Options (Primary and Secondary Sites).
After disabling geographical replication on the primary site, you also may want to remove backup call states on the secondary site. SipServerRuntimeMBean includes a method, deleteBackup(byte site), that can be used to force a site to remove all call state data that it has replicated from another site. The Administrator can execute this method using a WLST configuration script or via an application deployed on the secondary site. The steps for executing this method are similar to those for using the activateBackup method, described in Call State Processing After Failover.
The ReplicaRuntimeMBean includes two new methods to retrieve data about geographically-redundant replication:
See the JavaDoc for more information about ReplicaRuntimeMBean.
In addition to using the ReplicaRuntimeMBean methods described in Monitoring Replication Across Regional Sites, Administrators should monitor any SNMP traps that indicate failed database writes on a secondary site installation.
Administrators must also ensure that all sites participating in geographically-redundant configurations use unique site IDs.
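A simple pre-deployment check can catch site ID collisions before replication is enabled. The sketch below is a standalone illustration and does not read any WebLogic SIP Server configuration; the class `SiteIdCheck`, the domain names, and the idea of collecting the IDs into a map by hand are all assumptions.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical check that every participating domain uses a distinct site ID. */
final class SiteIdCheck {
    /** Returns the site IDs that are assigned to more than one domain (empty if all are unique). */
    static Set<Integer> duplicateSiteIds(Map<String, Integer> siteIdsByDomain) {
        Set<Integer> seen = new HashSet<>();
        Set<Integer> duplicates = new TreeSet<>();
        for (int id : siteIdsByDomain.values()) {
            if (!seen.add(id)) {
                duplicates.add(id); // this ID was already used by another domain
            }
        }
        return duplicates;
    }
}
```

For the template domains described earlier, a map of geo1 to 1 and geo2 to 2 yields an empty result; adding a third, hypothetical domain that reuses ID 2 would report that ID as a duplicate.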