Configuring Geographically-Redundant Installations

The following sections describe how to replicate call state transactions across multiple, regional WebLogic SIP Server installations (“sites”).

Overview of Geographic Persistence

The basic call state replication functionality available in the WebLogic SIP Server data tier provides excellent failover capabilities for a single site installation. However, the active replication performed within the data tier requires high network bandwidth in order to meet the latency performance needs of most production networks. This bandwidth requirement makes a single data tier cluster unsuitable for replicating data over large distances, such as from one regional data center to another.

WebLogic SIP Server’s geographic persistence feature enables you to replicate call state transactions across multiple WebLogic SIP Server installations (multiple Administrative domains or “sites”). A geographically-redundant configuration minimizes dropped calls in the event of a catastrophic failure of an entire site, for example due to an extended, regional power outage.

When using geographic persistence, a single replica in the primary site places modified call state data on a distributed JMS queue. By default, data is placed on the queue only at SIP dialog boundaries. (A custom API is provided for application developers who want to replicate data at a finer granularity, as described in Using Persistence Hints in SIP Applications.) In a secondary site, engine tier servers use a message listener to monitor the distributed queue, receive messages, and write the data to their own data tier cluster. If the secondary site uses an RDBMS to store long-lived call states (recommended), then all data writes from the distributed queue go directly to the RDBMS, rather than to the in-memory storage of the data tier.

Example Domain Configurations

A secondary WebLogic SIP Server domain that persists data from another domain may itself process SIP traffic, or it may exist solely as an active standby domain. In the most common configuration, two sites are configured to replicate each other’s call state data, with each site processing its own local SIP traffic. The administrator can then use either domain as the “secondary” site should one of the domains fail.

An alternate configuration utilizes a single domain that persists data from multiple, other sites, acting as the secondary for those sites. Although the secondary site in this configuration can also process its own, local SIP traffic, keep in mind that the resource requirements of the site may be considerable because of the need to persist active traffic from several other installations.

Requirements and Limitations

WebLogic SIP Server’s geographically-redundant persistence feature is most useful for sites that manage long-lived call state data in an RDBMS. Short-lived calls may be lost in the transition to a secondary site, because WebLogic SIP Server may choose to collect data for multiple call states before replicating between sites.

You must have a reliable, site-aware load balancing solution that can partition calls between geographic locations, as well as monitor the health of a given regional site. WebLogic SIP Server provides no automated functionality for detecting the failure of an entire domain, or for failing over to a secondary site. It is the responsibility of the Administrator to determine when a given site has “failed,” and to redirect that site’s calls to the correct secondary site. Furthermore, the site-aware load balancer must direct all messages for a given callId to a single home site (the “active” site). If, after a failover, the failed site is restored, the load balancer must continue directing calls to the active site and not partition calls between the two sites.

During a failover to a secondary site, some calls may be dropped. This can occur because WebLogic SIP Server generally queues call state data for site replication only at SIP dialog boundaries. Failures that occur before the data is written to the queue result in the loss of the queued data.

Also, WebLogic SIP Server replicates call state data across sites only when a SIP dialog boundary changes the call state. If a long-running call exists on the primary site before the secondary site is started, and the call state remains unmodified, that call’s data is not replicated to the secondary site. Should a failure occur before a long-running call state has been replicated, the call is lost during failover.

When planning for the capacity of a WebLogic SIP Server installation, keep in mind that, after a failover, a given site must be able to support all of the calls from the failed site as well as from its own geographic location. Essentially this means that all sites that are involved in a geographically-redundant configuration will operate at less than maximum capacity until a failover occurs.
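
As a rough illustration of this capacity rule, the following sketch computes how much local traffic a site can carry in steady state while still leaving room to absorb a failed partner. The function name, the unit (concurrent calls), and the example figures are assumptions made for illustration; they are not values or APIs from WebLogic SIP Server.

```python
def max_local_load(site_capacity, partner_loads):
    """Return the largest steady-state local load a site can carry.

    site_capacity -- concurrent calls the site can handle (assumed unit)
    partner_loads -- steady-state loads of the sites this one backs up

    Assumes that only one partner fails at a time, so the site reserves
    room for its largest partner.
    """
    if site_capacity <= 0:
        raise ValueError("site_capacity must be positive")
    reserve = max(partner_loads, default=0)
    return max(site_capacity - reserve, 0)


# Two sites that back each other up: this site is sized for 100,000
# concurrent calls and its partner normally carries 40,000 calls.
headroom = max_local_load(100_000, [40_000])
print(headroom)            # 60000
print(headroom / 100_000)  # 0.6
```

In this example the site can run at no more than 60 percent of its capacity before a failover. A site that acts as the secondary for several other sites would pass all of their loads in partner_loads.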

Steps for Configuring Geographic Persistence

To use the WebLogic SIP Server geographic persistence features, you must perform certain configuration tasks on both the primary “home” site and on the secondary replication site. Table 5-1 summarizes these tasks.

Table 5-1 Steps for Configuring Geographic Persistence
Steps for Primary “Home” Site	Steps for Secondary “Replication” Site
1. Install WebLogic SIP Server software and create replicated domain.	1. Install WebLogic SIP Server software and create replicated domain.
2. Enable RDBMS storage for long-lived call states (recommended).	2. Enable RDBMS storage for long-lived call states (recommended).
3. Configure persistence options to: define the unique regional site ID; identify the secondary site’s URL; enable replication hints.	3. Configure JMS Servers and modules required for replicating data.
	4. Configure persistence options to: define the unique regional site ID.

If you have an existing WebLogic SIP Server domain and want to use geographic persistence, follow the instructions in Configuring Geographical Redundancy by Hand to create the resources.

Using the Configuration Wizard Templates for Geographic Persistence

WebLogic SIP Server provides two Configuration Wizard templates for using geographic persistence features:

WLSS_HOME/common/templates/domains/geo1domain.jar configures a primary site having a site ID of 1. The domain replicates data to the engine tier servers created in geo2domain.jar.
WLSS_HOME/common/templates/domains/geo2domain.jar configures a secondary site that replicates call state data from the domain created with geo1domain.jar. This installation has a site ID of 2.

The server port numbers in both domain templates are unique, so you can test geographic persistence features on a single machine if necessary. Follow the instructions in the sections that follow to install and configure each domain.

Installing and Configuring the Primary Site

Start the Configuration Wizard application:

cd ~/bea/sipserver30/common/bin

./config.sh

Accept the default selection, Create a new WebLogic domain, and click Next.
Select Base this domain on an existing template, and click Browse to display the Select a Template dialog.
Select the template named geo1domain.jar, and click OK.
Click Next.
Enter the username and password for the Administrator of the new domain, and click Next.
Select a JDK to use, and click Next.
Select No to keep the settings defined in the source template file, and click Next.
Click Create to create the domain.

The template creates a new domain with two engine tier servers in a cluster, two data tier servers in a cluster, and an Administration Server (AdminServer). The engine tier cluster includes the following resources and configuration:

A JDBC datasource, wlss.callstate.datasource, required for storing long-lived call state data. If you want to use this functionality, edit the datasource to include your RDBMS connection information as described in Modify the JDBC Datasource Connection Information.
A persistence configuration (shown in the SipServer node, Configuration->Persistence tab of the Administration Console) that defines:

Default handling of persistence hints for both RDBMS and geographic persistence.
A Geo Site ID of 1.
A Geo Remote T3 URL of t3://localhost:8011,localhost:8061, which identifies the engine tier servers in the “geo2” domain as the replication site for geographic redundancy.

Click Done to exit the configuration wizard.
Follow the steps under Installing the Secondary Site to create the domain that performs the replication.

Installing the Secondary Site

Follow these steps to use a template to create a secondary site for replicating call state data from the “geo1” domain:

Start the Configuration Wizard application:

cd ~/bea/sipserver30/common/bin

./config.sh

Accept the default selection, Create a new WebLogic domain, and click Next.
Select Base this domain on an existing template, and click Browse to display the Select a Template dialog.
Select the template named geo2domain.jar, and click OK.
Click Next.
Enter the username and password for the Administrator of the new domain, and click Next.
Select a JDK to use, and click Next.
Select No to keep the settings defined in the source template file, and click Next.
Click Create to create the domain.

The template creates a new domain whose engine tier cluster includes the following resources and configuration:

A JDBC datasource, wlss.callstate.datasource, required for storing long-lived call state data. If you want to use this functionality, edit the datasource to include your RDBMS connection information as described in Modify the JDBC Datasource Connection Information.
A persistence configuration (shown in the SipServer node, Configuration->Persistence tab of the Administration Console) that defines:

Default handling of persistence hints for both RDBMS and geographical redundancy.
A Geo Site ID of 2.

A JMS system module, SystemModule-Callstate, that includes:

ConnectionFactory-Callstate, a connection factory required for backing up call state data from a primary site.
DistributedQueue-Callstate, a uniform distributed queue required for backing up call state data from a primary site.

The JMS system module is targeted to the site’s engine tier cluster.

Two JMS Servers, JMSServer-1 and JMSServer-2, are deployed to engine1-site2 and engine2-site2, respectively.

Click Done to exit the configuration wizard.

Configuring Geographical Redundancy by Hand

If you have an existing replicated WebLogic SIP Server installation, or pair of installations, you must create by hand the JMS and JDBC resources required for enabling geographical redundancy. You must also configure each site to perform replication. The basic steps for enabling geographical redundancy are:

Configure JDBC Resources. BEA recommends configuring both the primary and secondary sites to store long-lived call state data in an RDBMS.
Configure Persistence Options. Persistence options must be configured on both the primary and secondary sites to enable engine tier hints to write to an RDBMS or to replicate data to a geographically-redundant installation.
Configure JMS Resources. A secondary site must have available JMS Servers and specific JMS module resources in order to replicate call state data from another site.

Configuring JDBC Resources (Primary and Secondary Sites)

Configuring Persistence Options (Primary and Secondary Sites)

Both the primary and secondary sites must configure the correct persistence settings in order to enable replication for geographical redundancy. Follow these steps to configure persistence:

Use your browser to access the URL http://address:port/console where address is the Administration Server’s listen address and port is the listen port.
Click Lock & Edit to obtain a configuration lock.
Select the SipServer node in the left pane. The right pane of the console provides two levels of tabbed pages that are used for configuring and monitoring WebLogic SIP Server.
Select the Configuration->Persistence tab in the right pane.
Configure the Persistence attributes as follows:

Default Handling: Select “all” to persist long-lived call state data to an RDBMS and to replicate data to an external site for geographical redundancy (recommended). If your installation does not store call state data in an RDBMS, select “geo” instead of “all.”
Geo Site ID: Enter a unique number from 1 to 9 to distinguish this site from all other configured sites. Note that the site ID of 0 is reserved to indicate call states that are local to the site in question (call states not replicated from another site).
Geo Remote T3 URL: For primary sites (or for secondary sites that replicate their own data to another site), enter the T3 URL or URLs of the engine tier servers that will replicate this site’s call state data. If the secondary engine tier cluster uses a cluster address, you can enter a single T3 URL, such as t3://mycluster:7001. If the secondary engine tier cluster does not use a cluster address, enter the URLs for each individual engine tier server separated by a comma, such as t3://engine1-east-coast:7001,t3://engine2-east-coast:7002,t3://engine3-east-coast:7001,t3://engine4-east-coast:7002.

Click Save to save your configuration changes.
Click Activate Changes to apply your changes to the engine tier servers.
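
Before entering these values in the console, it can help to check them in a script. The following sketch validates a Geo Site ID and splits a Geo Remote T3 URL into host and port pairs. The function names are hypothetical, and the accepted syntax is an assumption based only on the examples in this chapter (a comma-separated list in which entries after the first may repeat or omit the t3:// prefix); it is not the server's own parser.

```python
import re

# One entry: optional t3:// prefix, a host name, a colon, and a port.
_HOST_PORT = re.compile(r"^(?:t3://)?([A-Za-z0-9.-]+):(\d{1,5})$")


def check_geo_site_id(site_id):
    """Accept site IDs 1 to 9; 0 is reserved for local call states."""
    if not isinstance(site_id, int) or not 1 <= site_id <= 9:
        raise ValueError("Geo Site ID must be an integer from 1 to 9")
    return site_id


def parse_geo_remote_t3_url(value):
    """Split a Geo Remote T3 URL into a list of (host, port) pairs."""
    entries = [entry.strip() for entry in value.split(",")]
    if not entries[0].startswith("t3://"):
        raise ValueError("the value must begin with t3://")
    servers = []
    for entry in entries:
        match = _HOST_PORT.match(entry)
        if match is None:
            raise ValueError("not a t3 host:port entry: %r" % entry)
        port = int(match.group(2))
        if not 1 <= port <= 65535:
            raise ValueError("port out of range: %r" % entry)
        servers.append((match.group(1), port))
    return servers


print(parse_geo_remote_t3_url("t3://localhost:8011,localhost:8061"))
# [('localhost', 8011), ('localhost', 8061)]
```

A mistyped scheme such as t4://engine4-east-coast:7002 is rejected with a ValueError, which catches the kind of error that is easy to make when typing a long list of engine addresses.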

Configuring JMS Resources (Secondary Site Only)

Any site that replicates call state data from another site must configure certain required JMS resources. The resources are not required for sites that do not replicate data from another site.

Use your browser to access the URL http://address:port/console where address is the Administration Server’s listen address and port is the listen port.
Click Lock & Edit to obtain a configuration lock.
Select the Services->Messaging->JMS Servers tab in the left pane.
Click New in the right pane.
Enter a unique name for the JMS Server or accept the default name. Click Next to continue.
In the Target list, select the name of a single engine tier server node in the installation. Click Finish to create the new Server.
Repeat the previous four steps (from selecting JMS Servers through clicking Finish) to create a dedicated JMS Server for each engine tier server node in your installation.
Select the Services->Messaging->JMS Modules node in the left pane.
Click New in the right pane.
Fill in the fields of the Create JMS System Module page as follows:

Name: Enter a name for the new module, or accept the default name.
Descriptor File Name: Enter the prefix of a configuration file name in which to store the JMS module configuration (for example, systemmodule-callstate).

Click Next to continue.
Select the name of the engine tier cluster, and choose the option All servers in the cluster.
Click Next to continue.
Select Would you like to add resources to this JMS system module and click Finish to create the module.
Click New to add a new resource to the module.
Select the Connection Factory option and click Next.
Fill in the fields of the Create a new JMS System Module Resource as follows:

Name: Enter a descriptive name for the resource, such as ConnectionFactory-Callstate.
JNDI Name: Enter the name wlss.callstate.backup.site.connection.factory.

Click Next to continue.
Click Finish to save the new resource.
Select the name of the connection factory resource you just created.
Select the Configuration->Load Balance tab in the right pane.
De-select the Server Affinity Enabled option, and click Save.
Re-select the Services->Messaging->JMS Modules node in the left pane.
Select the name of the JMS module you created in the right pane.
Click New to create another JMS resource.
Select the Distributed Queue option and click Next.
Fill in the fields of the Create a new JMS System Module Resource as follows:

Name: Enter a descriptive name for the resource, such as DistributedQueue-Callstate.
JNDI Name: Enter the name wlss.callstate.backup.site.queue.

Click Next to continue.
Click Finish to save the new resource.
Click Save to save your configuration changes.
Click Activate Changes to apply your changes to the engine tier servers.
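
The requirements that the steps above establish can be summarized as a small checklist. The following sketch checks a plain description of a secondary site against that checklist. The function, its arguments, and the dictionary layout are hypothetical; the script does not read a real domain configuration, and only the two JNDI names are taken from this chapter.

```python
# JNDI names required on the secondary site, as given in the steps above.
REQUIRED_JNDI = {
    "connection_factory": "wlss.callstate.backup.site.connection.factory",
    "distributed_queue": "wlss.callstate.backup.site.queue",
}


def check_secondary_jms(engines, jms_server_targets, jndi_names,
                        server_affinity_enabled):
    """Return a list of problems found; an empty list means no problems.

    engines                 -- names of the engine tier servers
    jms_server_targets      -- dict of JMS Server name -> target engine
    jndi_names              -- dict with the keys used in REQUIRED_JNDI
    server_affinity_enabled -- setting on the connection factory
    """
    problems = []
    targets = list(jms_server_targets.values())
    for engine in engines:
        if targets.count(engine) != 1:
            problems.append("%s needs exactly one JMS Server" % engine)
    for kind, expected in REQUIRED_JNDI.items():
        if jndi_names.get(kind) != expected:
            problems.append("%s JNDI name should be %s" % (kind, expected))
    if server_affinity_enabled:
        problems.append("Server Affinity Enabled should be de-selected")
    return problems


# Values that match the geo2domain.jar template described earlier.
print(check_secondary_jms(
    ["engine1-site2", "engine2-site2"],
    {"JMSServer-1": "engine1-site2", "JMSServer-2": "engine2-site2"},
    dict(REQUIRED_JNDI),
    False,
))  # []
```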

Understanding Geo-Redundant Replication Behavior

This section provides more detail about how multiple sites replicate call state data. Administrators can use this information to better understand the mechanics of geo-redundant replication and to better troubleshoot any problems that may occur in such a configuration. Note, however, that the internal workings of replication across WebLogic SIP Server installations are subject to change in future releases of the product.

Call State Replication Process

When a call is initiated on a primary WebLogic SIP Server site, call setup and processing occurs normally. When a SIP dialog boundary is reached, the call is replicated (in-memory) to the site’s data tier, and becomes eligible for replication to a secondary site. WebLogic SIP Server may choose to aggregate multiple call states for replication in order to optimize network usage.

A single replica in the data tier then places the call state data to be replicated on a JMS queue configured on the secondary site. Data is transmitted to one of the available engines (specified in the geo-remote-t3-url element in sipserver.xml) in a round-robin fashion. Engines at the secondary site monitor their local queue for new messages.

Upon receiving a message, an engine on the secondary site persists the call state data and assigns it the site ID value of the primary site. The site ID distinguishes replicated call state data on the secondary site from any other call state data actively managed by the secondary site. Timers in replicated call state data remain dormant on the secondary site, so that timer processing does not become a bottleneck to performance.
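
The flow described in this section can be pictured with a small in-memory model. The sketch below is a toy simulation only: the class and function names are invented for illustration, a Python dictionary stands in for the secondary site's storage, and direct method calls stand in for the JMS transport. It shows batches being handed to engines in round-robin order and stored with the primary site's ID and dormant timers.

```python
from itertools import cycle


class SecondaryEngine:
    """Stand-in for an engine tier server on the secondary site."""

    def __init__(self, name, store):
        self.name = name
        self.store = store      # shared dict modelling the site's storage
        self.messages = 0

    def on_message(self, primary_site_id, call_states):
        # Persist each call state, tagged with the primary site's ID.
        # Timers stay dormant while the data is only a backup.
        for call_id, state in call_states.items():
            self.store[call_id] = {
                "site_id": primary_site_id,
                "state": state,
                "timers_active": False,
            }
        self.messages += 1


def replicate(batches, engines, primary_site_id):
    """Send each batch of call states to the next engine in turn."""
    targets = cycle(engines)
    for batch in batches:
        next(targets).on_message(primary_site_id, batch)


store = {}
engines = [SecondaryEngine("engine1-site2", store),
           SecondaryEngine("engine2-site2", store)]
replicate([{"call-a": "confirmed"},
           {"call-b": "confirmed", "call-c": "early"},
           {"call-a": "terminated"}],
          engines, primary_site_id=1)

print([e.messages for e in engines])  # [2, 1]
print(store["call-a"])
# {'site_id': 1, 'state': 'terminated', 'timers_active': False}
```

Each batch may hold more than one call state, which mirrors the aggregation mentioned above.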

Call State Processing After Failover

To perform a failover, the Administrator must change a global load balancer policy to begin routing calls from the primary, failed site to the secondary site. After this process is completed, the secondary site begins processing requests for the backed-up call state data. When a request is made for data that has been replicated from the failed site, the engine retrieves the data and activates the call state, taking ownership of the call. The activation process involves:

Setting the site ID associated with the call to zero, which marks the call state as local to the site.
Activating the dormant timers present in the call state.

By default, call states are activated only for individual calls, and only after those calls are requested on the backup site. SipServerRuntimeMBean includes a method, activateBackup(byte site), that can be used to force a site to take over all call state data that it has replicated from another site. The Administrator can execute this method using a WLST configuration script. Alternatively, an application deployed on the server can detect when a request for replicated site data occurs, and then execute the method. Listing 5-1 shows sample code from a JSP that activates a secondary site, changing ownership of all call state data replicated from site 1. Similar code could be used within a deployed Servlet. Note that either a JSP or Servlet must run as a privileged user in order to execute the activateBackup method.

To detect whether a particular request involves replicated call state data, Servlets can use the WlssSipApplicationSession.getGeoSiteId() method to examine the site ID associated with a call. Any non-zero value for the site ID indicates that the Servlet is working with call state data that was replicated from another site.

Note that after a failover, the load balancer must route all calls having the same callId to the newly-activated site. Even if the original, failed site is restored to service, the load balancer must not partition calls between the two geographical sites.
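
The difference between the default, per-call activation and a bulk takeover can be sketched with a toy model. Everything in the code below is illustrative: the class and method names are invented, and the model assumes that activation marks a call state as local (site ID 0) and wakes its timers, based on the statements that site ID 0 denotes local call states and that timers in replicated data remain dormant. It does not call SipServerRuntimeMBean or any other WebLogic SIP Server API.

```python
LOCAL_SITE_ID = 0  # reserved value for call states owned by this site


class BackupStore:
    """Toy model of the call states held by a secondary site."""

    def __init__(self):
        self.calls = {}  # call_id -> {"site_id": int, "timers_active": bool}

    def add_replicated(self, call_id, site_id):
        self.calls[call_id] = {"site_id": site_id, "timers_active": False}

    @staticmethod
    def _activate(call):
        call["site_id"] = LOCAL_SITE_ID
        call["timers_active"] = True

    def handle_request(self, call_id):
        """Default behavior: activate one call when it is requested."""
        call = self.calls[call_id]
        if call["site_id"] != LOCAL_SITE_ID:
            self._activate(call)
        return call

    def activate_backup(self, site_id):
        """Bulk takeover of every call state replicated from site_id."""
        activated = 0
        for call in self.calls.values():
            if call["site_id"] == site_id:
                self._activate(call)
                activated += 1
        return activated


store = BackupStore()
for call_id in ("call-a", "call-b", "call-c"):
    store.add_replicated(call_id, site_id=1)

store.handle_request("call-a")      # only call-a becomes local
print(store.activate_backup(1))     # 2
```

After the bulk call, no call state in the model carries site ID 1 any longer, which is the state a secondary site reaches once it has taken over all of the failed site's calls.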

Removing Backup Call States

You may also choose to stop replicating call states to a remote site in order to perform maintenance on the remote site or to change the backup site entirely. Replication can be stopped by setting the Default Handling attribute to “none” on the primary site as described in Configuring Persistence Options (Primary and Secondary Sites).

After disabling geographical replication on the primary site, you also may want to remove backup call states on the secondary site. SipServerRuntimeMBean includes a method, deleteBackup(byte site), that can be used to force a site to remove all call state data that it has replicated from another site. The Administrator can execute this method using a WLST configuration script or via an application deployed on the secondary site. The steps for executing this method are similar to those for using the activateBackup method, described in Call State Processing After Failover.

Monitoring Replication Across Regional Sites

The ReplicaRuntimeMBean includes two new methods to retrieve data about geographically-redundant replication:

Troubleshooting Geographical Replication

In addition to using the ReplicaRuntimeMBean methods described in Monitoring Replication Across Regional Sites, Administrators should monitor any SNMP traps that indicate failed database writes on a secondary site installation.

Administrators must also ensure that all sites participating in geographically-redundant configurations use unique site IDs.
