6 Configuring SIP Data Tier Partitions and Replicas

The following sections describe how to configure Oracle WebLogic Communication Services instances that make up the SIP data tier cluster of a deployment:

Section 6.1, "Overview of SIP Data Tier Configuration"
Section 6.2, "Best Practices for Configuring and Managing SIP Data Tier Servers"
Section 6.3, "Example SIP Data Tier Configurations and Configuration Files"
Section 6.4, "Storing Long-Lived Call State Data In A RDBMS"
Section 6.5, "Introducing Geo-Redundancy"
Section 6.6, "Using Geographically-Redundant SIP Data Tiers"
Section 6.7, "Caching SIP Data in the Engine Tier"
Section 6.8, "Monitoring and Troubleshooting SIP Data Tier Servers"

6.1 Overview of SIP Data Tier Configuration

The Oracle WebLogic Communication Services SIP data tier is a cluster of server instances that manages the application call state for concurrent SIP calls. The SIP data tier may manage a single copy of the call state or multiple copies as needed to ensure that call state data is not lost if a server machine fails or network connections are interrupted.

The SIP data tier cluster is arranged into one or more partitions. A partition consists of one or more SIP data tier server instances that manage the same portion of concurrent call state data. In a single-server Oracle WebLogic Communication Services installation, or in a two-server installation where one server resides in the engine tier and one resides in the SIP data tier, all call state data is maintained in a single partition. Multiple partitions are required when the size of the concurrent call state exceeds the maximum size that can be managed by a single server instance. When more than one partition is used, the concurrent call state is split among the partitions, and each partition manages a separate portion of the data. For example, in a two-partition SIP data tier, one partition might manage the call state for calls A through M while the second partition manages the remaining calls (N through Z).

In most cases, the maximum call state size that can be managed by an individual server corresponds to the Java Virtual Machine limit of approximately 1.6GB per server.

Additional servers can be added within the same partition to manage copies of the call state data. When multiple servers are members of the same partition, each server manages a copy of the same portion of the call data, referred to as a replica of the call state. If a server in a partition fails or cannot be contacted due to a network failure, another replica in the partition supplies the call state data to the engine tier. Oracle recommends configuring two servers in each partition for production installations, to guard against machine or network failures. A partition can contain a maximum of three replicas, to provide additional redundancy.
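As a rough sizing aid, the partition and replica rules above can be expressed as arithmetic. The following Java sketch is hypothetical (the class and method names are not part of the product API); it treats the approximate 1.6GB per-server figure as 1600 MB and assumes every partition is given the same number of replicas.

```java
// Hypothetical sizing helper; not part of the product API.
final class DataTierSizing {
    // Approximate ~1.6GB JVM limit per server, expressed in megabytes.
    static final long APPROX_MAX_MB_PER_SERVER = 1600;
    // Documented maximum number of replicas in one partition.
    static final int MAX_REPLICAS_PER_PARTITION = 3;

    /** Partitions needed so that no single server holds more than maxMbPerServer. */
    static long partitionsNeeded(long totalCallStateMb, long maxMbPerServer) {
        if (totalCallStateMb <= 0 || maxMbPerServer <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Integer ceiling division: ceil(totalCallStateMb / maxMbPerServer).
        return (totalCallStateMb + maxMbPerServer - 1) / maxMbPerServer;
    }

    /** Total data tier servers, assuming every partition has the same replica count. */
    static long serversNeeded(long partitions, int replicasPerPartition) {
        if (replicasPerPartition < 1 || replicasPerPartition > MAX_REPLICAS_PER_PARTITION) {
            throw new IllegalArgumentException("a partition holds 1 to 3 replicas");
        }
        return partitions * replicasPerPartition;
    }

    public static void main(String[] args) {
        // Example: ~4GB of concurrent call state, two replicas per partition.
        long partitions = partitionsNeeded(4000, APPROX_MAX_MB_PER_SERVER); // 3
        long servers = serversNeeded(partitions, 2);                        // 6
        System.out.println(partitions + " partitions, " + servers + " servers");
    }
}
```

Because each added replica also receives every call-state update for its partition, the replica count multiplies replication traffic as well as the number of servers.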

6.1.1 datatier.xml Configuration File

The datatier.xml configuration file, located in the config/custom subdirectory of the domain directory, identifies SIP data tier servers and also defines the partitions and replicas used to manage the call state. If a server's name is present in datatier.xml, that server loads Oracle WebLogic Communication Services SIP data tier functionality at boot time. (Server names that do not appear in datatier.xml act as engine tier nodes, and instead provide SIP Servlet container functionality configured by the sipserver.xml configuration file.)

The sections that follow show examples of the datatier.xml contents for common SIP data tier configurations.

6.1.2 Configuration Requirements and Restrictions

All servers that participate in the SIP data tier should be members of the same WebLogic Server cluster. The cluster configuration enables each server to monitor the status of other servers. Using a cluster also enables you to easily target the sipserver and datatier custom resources to all servers for deployment.

For high reliability, you can configure up to three replicas within a partition.

You cannot change the SIP data tier configuration while replicas or engine tier nodes are running. You must restart servers in the domain in order to change SIP data tier membership or reconfigure partitions or replicas.

You can view the current SIP data tier configuration (and configure the data tier) using the Configuration > Data Tier page (SipServer node) of the Administration Console, as shown in Figure 6-1.

Figure 6-1 Administration Console Display of SIP Data Tier Configuration (Read-Only)

6.2 Best Practices for Configuring and Managing SIP Data Tier Servers

Adding replicas can increase reliability for the system as a whole, but keep in mind that each additional server in a partition requires additional network bandwidth to manage the replicated data. With three replicas in a partition, each transaction that modifies the call state updates data on three different servers.

To ensure high reliability when using replicas, always ensure that server instances in the same partition reside on different machines. Hosting two or more replicas on the same machine leaves all of the hosted replicas vulnerable to a machine or network failure.

SIP data tier servers can have one of three different statuses:

ONLINE—indicates that the server is available for managing call state transactions.
OFFLINE—indicates that the server is shut down or unavailable.
ONLINE_LOCK_AUTHORITY_ONLY—indicates that the server was rebooted and is currently being updated (from other replicas) with the current call state data. A recovering server cannot yet process call state transactions, because it does not maintain a full copy of the call state managed by the partition.

If you need to take a SIP data tier server instance offline for scheduled maintenance, make sure that at least one other server in the same partition is active. If you shut down an active server and all other servers in the partition are offline or recovering, you will lose a portion of the active call state.
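The maintenance rule above can be expressed as a check over the three statuses listed earlier. The following Java sketch is hypothetical: the enum, the class, and the map of server statuses are illustrations only; in practice you would read the statuses from the Administration Console or your own monitoring.

```java
import java.util.Map;

// Hypothetical model of the three SIP data tier statuses described above.
enum ReplicaStatus { ONLINE, OFFLINE, ONLINE_LOCK_AUTHORITY_ONLY }

final class MaintenanceCheck {
    /**
     * Returns true if stopping {@code server} still leaves at least one other
     * replica in the same partition ONLINE. Recovering replicas
     * (ONLINE_LOCK_AUTHORITY_ONLY) do not count, because they do not yet hold
     * a full copy of the partition's call state.
     */
    static boolean safeToStop(String server, Map<String, ReplicaStatus> partition) {
        if (!partition.containsKey(server)) {
            throw new IllegalArgumentException(server + " is not in this partition");
        }
        for (Map.Entry<String, ReplicaStatus> e : partition.entrySet()) {
            if (!e.getKey().equals(server) && e.getValue() == ReplicaStatus.ONLINE) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, ReplicaStatus> partition0 = Map.of(
                "DataNode0-0", ReplicaStatus.ONLINE,
                "DataNode0-1", ReplicaStatus.ONLINE_LOCK_AUTHORITY_ONLY);
        // DataNode0-1 is still recovering, so stopping DataNode0-0 is unsafe.
        System.out.println(safeToStop("DataNode0-0", partition0)); // false
    }
}
```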

Oracle WebLogic Communication Services automatically divides the call state evenly over all configured partitions.
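This section does not document how the call state is divided. Purely as an illustration of an even, deterministic split, the following Java sketch maps call keys to partitions with a hash-modulo rule; the rule, the key format, and the class name are assumptions, not the product's algorithm.

```java
// Illustration only: the hash-modulo rule is an assumption, not the product's algorithm.
final class PartitionMapper {
    private final String[] partitions;

    PartitionMapper(String... partitionNames) {
        if (partitionNames.length == 0) {
            throw new IllegalArgumentException("at least one partition is required");
        }
        this.partitions = partitionNames.clone();
    }

    /** The same key always maps to the same partition; floorMod keeps the index non-negative. */
    String partitionFor(String callKey) {
        return partitions[Math.floorMod(callKey.hashCode(), partitions.length)];
    }

    public static void main(String[] args) {
        PartitionMapper mapper = new PartitionMapper("Partition0", "Partition1");
        int inPartition0 = 0;
        for (int i = 0; i < 10_000; i++) {
            if (mapper.partitionFor("call-" + i).equals("Partition0")) {
                inPartition0++;
            }
        }
        System.out.println("Partition0: " + inPartition0 + ", Partition1: " + (10_000 - inPartition0));
    }
}
```

The key property illustrated is determinism: every request for the same call must reach the same partition, while different calls spread across all partitions.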

6.3 Example SIP Data Tier Configurations and Configuration Files

The sections that follow describe some common Oracle WebLogic Communication Services installations that utilize a separate SIP data tier.

6.3.1 SIP Data Tier with One Partition

A single-partition, single-server SIP data tier represents the simplest data tier configuration. Example 6-1 shows a SIP data tier configuration for a single-server deployment.

Example 6-1 SIP Data Tier Configuration for Small Deployment

<?xml version="1.0" encoding="UTF-8"?>
  <data-tier xmlns="http://www.bea.com/ns/wlcp/wlss/300">
    <partition>
      <name>part-1</name>
      <server-name>replica1</server-name>
    </partition>
  </data-tier>

To add a replica to an existing partition, define a second server-name entry in the same partition. For example, the datatier.xml configuration file shown in Example 6-2 defines a single partition with two replicas.

Example 6-2 SIP Data Tier Configuration for Small Deployment with Replication

<?xml version="1.0" encoding="UTF-8"?>
  <data-tier xmlns="http://www.bea.com/ns/wlcp/wlss/300">
    <partition>
      <name>Partition0</name>
      <server-name>DataNode0-0</server-name>
      <server-name>DataNode0-1</server-name>
    </partition>
  </data-tier>

6.3.2 SIP Data Tier with Two Partitions

Multiple partitions can be easily created by defining multiple partition entries in datatier.xml, as shown in Example 6-3.

Example 6-3 Two-Partition SIP Data Tier Configuration

<?xml version="1.0" encoding="UTF-8"?>
  <data-tier xmlns="http://www.bea.com/ns/wlcp/wlss/300">
    <partition>
      <name>Partition0</name>
      <server-name>DataNode0-0</server-name>
    </partition>
    <partition>
      <name>Partition1</name>
      <server-name>DataNode1-0</server-name>
    </partition>
  </data-tier>

6.3.3 SIP Data Tier with Two Partitions and Two Replicas

Replicas of the call state can be added by defining multiple SIP data tier servers in each partition. Example 6-4 shows the datatier.xml configuration file used to define a system having two partitions with two servers (replicas) in each partition.

Example 6-4 SIP Data Tier Configuration with Two Partitions and Two Replicas

<?xml version="1.0" encoding="UTF-8"?>
  <data-tier xmlns="http://www.bea.com/ns/wlcp/wlss/300">
    <partition>
      <name>Partition0</name>
      <server-name>DataNode0-0</server-name>
      <server-name>DataNode0-1</server-name>
    </partition>
    <partition>
      <name>Partition1</name>
      <server-name>DataNode1-0</server-name>
      <server-name>DataNode1-1</server-name>
    </partition>
  </data-tier>
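Because servers must be restarted to pick up a new datatier.xml, it can be worth checking the file before a restart. The following Java sketch, which uses only the JDK's DOM parser, reads datatier.xml content, enforces the documented limit of three replicas per partition, and reports whether a server name will boot as a SIP data tier node. The class name is hypothetical, and the checks for duplicate partition names and for a server listed more than once are assumptions about a sensible configuration rather than documented rules.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical offline checker for datatier.xml content.
final class DataTierConfig {
    static final String NS = "http://www.bea.com/ns/wlcp/wlss/300";
    static final int MAX_REPLICAS = 3; // documented per-partition limit

    /** Parses datatier.xml into partition name -> replica server names, in file order. */
    static Map<String, List<String>> parse(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // elements are in the wlss/300 namespace
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse datatier.xml", e);
        }
        Map<String, List<String>> partitions = new LinkedHashMap<>();
        Set<String> seenServers = new HashSet<>();
        NodeList parts = doc.getElementsByTagNameNS(NS, "partition");
        for (int i = 0; i < parts.getLength(); i++) {
            Element partition = (Element) parts.item(i);
            NodeList names = partition.getElementsByTagNameNS(NS, "name");
            if (names.getLength() != 1) {
                throw new IllegalArgumentException("each partition needs exactly one <name>");
            }
            String name = names.item(0).getTextContent().trim();
            NodeList servers = partition.getElementsByTagNameNS(NS, "server-name");
            if (servers.getLength() < 1 || servers.getLength() > MAX_REPLICAS) {
                throw new IllegalArgumentException(name + ": expected 1 to 3 server-name entries");
            }
            List<String> replicas = new ArrayList<>();
            for (int j = 0; j < servers.getLength(); j++) {
                String server = servers.item(j).getTextContent().trim();
                // Assumption: a server belongs to at most one partition.
                if (!seenServers.add(server)) {
                    throw new IllegalArgumentException(server + " is listed more than once");
                }
                replicas.add(server);
            }
            // Assumption: partition names are unique.
            if (partitions.put(name, replicas) != null) {
                throw new IllegalArgumentException("duplicate partition name " + name);
            }
        }
        return partitions;
    }

    /** Servers named in datatier.xml boot as SIP data tier nodes; others act as engine tier nodes. */
    static boolean isDataTierServer(Map<String, List<String>> config, String serverName) {
        for (List<String> replicas : config.values()) {
            if (replicas.contains(serverName)) {
                return true;
            }
        }
        return false;
    }
}
```

To check a deployed file on Java 11 or later, you could pass it the result of Files.readString(Path.of(domainDir, "config", "custom", "datatier.xml")), where domainDir is your domain directory.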

6.4 Storing Long-Lived Call State Data In A RDBMS

Oracle WebLogic Communication Services enables you to store long-lived call state data in an Oracle or MySQL RDBMS in order to conserve RAM. When you enable RDBMS persistence, by default the SIP data tier persists a call state's data to the RDBMS after the call dialog has been established, and at subsequent dialog boundaries, retrieving or deleting the persisted call state data as necessary to modify or remove the call state.

Oracle also provides an API for application designers to provide "hints" as to when the SIP data tier should persist call state data. These hints can be used to persist call state data to the RDBMS more frequently, or to disable persistence for certain calls.

Note that Oracle WebLogic Communication Services only uses the RDBMS to supplement the SIP data tier's in-memory replication functionality. To improve latency performance when using an RDBMS, the SIP data tier maintains SIP timers in memory, along with call states being actively modified (for example, in response to a new call being set up). Call states are automatically persisted only after a dialog has been established and a call is in progress, at subsequent dialog boundaries, or in response to persistence hints added by the application developer.

When used in conjunction with an RDBMS, the SIP data tier selects one replica server instance to process all call state writes (or deletes) to the database. Any available replica can be used to retrieve call states from the persistent store as necessary for subsequent reads.

RDBMS call state storage can be used in combination with an engine tier cache, if your domain uses a SIP-aware load balancer to manage connections to the engine tier. See Section 6.7, "Caching SIP Data in the Engine Tier".

6.4.1 Requirements and Restrictions

Enable RDBMS call state storage only when all of the following criteria are met:

The call states managed by your system are typically long-lived.
The size of the call state to be stored is large. Very large call states may require a significant amount of RAM in order to store the call state.
Latency performance is not critical to your deployed applications.

The latency requirement, in particular, must be well understood before choosing to store call state data in an RDBMS. The RDBMS call state storage option measurably increases latency for SIP message processing, as compared to using a SIP data tier cluster. If your system must handle a large number of short-lived SIP transactions with brief response times, Oracle recommends storing all call state data in the SIP data tier.

Note:

RDBMS persistence is designed only to reduce the RAM requirements in the SIP data tier for large, long-lived call states. The persisted data cannot be used to restore a failed SIP data tier partition or replica.

6.4.2 Steps for Enabling RDBMS Call State Storage

In order to use the RDBMS call state storage feature, your Oracle WebLogic Communication Services domain must include the necessary JDBC configuration, SIP Servlet container configuration, and a database having the schema required to store the call state. You can automate much of the required configuration by using the Configuration Wizard to set up a new domain with the RDBMS call state template. See Section 6.4.3, "Using the Configuration Wizard RDBMS Store Template".

If you have an existing Oracle WebLogic Communication Services domain, or you want to configure the RDBMS store on your own, see Section 6.4.4, "Configuring RDBMS Call State Storage by Hand" for instructions to configure JDBC and Oracle WebLogic Communication Services to use an RDBMS store.

6.4.3 Using the Configuration Wizard RDBMS Store Template

The Configuration Wizard provides a simple template that helps you easily begin using and testing the RDBMS call state store. Follow these steps to create a new domain from the template:

Start the Configuration Wizard application (config.sh).
Accept the default selection, Create a new WebLogic domain, and click Next.
Select Base this domain on an existing template, and click Browse to display the Select a Template dialog.
Select the template named replicateddomain.jar, and click OK.
Click Next.
Enter the username and password for the Administrator of the new domain, and click Next.
Select a JDK to use, and click Next.
Select No to keep the settings defined in the source template file, and click Next.
Click Create to create the domain.

The template creates a new domain with two engine tier servers in a cluster, two SIP data tier servers in a cluster, and an Administration Server (AdminServer). The engine tier cluster includes the following resources and configuration:
- A JDBC datasource, wlss.callstate.datasource, required for storing long-lived call state data. Note that you must modify this configuration to configure the datasource for your own RDBMS server. See Section 6.4.3.1, "Modify the JDBC Datasource Connection Information".
- A persistence configuration (shown in the SipServer node, Configuration > Persistence tab of the Administration Console) that defines default handling of persistence hints for both RDBMS and geographical redundancy.
Click Done to exit the configuration wizard.
Follow the steps under Section 6.4.3.1, "Modify the JDBC Datasource Connection Information" to configure the datasource for your RDBMS server.
Follow the steps under Section 6.4.4.3, "Create the Database Schema" to create the necessary tables in your RDBMS.

6.4.3.1 Modify the JDBC Datasource Connection Information

After installing the new domain, modify the template JDBC datasource to include connection information for your RDBMS server:

Use your browser to access the URL http://address:port/console where address is the Administration Server's listen address and port is the listen port.
Select the Services > JDBC > Data Sources tab in the left pane.
Select the data source named wlss.callstate.datasource in the right pane.
Select the Configuration > Connection Pool tab in the right pane.
Modify the following connection pool properties:
- URL: Modify the URL to specify the host name and port number of your RDBMS server.
- Properties: Modify the value of the user, portNumber, SID, and serverName properties to match the connection information for your RDBMS.
- Password and Confirm Password: Enter the password of the RDBMS user you specified.
Click Save to save your changes.
Select the Targets tab in the right pane.
Select the name of your SIP data tier cluster (for example, BEA_DATA_TIER_CLUST), then click Save.
Follow the steps under Section 6.4.4.3, "Create the Database Schema" to create the necessary tables in your RDBMS.
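If the connection settings do not work, it can help to try the same values outside the console. This Java sketch builds a SID-style Oracle thin-driver URL (jdbc:oracle:thin:@host:port:SID) from the serverName, portNumber, and SID values described above and opens a connection with java.sql.DriverManager. The class name is hypothetical, the Oracle JDBC driver must be on the classpath, and you should confirm the URL form against your driver's documentation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical helper for checking call-state datasource settings outside the console.
final class CallStateJdbcCheck {
    /** SID-style Oracle thin URL: jdbc:oracle:thin:@serverName:portNumber:SID */
    static String oracleThinUrl(String serverName, int portNumber, String sid) {
        return "jdbc:oracle:thin:@" + serverName + ":" + portNumber + ":" + sid;
    }

    /** The same user and password you enter on the Connection Pool tab. */
    static Properties connectionProperties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        return props;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length != 5) {
            System.err.println("usage: CallStateJdbcCheck serverName portNumber SID user password");
            return;
        }
        String url = oracleThinUrl(args[0], Integer.parseInt(args[1]), args[2]);
        // Requires the Oracle JDBC driver on the classpath.
        try (Connection conn = DriverManager.getConnection(url, connectionProperties(args[3], args[4]))) {
            System.out.println("Connected to " + conn.getMetaData().getDatabaseProductVersion());
        }
    }
}
```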

6.4.4 Configuring RDBMS Call State Storage by Hand

To change an existing Oracle WebLogic Communication Services domain to store call state data in an Oracle or MySQL RDBMS, you must configure the required JDBC datasource, edit the Oracle WebLogic Communication Services configuration, and add the required schema to your database. Follow the instructions in the sections below to configure an Oracle Database.

6.4.4.1 Configure JDBC Resources

Follow these steps to create the required JDBC resources in your domain:

Boot the Administration Server for the domain if it is not already running.
Access the Administration Console for the domain.
Select the Services > JDBC > Data Sources tab in the left pane.
Click New to create a new data source.
Fill in the fields of the Create a New JDBC Data Source page as follows:
- Name: Enter wlss.callstate.datasource
- JNDI Name: Enter wlss.callstate.datasource.
- Database Type: Select "Oracle."
- Database Driver: Select an appropriate JDBC driver from the Database Driver list. Note that some of the drivers listed in this field may not be installed by default on your system. Install third-party drivers as necessary using the instructions from your RDBMS vendor.
Click Next.
Fill in the fields of the Connection Properties tab using connection information for the database you want to use. Click Next to continue.
Click Test Configuration to test your connection to the RDBMS, or click Next to continue.
On the Select Targets page, select the name of your SIP data tier cluster (for example, BEA_DATA_TIER_CLUST).
Click Finish to save your changes.

6.4.4.2 Configure Oracle WebLogic Communication Services Persistence Options

Follow these steps to configure the Oracle WebLogic Communication Services persistence options to use an RDBMS call state store:

Boot the Administration Server for the domain if it is not already running.
Access the Administration Console for the domain.
Select the SipServer node in the left pane.
Select the Configuration > Persistence tab in the right pane.
In the Default Handling drop-down menu, select either "db" or "all." It is acceptable to select "all" because geographically-redundant replication is only performed if the Geo Site ID and Geo Remote T3 URL fields have been configured.
Click Save to save your changes.

6.4.4.3 Create the Database Schema

Oracle WebLogic Communication Services includes a SQL script, callstate.sql, that you can use to create the tables necessary for storing call state information. The script is installed to the user_staged_config subdirectory of the domain directory when you configure a replicated domain using the Configuration Wizard. The script is also available in the WLSS_HOME/common/templates/scripts/db/oracle directory.

The contents of the callstate.sql SQL script are shown in Example 6-5.

Example 6-5 callstate.sql Script for Call State Storage Schema

drop table callstate;

create table callstate (
  key1 int,
  key2 int,
  bytes blob default empty_blob(),
  constraint pk_callstate primary key (key1, key2)
);

Follow these steps to execute the script commands using SQL*Plus:

Move to the directory in which the callstate.sql script is stored (WLSS_HOME/common/templates/scripts/db/oracle). For example:
```
cd ~/bea/wlcserver_10.3/common/templates/scripts/db/oracle
```
Start the SQL*Plus application, connecting to the Oracle database in which you will create the required tables. Use the same username and password, and connect to the same database, that you specified when configuring the JDBC data source in Section 6.4.4.1, "Configure JDBC Resources". For example:
```
sqlplus username/password@connect_identifier
```
where connect_identifier connects to the database identified in the JDBC connection pool.
Execute the Oracle WebLogic Communication Services SQL script, callstate.sql:
```
START callstate.sql
```
Exit SQL*Plus:
```
EXIT
```

6.4.5 Using Persistence Hints in SIP Applications

Oracle WebLogic Communication Services provides a simple API to provide "hints" as to when the SIP data tier should persist call state data. You can use the API to disable persistence for specific calls or SIP requests, or to persist data more frequently than the default setting (at SIP dialog boundaries).

To use the API, obtain a WlssSipApplicationSession instance and use the setPersist method to enable or disable persistence. Note that you can enable or disable persistence either to an RDBMS store, or to a geographically-redundant Oracle WebLogic Communication Services installation (see Section 6.6, "Using Geographically-Redundant SIP Data Tiers").

For example, some SIP-aware load balancing products use the SIP OPTIONS message to determine if a SIP Server is active. To avoid persisting these messages to an RDBMS and to a geographically-redundant site, a Servlet might implement a doOptions method to echo the request and turn off persistence for the message, as shown in Example 6-6.

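Example 6-6 Disabling Persistence for OPTIONS Requests

import java.io.IOException;
import javax.servlet.sip.SipServlet;
import javax.servlet.sip.SipServletRequest;
import com.bea.wcp.sip.WlssSipApplicationSession; // verify the package in your API reference

public class OptionsServlet extends SipServlet {
    @Override
    protected void doOptions(SipServletRequest req) throws IOException {
        WlssSipApplicationSession session =
            (WlssSipApplicationSession) req.getApplicationSession();
        // Do not persist load-balancer OPTIONS pings to the RDBMS or the geo site.
        session.setPersist(WlssSipApplicationSession.PersistenceType.DATABASE, false);
        session.setPersist(WlssSipApplicationSession.PersistenceType.GEO_REDUNDANCY, false);
        req.createResponse(200).send();
    }
}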

6.5 Introducing Geo-Redundancy

Geo-Redundancy protects transactions and communications for providers against the failure of an entire site, using geographically-separated SIP server deployments.

A primary site processes SIP transactions and communications and, upon reaching a transaction boundary, replicates the state data associated with that transaction to a secondary site. If the primary site fails, calls are routed from the failed primary site to a secondary site for processing. Similarly, upon recovery, calls are re-routed back to the primary site.

Figure 6-2 Geo-Redundancy

Figure 6-2 illustrates the Geo-Redundancy process, which proceeds as follows:

A call is initiated on a primary OCCAS cluster site, and call setup and processing occur normally.
The call is replicated as usual to the site's SIP State Tier, and becomes eligible for replication to a secondary site.
A single replica in the SIP State Tier then places the call state data to be replicated on a configured JMS queue.
The call state data is transmitted over the WAN, using JMS, to one of the available engines at the secondary site.
Engines at the secondary site monitor their local queue for new messages. Upon receiving a message, an engine in the secondary site OCCAS cluster persists the call state data and assigns it the site ID value of the primary site.

Table 6-1 Geographic Redundancy Flow

Normal operation:
- When a session is initiated on a primary OCCAS site, call setup and processing occur normally.
- When a SIP transaction boundary is reached, the call is replicated (in memory) to the site's data tier, and becomes eligible for replication to a secondary site.
- A single replica in the data tier then places the call state data to be replicated on a JMS queue configured on the replica site.
- Data is transmitted to one of the available engines in round-robin fashion.
- Engines at the secondary site monitor their local queue for new messages.
- Upon receiving a message, an engine on the secondary site persists the call state data and assigns it the site ID value of the primary site.
- The site ID distinguishes replicated call state data on the secondary site from any other call state data actively managed by the secondary site.
- Timers in replicated call state data remain dormant on the secondary site, so that timer processing does not become a bottleneck to performance.

Failover:
- The global load balancer policy is updated to begin routing calls from the primary site to the secondary site.
- Once the update is complete, the secondary site begins processing requests for the backed-up call state data.
- When a request reaches the secondary site, an engine retrieves the call state data and activates it, taking ownership of the call.
- The engine sets the site ID associated with the call to zero (making it appear local).
- The engine activates all dormant timers present in the call state.
- By default, call states are activated only for individual calls, and only after those calls are requested on the backup site.
- Servlets can use the WlssSipApplicationSession.getGeoSiteId() method to examine the site ID associated with a call. Any non-zero value for the site ID indicates that the Servlet is working with call state data that was replicated from another site.
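The normal-operation and failover sequences in Table 6-1 can be modeled in a few lines. The following Java sketch is a simplified, single-process model: a java.util.concurrent.BlockingQueue stands in for the JMS queue, and all class and method names are hypothetical. It shows the site ID bookkeeping (the primary site's ID while the copy is dormant, zero after activation) and the per-call, on-demand activation described in the table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified single-process model of Table 6-1; all names are hypothetical.
// A BlockingQueue stands in for the JMS queue between the two sites.
final class GeoReplicationModel {
    /** Replication message: which call, and which site it came from. */
    static final class Message {
        final String callId;
        final int primarySiteId;
        Message(String callId, int primarySiteId) {
            this.callId = callId;
            this.primarySiteId = primarySiteId;
        }
    }

    /** Call state as held on the secondary site. */
    static final class CallState {
        final String callId;
        int siteId;           // non-zero: replicated from another site; 0: local
        boolean timersActive; // replicated timers stay dormant until activation
        CallState(String callId, int siteId) {
            this.callId = callId;
            this.siteId = siteId;
        }
    }

    private final BlockingQueue<Message> queue = new LinkedBlockingQueue<>();
    private final Map<String, CallState> secondaryStore = new HashMap<>();

    /** Primary site: one replica enqueues the call state at a transaction boundary. */
    void onTransactionBoundary(String callId, int primarySiteId) {
        queue.add(new Message(callId, primarySiteId));
    }

    /** Secondary site engine: store each message under the primary site's ID. */
    void drainQueue() {
        Message m;
        while ((m = queue.poll()) != null) {
            secondaryStore.put(m.callId, new CallState(m.callId, m.primarySiteId));
        }
    }

    /** Failover: the first request for a call on the secondary site activates it. */
    CallState activate(String callId) {
        CallState state = secondaryStore.get(callId);
        if (state == null) {
            return null; // never replicated, so the call is lost
        }
        state.siteId = 0;          // now appears local
        state.timersActive = true; // dormant timers are activated
        return state;
    }

    CallState lookup(String callId) {
        return secondaryStore.get(callId);
    }
}
```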

6.5.1 Situations Best Suited to Use Geo-Redundancy

The following situations are best suited to take advantage of Geo-Redundancy:

Your application uses SIP dialog states that are long-lived (dialog states that typically last 30 seconds or longer, such as SUBSCRIBE dialogs or conferences)
Your application would reasonably be able to reconstruct the session (re-INVITE, expire SUBSCRIBE dialogs to trigger re-subscriptions, and so on) from the state that has been replicated
The link between the two OCCAS clusters or sites is low-bandwidth (less than 1 Gb/s in each direction) or has high or variable latency (greater than 5 ms at the 95th percentile)

6.5.2 Situations Not Suited to Use Geo-Redundancy

Geo-Redundancy should not be used in these situations:

A high-capacity link between sites is available
Your application does not reach SIP dialog steady-states that are likely to last longer than the time it would take to re-route all traffic to the secondary site in the event of catastrophic failure (15-30 seconds)
The application session is likely to be terminated by the user before the application could reconstruct it (most users will disconnect their calls before the session can be re-established from the secondary site)
The volume of session state objects created by the application is greater than the site interconnect can support

6.5.3 Geo-Redundancy Considerations: Before You Begin

Keep in mind the following considerations when planning your Geo-Redundancy:

Dimension the system for the site link:
- Each dialog state is ~25 Kb (25,600 bits) on the wire.
- A typical B2BUA uses two (2) dialogs.
- Aim for 25% utilization (or less, depending on the specific equipment and topology of the site) to accommodate “jitter” and sustained latency on the link.

For example, at 25% utilization a 100 Mb/s link can handle approximately 1000 call states per second, and a typical B2BUA (in the default configuration) generates 4 states during the call (two for each dialog). So, a 100 Mb/s link will support a single OWLCS cluster dimensioned for a peak arrival rate (call rate) of approximately 250 CPS.
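The dimensioning example can be checked with a short calculation. This Java sketch (hypothetical class name) uses the rule-of-thumb inputs from the text: a 100 Mb/s link, 25% target utilization, 25,600 bits per dialog state, and 4 states per B2BUA call. Substitute measurements from your own application and site link.

```java
// Hypothetical helper reproducing the link-dimensioning example above.
final class GeoLinkDimensioning {
    /** Call states per second the link can carry at the target utilization. */
    static double statesPerSecond(double linkBitsPerSecond, double targetUtilization,
                                  double bitsPerState) {
        return linkBitsPerSecond * targetUtilization / bitsPerState;
    }

    /** Peak call arrival rate (CPS) the link supports. */
    static double supportedCallsPerSecond(double statesPerSecond, int statesPerCall) {
        return statesPerSecond / statesPerCall;
    }

    public static void main(String[] args) {
        // 100 Mb/s link, 25% target utilization, 25,600 bits per dialog state.
        double states = statesPerSecond(100e6, 0.25, 25_600); // 976.5625
        // Typical B2BUA: 2 dialogs x 2 states each = 4 states per call.
        double cps = supportedCallsPerSecond(states, 4);      // 244.140625
        System.out.printf("%.1f states/s, %.1f CPS%n", states, cps);
    }
}
```

The exact results, about 977 states per second and 244 CPS, are what the text rounds to approximately 1000 states and 250 CPS.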
Geo-Redundancy is not transparent to the application; in most cases the application must be designed to use setPersist() appropriately, and the developer must consider the volume of state that the application will queue for replication between sites:
- Use setPersist() within the application code to selectively identify dialog states that will be long-lived.
- Given the time it generally takes to route traffic to a secondary site, any application that replicates state more frequently will unnecessarily saturate the JMS queue and site interconnect.
Tuning of JMS to the specific application environment is required: serialization options, message batching, reliable delivery options, and queue size all vary, depending on the specific application and site characteristics.
When Geo-Redundancy is enabled for the container, its default behavior is to replicate all dialog state changes. This is not recommended for production deployments:
- Given the time it generally takes to route traffic to a secondary site, any application that replicates state more frequently will unnecessarily saturate the site interconnect.
- Use setPersist() within the application code to selectively identify dialog states that will be long-lived (longer than ~20-30 seconds is a reasonable threshold).
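One way to apply this guidance is to keep the replicate-or-not decision in a small policy object and pass its result to setPersist(). The following Java sketch is hypothetical: the class, the 30-second default threshold (taken from the ~20-30 second guidance), and the idea of an application-supplied expected dialog duration are assumptions. It has no product dependencies; the trailing comment shows where the result would be passed to the setPersist() call used in Example 6-6.

```java
// Hypothetical policy object; not part of the product API.
final class GeoPersistPolicy {
    private final long thresholdMillis;

    GeoPersistPolicy(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /** Default threshold taken from the ~20-30 second guidance above. */
    static GeoPersistPolicy withDefaults() {
        return new GeoPersistPolicy(30_000);
    }

    /**
     * Replicate only dialogs expected to outlive the time needed to re-route
     * traffic to a secondary site. Load-balancer OPTIONS pings are never replicated.
     */
    boolean shouldReplicate(String sipMethod, long expectedDurationMillis) {
        if ("OPTIONS".equals(sipMethod)) {
            return false;
        }
        return expectedDurationMillis >= thresholdMillis;
    }

    // In a SIP Servlet, the result would be passed to the call shown in Example 6-6:
    // session.setPersist(WlssSipApplicationSession.PersistenceType.GEO_REDUNDANCY,
    //     policy.shouldReplicate(req.getMethod(), expectedDurationMillis));
}
```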

6.6 Using Geographically-Redundant SIP Data Tiers

The basic call state replication functionality available in the Oracle WebLogic Communication Services SIP data tier provides excellent failover capabilities for a single site installation. However, the active replication performed within the SIP data tier requires high network bandwidth in order to meet the latency performance needs of most production networks. This bandwidth requirement makes a single SIP data tier cluster unsuitable for replicating data over large distances, such as from one regional data center to another.

The Oracle WebLogic Communication Services geographic persistence feature enables you to replicate call state transactions across multiple Oracle WebLogic Communication Services installations (multiple Administrative domains or "sites"). A geographically-redundant configuration minimizes dropped calls in the event of a catastrophic failure of an entire site, for example due to an extended, regional power outage.

Figure 6-3 Oracle WebLogic Communication Services Geographic Persistence

6.6.1 Example Domain Configurations

A secondary Oracle WebLogic Communication Services domain that persists data from another domain may itself process SIP traffic, or it may exist solely as an active standby domain. In the most common configuration, two sites are configured to replicate each other's call state data, with each site processing its own local SIP traffic. The administrator can then use either domain as the "secondary" site should one of the domains fail.

Figure 6-4 Common Geographically-Redundant Configuration

An alternate configuration uses a single domain that persists data from multiple other sites, acting as the secondary for those sites. Although the secondary site in this configuration can also process its own local SIP traffic, keep in mind that its resource requirements may be considerable, because it must persist active traffic from several other installations.

Figure 6-5 Alternate Geographically-Redundant Configuration

When using geographic persistence, a single replica in the primary site places modified call state data on a distributed JMS queue. By default, data is placed on the queue only at SIP dialog boundaries. (A custom API is provided for application developers who want to replicate data at a finer granularity, as described in Section 6.4.5, "Using Persistence Hints in SIP Applications".) In a secondary site, engine tier servers use a message listener to monitor the distributed queue, receive messages, and write the data to the site's own SIP data tier cluster. If the secondary site uses an RDBMS to store long-lived call states (recommended), then all data writes from the distributed queue go directly to the RDBMS, rather than to the in-memory storage of the SIP data tier.

6.6.2 Requirements and Limitations

The Oracle WebLogic Communication Services geographically-redundant persistence feature is most useful for sites that manage long-lived call state data in an RDBMS. Short-lived calls may be lost in the transition to a secondary site, because Oracle WebLogic Communication Services may choose to collect data for multiple call states before replicating between sites.

You must have a reliable, site-aware load balancing solution that can partition calls between geographic locations, as well as monitor the health of a given regional site. Oracle WebLogic Communication Services provides no automated functionality for detecting the failure of an entire domain, or for failing over to a secondary site. It is the responsibility of the Administrator to determine when a given site has "failed," and to redirect that site's calls to the correct secondary site. Furthermore, the site-aware load balancer must direct all messages for a given callId to a single home site (the "active" site). If, after a failover, the failed site is restored, the load balancer must continue directing calls to the active site and not partition calls between the two sites.
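The routing obligations above can be summarized as a small state model. The following Java sketch is not a load balancer API; it is a hypothetical model of one region's traffic in which failover is an explicit administrator action, each callId stays pinned to one site, and restoring the failed site does not move any traffic back.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the routing rules above; not a load balancer API.
final class SiteAffinityRouter {
    private final Map<String, String> homeSiteByCallId = new HashMap<>();
    private String activeSite;

    SiteAffinityRouter(String initialActiveSite) {
        this.activeSite = initialActiveSite;
    }

    /** Every message for a callId goes to one site; new calls go to the active site. */
    String route(String callId) {
        return homeSiteByCallId.computeIfAbsent(callId, id -> activeSite);
    }

    /** Administrator-declared failover: the failed site's calls move to the secondary. */
    void failOver(String failedSite, String secondarySite) {
        homeSiteByCallId.replaceAll((id, site) -> site.equals(failedSite) ? secondarySite : site);
        if (activeSite.equals(failedSite)) {
            activeSite = secondarySite;
        }
    }

    /** A restored site receives no traffic back, so no call is split between sites. */
    void siteRestored(String site) {
        // Intentionally a no-op: calls keep going to the currently active site.
    }

    public static void main(String[] args) {
        SiteAffinityRouter router = new SiteAffinityRouter("siteA");
        router.route("call-1");            // pinned to siteA
        router.failOver("siteA", "siteB"); // administrator declares siteA failed
        router.siteRestored("siteA");
        System.out.println(router.route("call-1")); // siteB
    }
}
```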

During a failover to a secondary site, some calls may be dropped. This can occur because Oracle WebLogic Communication Services generally queues call state data for site replication only at SIP dialog boundaries. Failures that occur before the data is written to the queue result in the loss of that data.

Also, Oracle WebLogic Communication Services replicates call state data across sites only when a SIP dialog boundary changes the call state. If a long-running call exists on the primary site before the secondary site is started, and the call state remains unmodified, that call's data is not replicated to the secondary site. Should a failure occur before a long-running call state has been replicated, the call is lost during failover.

When planning for the capacity of an Oracle WebLogic Communication Services installation, keep in mind that, after a failover, a given site must be able to support all of the calls from the failed site as well as from its own geographic location. Essentially, this means that all sites involved in a geographically-redundant configuration operate at less than maximum capacity until a failover occurs.
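The sizing rule reduces to simple arithmetic. The helper below is a hypothetical Python sketch for a pair of sites that back each other up; the optional `headroom` margin is an assumption, not a product setting. With two sites that each carry 40,000 concurrent calls, each site needs capacity for 80,000 calls and therefore runs at 50% of capacity before a failover.

```python
def failover_capacity_plan(own_load, partner_load, headroom=0.0):
    """Return (required_capacity, normal_utilization) for a site backing up one partner.

    own_load and partner_load are normal concurrent-call loads. headroom is an
    optional safety margin (0.25 means 25% extra) and is an assumption, not a
    product setting.
    """
    # After a failover the site carries both loads at once.
    required = (own_load + partner_load) * (1.0 + headroom)
    # Fraction of that capacity used while both sites are healthy.
    return required, own_load / required
```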

6.6.3 Steps for Configuring Geographic Persistence

In order to use the Oracle WebLogic Communication Services geographic persistence features, you must perform certain configuration tasks on both the primary "home" site and on the secondary replication site.

Table 6-2 Steps for Configuring Geographic Persistence

Steps for Primary "Home" Site	Steps for Secondary "Replication" Site
Install Oracle WebLogic Communication Services software and create a replicated domain.	Install Oracle WebLogic Communication Services software and create a replicated domain.
Enable RDBMS storage for long-lived call states (recommended).	Enable RDBMS storage for long-lived call states (recommended).
Configure persistence options to define the unique regional site ID, identify the secondary site's URL, and enable replication hints.	Configure JMS Servers and modules required for replicating data.
	Configure persistence options to define the unique regional site ID.

Note:

In most production deployments, two sites will perform replication services for each other, so you will generally configure each installation as both a primary and secondary site.

Oracle WebLogic Communication Services provides domain templates to automate the configuration of most of the resources described in Table 6-2. See Section 6.6.4, "Using the Configuration Wizard Templates for Geographic Persistence" for information about using the templates.

If you have an existing Oracle WebLogic Communication Services domain and want to use geographic persistence, follow the instructions in Section 6.6.5, "Manually Configuring Geographical Redundancy" to create the resources.

6.6.4 Using the Configuration Wizard Templates for Geographic Persistence

Oracle WebLogic Communication Services provides two Configuration Wizard templates for using geographic persistence features:

WLSS_HOME/common/templates/domains/geo1domain.jar configures a primary site having a site ID of 1. The domain replicates data to the engine tier servers created in geo2domain.jar.
WLSS_HOME/common/templates/domains/geo2domain.jar configures a secondary site that replicates call state data from the domain created with geo1domain.jar. This installation has a site ID of 2.

The server port numbers in both domain templates are unique, so you can test geographic persistence features on a single machine if necessary. Follow the instructions in the sections that follow to install and configure each domain.

6.6.4.1 Installing and Configuring the Primary Site

Follow these steps to create a new primary domain from the template:

Start the Configuration Wizard application (config.sh).
Accept the default selection, Create a new WebLogic domain, and click Next.
Select Base this domain on an existing template, and click Browse to display the Select a Template dialog.
Select the template named geo1domain.jar, and click OK.
Click Next.
Enter the username and password for the Administrator of the new domain, and click Next.
Select a JDK to use, and click Next.
Select No to keep the settings defined in the source template file, and click Next.
Click Create to create the domain.

The template creates a new domain with two engine tier servers in a cluster, two SIP data tier servers in a cluster, and an Administration Server (AdminServer). The engine tier cluster includes the following resources and configuration:
- A JDBC datasource, wlss.callstate.datasource, required for storing long-lived call state data. If you want to use this functionality, edit the datasource to include your RDBMS connection information as described in Section 6.4.3.1, "Modify the JDBC Datasource Connection Information".
- A persistence configuration (shown in the SipServer node, Configuration > Persistence tab of the Administration Console) that defines:
  - Default handling of persistence hints for both RDBMS and geographic persistence.
  - A Geo Site ID of 1.
  - A Geo Remote T3 URL of t3://localhost:8011,localhost:8061, which identifies the engine tier servers in the "geo2" domain as the replication site for geographic redundancy.
Click Done to exit the configuration wizard.
Follow the steps under Section 6.6.4.2, "Installing the Secondary Site" to create the domain that performs the replication.

6.6.4.2 Installing the Secondary Site

Follow these steps to use a template to create a secondary site for replicating call state data from the "geo1" domain:

Start the Configuration Wizard application (config.sh).
Accept the default selection, Create a new WebLogic domain, and click Next.
Select Base this domain on an existing template, and click Browse to display the Select a Template dialog.
Select the template named geo2domain.jar, and click OK.
Click Next.
Enter the username and password for the Administrator of the new domain, and click Next.
Select a JDK to use, and click Next.
Select No to keep the settings defined in the source template file, and click Next.
Click Create to create the domain.

The template creates a new domain with two engine tier servers in a cluster, two SIP data tier servers in a cluster, and an Administration Server (AdminServer). The engine tier cluster includes the following resources and configuration:
- A JDBC datasource, wlss.callstate.datasource, required for storing long-lived call state data. If you want to use this functionality, edit the datasource to include your RDBMS connection information as described in Section 6.4.3.1, "Modify the JDBC Datasource Connection Information".
- A persistence configuration (shown in the SipServer node, Configuration > Persistence tab of the Administration Console) that defines:
  - Default handling of persistence hints for both RDBMS and geographical redundancy.
  - A Geo Site ID of 2.
- A JMS system module, SystemModule-Callstate, that includes:
  - ConnectionFactory-Callstate, a connection factory required for backing up call state data from a primary site.
  - DistributedQueue-Callstate, a uniform distributed queue required for backing up call state data from a primary site.
  The JMS system module is targeted to the site's engine tier cluster.
- Two JMS Servers, JMSServer-1 and JMSServer-2, are deployed to engine1-site2 and engine2-site2, respectively.
Click Done to exit the configuration wizard.

6.6.5 Manually Configuring Geographical Redundancy

If you have an existing replicated Oracle WebLogic Communication Services installation, or pair of installations, you must create by hand the JMS and JDBC resources required for enabling geographical redundancy. You must also configure each site to perform replication. The basic steps for enabling geographical redundancy are:

Configure JDBC Resources. Oracle recommends configuring both the primary and secondary sites to store long-lived call state data in an RDBMS.
Configure Persistence Options. Persistence options must be configured on both the primary and secondary sites to enable engine tier hints to write to an RDBMS or to replicate data to a geographically-redundant installation.
Configure JMS Resources. A secondary site must have available JMS Servers and specific JMS module resources in order to replicate call state data from another site.

The sections that follow describe each step in detail.

6.6.5.1 Configuring JDBC Resources (Primary and Secondary Sites)

Follow the instructions in Section 6.4, "Storing Long-Lived Call State Data In A RDBMS" to configure the JDBC resources required for storing long-lived call states in an RDBMS.

6.6.5.2 Configuring Persistence Options (Primary and Secondary Sites)

Both the primary and secondary sites must configure the correct persistence settings in order to enable replication for geographical redundancy. Follow these steps to configure persistence:

Use your browser to access the URL http://address:port/console where address is the Administration Server's listen address and port is the listen port.
Click Lock & Edit to obtain a configuration lock.
Select the SipServer node in the left pane. The right pane of the console provides two levels of tabbed pages that are used for configuring and monitoring Oracle WebLogic Communication Services.
Select the Configuration > Persistence tab in the right pane.
Configure the Persistence attributes as follows:
- Default Handling: Select "all" to persist long-lived call state data to an RDBMS and to replicate data to an external site for geographical redundancy (recommended). If your installation does not store call state data in an RDBMS, select "geo" instead of "all."
- Geo Site ID: Enter a unique number from 1 to 9 to distinguish this site from all other configured sites. Note that the site ID of 0 is reserved to indicate call states that are local to the site in question (call states not replicated from another site).
- Geo Remote T3 URL: For primary sites (or for secondary sites that replicate their own data to another site), enter the T3 URL or URLs of the engine tier servers that will replicate this site's call state data. If the secondary engine tier cluster uses a cluster address, you can enter a single T3 URL, such as t3://mycluster:7001. If the secondary engine tier cluster does not use a cluster address, enter the URLs for each individual engine tier server separated by a comma, such as t3://engine1-east-coast:7001,t3://engine2-east-coast:7002,t3://engine3-east-coast:7001,t3://engine4-east-coast:7002.
Click Save to save your configuration changes.
Click Activate Changes to apply your changes to the engine tier servers.
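Before entering these values, you may want to sanity-check them. The Python sketch below encodes only the rules stated in this section: a site ID from 1 to 9 (0 is reserved for local call states) and a comma-separated list of t3 endpoints, in either the `t3://host:port,host:port` form used by the geo1 template or the repeated `t3://` form shown above. The function names are hypothetical, and the set of Default Handling values lists only those named in this chapter; your release may accept others.

```python
# Only the Default Handling values named in this chapter; others may exist.
KNOWN_DEFAULT_HANDLING = {"all", "geo", "none"}


def parse_t3_url_list(value):
    """Split a Geo Remote T3 URL value into (host, port) pairs."""
    endpoints = []
    for entry in value.split(","):
        entry = entry.strip()
        if "://" in entry:
            scheme, entry = entry.split("://", 1)
            if scheme != "t3":
                raise ValueError("unsupported scheme %r (expected t3)" % scheme)
        host, _, port = entry.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError("expected host:port, got %r" % entry)
        endpoints.append((host, int(port)))
    return endpoints


def validate_geo_persistence(default_handling, geo_site_id, geo_remote_t3_url=None):
    """Return a list of problems with the proposed persistence settings."""
    errors = []
    if default_handling not in KNOWN_DEFAULT_HANDLING:
        errors.append("unknown Default Handling value %r" % default_handling)
    if not 1 <= geo_site_id <= 9:
        errors.append("Geo Site ID must be 1-9 (0 is reserved for local call states)")
    if geo_remote_t3_url:
        try:
            parse_t3_url_list(geo_remote_t3_url)
        except ValueError as exc:
            errors.append(str(exc))
    return errors
```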

6.6.5.3 Configuring JMS Resources (Secondary Site Only)

Any site that replicates call state data from another site must configure certain required JMS resources. The resources are not required for sites that do not replicate data from another site.

Follow these steps to configure JMS resources:

Use your browser to access the URL http://address:port/console where address is the Administration Server's listen address and port is the listen port.
Click Lock & Edit to obtain a configuration lock.
Select the Services > Messaging > JMS Servers tab in the left pane.
Click New in the right pane.
Enter a unique name for the JMS Server or accept the default name. Click Next to continue.
In the Target list, select the name of a single engine tier server node in the installation. Click Finish to create the new Server.
Repeat steps 3-6 to create a dedicated JMS Server for each engine tier server node in your installation.
Select the Services > Messaging > JMS Modules node in the left pane.
Click New in the right pane.
Fill in the fields of the Create JMS System Module page as follows:
- Name: Enter a name for the new module, or accept the default name.
- Descriptor File Name: Enter the prefix of a configuration file name in which to store the JMS module configuration (for example, systemmodule-callstate).
Click Next to continue.
Select the name of the engine tier cluster, and choose the option All servers in the cluster.
Click Next to continue.
Select Would you like to add resources to this JMS system module and click Finish to create the module.
Click New to add a new resource to the module.
Select the Connection Factory option and click Next.
Fill in the fields of the Create a new JMS System Module Resource as follows:
- Name: Enter a descriptive name for the resource, such as ConnectionFactory-Callstate.
- JNDI Name: Enter the name wlss.callstate.backup.site.connection.factory.
Click Next to continue.
Click Finish to save the new resource.
Select the name of the connection factory resource you just created.
Select the Configuration > Load Balance tab in the right pane.
De-select the Server Affinity Enabled option, and click Save.
Re-select the Services > Messaging > JMS Modules node in the left pane.
Select the name of the JMS module you created in the right pane.
Click New to create another JMS resource.
Select the Distributed Queue option and click Next.
Fill in the fields of the Create a new JMS System Module Resource page as follows:
- Name: Enter a descriptive name for the resource, such as DistributedQueue-Callstate.
- JNDI Name: Enter the name wlss.callstate.backup.site.queue.
Click Next to continue.
Click Finish to save the new resource.
Click Save to save your configuration changes.
Click Activate Changes to apply your changes to the engine tier servers.
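The resources created in these steps can be summarized as a checklist. The following Python sketch is a hypothetical validator over a plain description of the configuration (it does not query WebLogic Server); the dictionary layout and function name are assumptions, while the JNDI names and the Server Affinity requirement come from the steps above.

```python
# JNDI names required by the steps above.
CF_JNDI = "wlss.callstate.backup.site.connection.factory"
QUEUE_JNDI = "wlss.callstate.backup.site.queue"


def check_secondary_jms(engine_servers, jms_server_targets,
                        module_targeted_to_cluster, resources):
    """Return a list of problems; an empty list means the checklist is satisfied.

    engine_servers: names of the engine tier servers in this site.
    jms_server_targets: JMS Server name -> the engine server it is targeted to.
    module_targeted_to_cluster: True if the module targets all engine cluster servers.
    resources: dicts with "kind" ("ConnectionFactory" or "DistributedQueue"),
        "jndi", and, for connection factories, "server_affinity".
    """
    problems = []
    targeted = list(jms_server_targets.values())
    # Each engine tier server needs exactly one dedicated JMS Server.
    for engine in engine_servers:
        count = targeted.count(engine)
        if count != 1:
            problems.append("%s has %d JMS Servers (expected 1)" % (engine, count))
    if not module_targeted_to_cluster:
        problems.append("JMS module is not targeted to the engine tier cluster")
    by_jndi = {r["jndi"]: r for r in resources}
    cf = by_jndi.get(CF_JNDI)
    if cf is None or cf["kind"] != "ConnectionFactory":
        problems.append("missing connection factory " + CF_JNDI)
    elif cf.get("server_affinity", True):
        problems.append("Server Affinity Enabled must be cleared on " + CF_JNDI)
    queue = by_jndi.get(QUEUE_JNDI)
    if queue is None or queue["kind"] != "DistributedQueue":
        problems.append("missing distributed queue " + QUEUE_JNDI)
    return problems
```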

6.6.6 Understanding Geo-Redundant Replication Behavior

This section provides more detail into how multiple sites replicate call state data. Administrators can use this information to better understand the mechanics of geo-redundant replication and to better troubleshoot any problems that may occur in such a configuration. Note, however, that the internal workings of replication across Oracle WebLogic Communication Services installations are subject to change in future releases of the product.

6.6.6.1 Call State Replication Process

When a call is initiated on a primary Oracle WebLogic Communication Services site, call setup and processing occurs normally. When a SIP dialog boundary is reached, the call is replicated (in-memory) to the site's SIP data tier, and becomes eligible for replication to a secondary site. Oracle WebLogic Communication Services may choose to aggregate multiple call states for replication in order to optimize network usage.

A single replica in the SIP data tier then places the call state data to be replicated on a JMS queue configured on the replica site. Data is transmitted to one of the available engines (specified in the geo-remote-t3-url element in sipserver.xml) in a round-robin fashion. Engines at the secondary site monitor their local queue for new messages.
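The following Python sketch illustrates aggregation followed by round-robin delivery across the engines listed in geo-remote-t3-url. The batch size and function name are hypothetical; this section does not document how the product decides how many call states to aggregate.

```python
from itertools import cycle


def batch_and_distribute(call_states, endpoints, batch_size):
    """Group call states into batches and assign each batch to a remote engine in turn."""
    targets = cycle(endpoints)  # round-robin over the remote engine tier servers
    assignments = []
    for start in range(0, len(call_states), batch_size):
        batch = call_states[start:start + batch_size]
        assignments.append((next(targets), batch))
    return assignments
```

For example, five call states with a batch size of 2 and two remote engines produce three batches, with the first and third going to the same engine.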

Upon receiving a message, an engine on the secondary site persists the call state data and assigns it the site ID value of the primary site. The site ID distinguishes replicated call state data on the secondary site from any other call state data actively managed by the secondary site. Timers in replicated call state data remain dormant on the secondary site, so that timer processing does not become a bottleneck to performance.
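The following Python sketch models how a secondary site might record replicated data. It is a hypothetical model, not the product's storage format: call states are dictionary entries, timers are plain expiry times, and a `timers_active` flag represents the dormant state.

```python
from dataclasses import dataclass, field

LOCAL_SITE_ID = 0  # site ID 0 marks call states owned by this site


@dataclass
class StoredCallState:
    call_id: str
    data: dict
    site_id: int = LOCAL_SITE_ID
    timers: list = field(default_factory=list)  # timer expiry times
    timers_active: bool = True


def store_replicated(store, call_id, data, timers, origin_site_id):
    """Persist a call state received from another site, tagged with that site's ID."""
    if origin_site_id == LOCAL_SITE_ID:
        raise ValueError("replicated call state must carry a non-zero origin site ID")
    # Replicated timers are stored dormant so they add no timer-processing load.
    store[call_id] = StoredCallState(call_id, dict(data), origin_site_id,
                                     list(timers), timers_active=False)


def due_timers(store, now):
    """Expiry times that local timer processing would fire; dormant timers are skipped."""
    return sorted(t for state in store.values() if state.timers_active
                  for t in state.timers if t <= now)
```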

6.6.6.2 Call State Processing After Failover

To perform a failover, the Administrator must change a global load balancer policy to begin routing calls from the primary, failed site to the secondary site. After this process is completed, the secondary site begins processing requests for the backed-up call state data. When a request is made for data that has been replicated from the failed site, the engine retrieves the data and activates the call state, taking ownership for the call. The activation process involves:

Setting the site ID associated with the call to zero (making it appear local).
Activating all dormant timers present in the call state.
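A minimal Python sketch of these two activation steps, assuming a hypothetical dictionary-based store in which each call state has a `site_id` and a `timers_active` flag, looks like this:

```python
def activate_on_request(store, call_id):
    """Claim a replicated call state the first time this site handles a request for it.

    store maps call_id -> {"site_id": int, "timers_active": bool, ...}.
    """
    state = store[call_id]
    if state["site_id"] != 0:
        state["site_id"] = 0            # the call now appears local
        state["timers_active"] = True   # dormant timers start running
    return state
```

Only the requested call changes ownership; other replicated call states stay dormant until they are requested or a bulk takeover is performed.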

By default, call states are activated only for individual calls, and only after those calls are requested on the backup site. SipServerRuntimeMBean includes a method, activateBackup(byte site), that can be used to force a site to take over all call state data that it has replicated from another site. The Administrator can execute this method using a WLST configuration script. Alternatively, an application deployed on the server can detect when a request for replicated site data occurs, and then execute the method. Example 6-7 shows sample code from a JSP that activates a secondary site, changing ownership of all call state data replicated from site 1. Similar code could be used within a deployed Servlet. Note that either a JSP or Servlet must run as a privileged user in order to execute the activateBackup method.

To detect whether a particular call state request involves replicated data, Servlets can use the WlssSipApplicationSession.getGeoSiteId() method to examine the site ID associated with a call. Any non-zero value for the site ID indicates that the Servlet is working with call state data that was replicated from another site.

Example 6-7 Activating a Secondary Site Using JMX

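<%@ page import="java.util.Set" %>
<%@ page import="javax.naming.InitialContext" %>
<%@ page import="javax.management.MBeanServer" %>
<%@ page import="javax.management.MBeanServerInvocationHandler" %>
<%@ page import="javax.management.ObjectInstance" %>
<%@ page import="javax.management.ObjectName" %>
<%-- Check the SipServerRuntimeMBean package against the Java API Reference for your release. --%>
<%@ page import="com.bea.wcp.sip.management.runtime.SipServerRuntimeMBean" %>
<%
    // Site ID of the failed site whose replicated call states this site takes over.
    byte site = 1;

    // Look up the WebLogic runtime MBean server.
    InitialContext ctx = new InitialContext();
    MBeanServer server = (MBeanServer) ctx.lookup("java:comp/env/jmx/runtime");

    // Locate the SipServerRuntime MBean for this server.
    Set set = server.queryMBeans(new ObjectName("*:*,Type=SipServerRuntime"), null);
    if (set.isEmpty()) {
      throw new IllegalStateException("No SipServerRuntime MBeans found");
    }

    ObjectInstance oi = (ObjectInstance) set.iterator().next();
    // newProxyInstance also needs the MBean interface and a notification-broadcaster flag.
    SipServerRuntimeMBean bean = (SipServerRuntimeMBean)
      MBeanServerInvocationHandler.newProxyInstance(server,
        oi.getObjectName(), SipServerRuntimeMBean.class, false);

    // Take ownership of all call state data replicated from the given site.
    bean.activateBackup(site);
%>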
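The detection-and-takeover pattern described for Servlets can be modeled in a few lines. In the following hypothetical Python sketch, reading `site_id` from the store stands in for WlssSipApplicationSession.getGeoSiteId(), and `activate_backup` models the effect of SipServerRuntimeMBean.activateBackup on a simple dictionary store; neither is the real API. The guard triggers one bulk takeover per originating site the first time replicated data is seen.

```python
def activate_backup(store, site):
    """Model of a bulk takeover: claim every call state replicated from `site`."""
    count = 0
    for state in store.values():
        if state["site_id"] == site:
            state["site_id"] = 0
            state["timers_active"] = True
            count += 1
    return count


class ReplicatedDataGuard:
    """Runs one bulk activation per origin site when replicated data is first requested."""

    def __init__(self, store):
        self.store = store
        self.activated_sites = set()

    def on_request(self, call_id):
        site = self.store[call_id]["site_id"]  # stands in for getGeoSiteId()
        if site != 0 and site not in self.activated_sites:
            self.activated_sites.add(site)
            activate_backup(self.store, site)
        return self.store[call_id]
```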

Note that after a failover, the load balancer must route all calls having the same callId to the newly-activated site. Even if the original, failed site is restored to service, the load balancer must not partition calls between the two geographical sites.

6.6.7 Removing Backup Call States

You may also choose to stop replicating call states to a remote site in order to perform maintenance on the remote site or to change the backup site entirely. Replication can be stopped by setting the Default Handling attribute to "none" on the primary site as described in Section 6.6.5.2, "Configuring Persistence Options (Primary and Secondary Sites)".

After disabling geographical replication on the primary site, you also may want to remove backup call states on the secondary site. SipServerRuntimeMBean includes a method, deleteBackup(byte site), that can be used to force a site to remove all call state data that it has replicated from another site. The Administrator can execute this method using a WLST configuration script or via an application deployed on the secondary site. The steps for executing this method are similar to those for using the activateBackup method, described in Section 6.6.6.2, "Call State Processing After Failover".
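The effect of deleteBackup on a secondary site's store can be sketched as follows. This Python model is hypothetical: it represents each call state as a dictionary with a `site_id` entry, and only illustrates that call states owned locally (site ID 0) are never removed.

```python
def delete_backup(store, site):
    """Remove all call states replicated from `site`; return how many were removed."""
    if site == 0:
        raise ValueError("site ID 0 denotes local call states, not a backup")
    doomed = [call_id for call_id, state in store.items() if state["site_id"] == site]
    for call_id in doomed:
        del store[call_id]
    return len(doomed)
```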

6.6.8 Monitoring Replication Across Regional Sites

The ReplicaRuntimeMBean includes two new methods to retrieve data about geographically-redundant replication:

getBackupStoreOutboundStatistics() provides information about the number of calls queued to a secondary site's JMS queue.
getBackupStoreInboundStatistics() provides information about the call state data that a secondary site replicates from another site.

See Oracle Fusion Middleware Communication Services Java API Reference for more information about ReplicaRuntimeMBean.

6.6.9 Troubleshooting Geographical Replication

In addition to using the ReplicaRuntimeMBean methods described in Section 6.6.8, "Monitoring Replication Across Regional Sites", Administrators should monitor any SNMP traps that indicate failed database writes on a secondary site installation.

Administrators must also ensure that all sites participating in geographically-redundant configurations use unique site IDs.

6.7 Caching SIP Data in the Engine Tier

As described in Chapter 15, "Oracle WebLogic Communication Services Base Platform Topologies", in the default Oracle WebLogic Communication Services configuration the engine tier cluster is stateless. A separate SIP data tier cluster manages call state data in one or more partitions, and engine tier servers fetch and write data in the SIP data tier as necessary. Engines can write call state data to multiple replicas in each partition to provide automatic failover should a SIP data tier replica go offline.

Oracle WebLogic Communication Services also provides the option for engine tier servers to cache a portion of the call state data locally, as well as in the SIP data tier. When a local cache is used, an engine tier server first checks its local cache for existing call state data. If the cache contains the required data, and the local copy of the data is up-to-date (compared to the SIP data tier copy), the engine locks the call state in the SIP data tier but reads directly from its cache. This improves response time performance for the request, because the engine does not have to retrieve the call state data from a SIP data tier server.

The engine tier cache stores only the call state data that has been most recently used by engine tier servers. Call state data is moved into an engine's local cache as necessary in order to respond to client requests or to refresh out-of-date data. If the cache is full when a new call state must be written to the cache, the least-recently accessed call state entry is first removed from the cache. The size of the engine tier cache is not configurable.
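The lookup and eviction policy described above resembles a classic LRU cache with a version check. The following Python sketch models it with `collections.OrderedDict`; the class name and the integer version numbers are assumptions, and the model omits the lock that a real engine still acquires in the SIP data tier on a cache hit.

```python
from collections import OrderedDict


class EngineCallStateCache:
    """Model of an engine-local call state cache with LRU eviction (fixed capacity)."""

    CAPACITY = 250

    def __init__(self):
        self._entries = OrderedDict()  # call_id -> (version, data)

    def __len__(self):
        return len(self._entries)

    def get(self, call_id, data_tier_version):
        """Return cached data if present and current; None means read from the data tier."""
        entry = self._entries.get(call_id)
        if entry is None:
            return None
        self._entries.move_to_end(call_id)  # mark as most recently accessed
        version, data = entry
        return data if version == data_tier_version else None

    def put(self, call_id, version, data):
        if call_id in self._entries:
            self._entries.move_to_end(call_id)
        self._entries[call_id] = (version, data)
        if len(self._entries) > self.CAPACITY:
            self._entries.popitem(last=False)  # evict the least recently accessed entry
```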

Using a local cache is most beneficial when a SIP-aware load balancer manages requests to the engine tier cluster. With a SIP-aware load balancer, all of the requests for an established call are directed to the same engine tier server, which improves the effectiveness of the cache. If you do not use a SIP-aware load balancer, the effectiveness of the cache is limited, because subsequent requests for the same call may be distributed to different engine tier servers (having different cache contents).

6.7.1 Configuring Engine Tier Caching

Engine tier caching is enabled by default. To disable partial caching of call state data in the engine tier, set the engine-call-state-cache-enabled element in sipserver.xml to false:

<engine-call-state-cache-enabled>false</engine-call-state-cache-enabled>

When enabled, the cache size is fixed at a maximum of 250 call states. The size of the engine tier cache is not configurable.

6.7.2 Monitoring and Tuning Cache Performance

SipPerformanceRuntime monitors the behavior of the engine tier cache. Table 6-3 describes the MBean attributes.

Table 6-3 SipPerformanceRuntime Attribute Summary

Attribute	Description
cacheRequests	Tracks the total number of requests for session data items.
cacheHits	The server increments this attribute each time a request for session data results in a version of that data being found in the engine tier server's local cache. Note that this counter is incremented even if the cached data is out-of-date and needs to be updated with data from the SIP data tier.
cacheValidHits	This attribute is incremented each time a request for session data is fully satisfied by a cached version of the data.
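Because cacheHits also counts stale hits, comparing the three counters shows how often the cache actually saves a data tier read. The Python helper below is a hypothetical sketch that takes counter values you have already read from SipPerformanceRuntime (for example, with WLST); for 1,000 requests, 800 hits, and 600 valid hits it reports an 80% hit ratio, a 60% valid-hit ratio, and a 25% share of stale hits.

```python
def cache_ratios(cache_requests, cache_hits, cache_valid_hits):
    """Return (hit_ratio, valid_hit_ratio, stale_hit_fraction) from the cache counters."""
    if cache_requests == 0:
        return 0.0, 0.0, 0.0
    hit_ratio = cache_hits / cache_requests
    valid_hit_ratio = cache_valid_hits / cache_requests
    # Share of hits whose cached copy was out of date and had to be refreshed.
    stale_fraction = (cache_hits - cache_valid_hits) / cache_hits if cache_hits else 0.0
    return hit_ratio, valid_hit_ratio, stale_fraction
```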

When enabled, the size of the cache is fixed at 250 call states. Because the cache consumes memory, you may need to modify the JVM settings used to run engine tier servers to meet your performance goals. Cached call states are maintained in the tenured store of the garbage collector. Try reducing the fixed "NewSize" value when the cache is enabled (for example, -XX:MaxNewSize=32m -XX:NewSize=32m). Note that the actual value depends on the call state size used by applications, as well as the size of the applications themselves.

6.8 Monitoring and Troubleshooting SIP Data Tier Servers

A runtime MBean, ReplicaRuntimeMBean, provides valuable information about the current state and configuration of the SIP data tier. See Oracle Fusion Middleware Communication Services Java API Reference for a description of the attributes provided in this MBean.

Many of these attributes can be viewed using the SIP Servers Monitoring > Data Tier Information tab in the Administration Console, as shown in Figure 6-6.

Figure 6-6 SIP Data Tier Monitoring in the Administration Console


Example 6-8 shows a simple WLST session that queries the current attributes of a single Managed Server instance in a SIP data tier partition. Table 6-4 describes the MBean methods and attributes in more detail.

Example 6-8 Displaying ReplicaRuntimeMBean Attributes

connect('weblogic','weblogic','t3://datahost1:7001')
custom()
cd('com.bea')
cd('com.bea:ServerRuntime=replica1,Name=replica1,Type=ReplicaRuntime')
ls()
-rw-   BackupStoreInboundStatistics                 null
-rw-   BackupStoreOutboundStatistics                null
-rw-   BytesReceived                                0
-rw-   BytesSent                                    0
-rw-   CurrentViewId                                2
-rw-   DataItemCount                                0
-rw-   DataItemsToRecover                           0
-rw-   DatabaseStoreStatistics                      null
-rw-   HighKeyCount                                 0
-rw-   HighTotalBytes                               0
-rw-   KeyCount                                     0
-rw-   Name                                         replica1
-rw-   Parent                                       com.bea:Name=replica1,Type=ServerRuntime
-rw-   PartitionId                                  0
-rw-   PartitionName                                part-1
-rw-   ReplicaId                                    0
-rw-   ReplicaName                                  replica1
-rw-   ReplicaServersInCurrentView                  java.lang.String[replica1, replica2]
-rw-   ReplicasInCurrentView                        [I@75378c
-rw-   State                                        ONLINE
-rw-   TimerQueueSize                               0
-rw-   TotalBytes                                   0
-rw-   Type                                         ReplicaRuntime

Table 6-4 ReplicaRuntimeMBean Method and Attribute Summary

Method/Attribute	Description
dumpState()	Records the entire state of the selected SIP data tier server instance to the Oracle WebLogic Communication Services log file. You may want to use the `dumpState()` method to provide additional diagnostic information to a Technical Support representative in the event of a problem.
BackupStoreInboundStatistics	Provides statistics about call state data replicated from a remote geographical site.
BackupStoreOutboundStatistics	Provides statistics about call state data replicated to a remote geographical site.
BytesReceived	The total number of bytes received by this SIP data tier server. Bytes are received as servers in the engine tier provide call state data to be stored.
BytesSent	The total number of bytes sent from this SIP data tier server. Bytes are sent to engine tier servers when requested to provide the stored call state.
CurrentViewId	The current view ID. Each time the layout of the SIP data tier changes, the view ID is incremented. For example, as multiple servers in a SIP data tier cluster are started for the first time, the view ID is incremented when each server begins participating in the SIP data tier. Similarly, the view is incremented if a server is removed from the SIP data tier, either intentionally or due to a failure.
DataItemCount	The total number of stored call state keys for which this server has data. This attribute may be lower than the `KeyCount` attribute if the server is currently recovering data.
DataItemsToRecover	The total number of call state keys that must still be recovered from other replicas in the partition. A SIP data tier server may recover keys when it has been taken offline for maintenance and is then restarted to join the partition.
HighKeyCount	The highest total number of call state keys that have been managed by this server since the server was started.
HighTotalBytes	The highest total number of bytes occupied by call state data that this server has managed since the server was started.
KeyCount	The number of call data keys that are stored on the replica.
PartitionId	The numerical partition ID (from 0 to 7) of this server's partition.
PartitionName	The name of this server's partition.
ReplicaId	The numerical replica ID (from 0 to 2) of this server's replica.
ReplicaName	The name of this server's replica.
ReplicaServersInCurrentView	The names of other Oracle WebLogic Communication Services instances that are participating in the partition.
State	The current state of the replica. SIP data tier servers can have one of three different statuses: `ONLINE`—indicates that the server is available for managing call state transactions. `OFFLINE`—indicates that the server is shut down or unavailable. `ONLINE_LOCK_AUTHORITY_ONLY`—indicates that the server was rebooted and is currently being updated (from other replicas) with the current call state data. A recovering server cannot yet process call state transactions, because it does not maintain a full copy of the call state managed by the partition.
TimerQueueSize	The current number of timers queued on the SIP data tier server. This generally corresponds to the KeyCount value, but may be less if new call states are being added but their associated timers have not yet been queued. Note: Engine tier servers periodically check with SIP data tier instances to determine if timers associated with a call have expired. In order for SIP timers to function properly, all engine tier servers must actively synchronize their system clocks to a common time source. Oracle recommends using a Network Time Protocol (NTP) client or daemon on each engine tier instance and synchronizing to a selected NTP server. See Section 3.5, "Configuring Timer Processing".
TotalBytes	The total number of bytes consumed by the call state managed in this server.
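When you collect these attributes from several replicas, a small check can flag the conditions described in Table 6-4. The following Python sketch is hypothetical: it operates on a plain dictionary of attribute values rather than calling WLST, and the warning texts are illustrative.

```python
def replica_health(attrs):
    """Return a list of warnings for one SIP data tier replica.

    attrs maps ReplicaRuntimeMBean attribute names to values, for example as
    read in a WLST session like Example 6-8.
    """
    warnings = []
    state = attrs.get("State")
    if state == "OFFLINE":
        warnings.append("replica is offline")
    elif state == "ONLINE_LOCK_AUTHORITY_ONLY":
        # Still catching up from other replicas in the partition.
        warnings.append("replica is recovering: %d keys left to recover"
                        % attrs.get("DataItemsToRecover", 0))
    elif state != "ONLINE":
        warnings.append("unexpected state %r" % (state,))
    if not 0 <= attrs.get("PartitionId", 0) <= 7:
        warnings.append("PartitionId outside 0-7")
    if not 0 <= attrs.get("ReplicaId", 0) <= 2:
        warnings.append("ReplicaId outside 0-2")
    return warnings
```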