Oracle® WebLogic Communication Services Administrator's Guide
11g Release 1 (11.1.1)
Part Number E13806-01

16 Deployment Topologies for Communication Services

This chapter describes the OWLCS deployment topologies.

16.1 Terminology

The following terms are used in this chapter:

16.2 OWLCS Deployment Topologies

OWLCS supports single-node and multi-node deployment topologies:

16.2.1 Single-node Topologies

Both the All-In-One Administration Server and All-In-One Managed Server topologies are intended to provide an out-of-the-box experience in which Presence, Instant Messaging, Voice-Over-IP, Third Party Calling, and Presence and Messaging Web Services work with an OWLCS client. These topologies are recommended for use in a test and evaluation environment, not in a production environment. They do not include High Availability support.

In an All-In-One Administration Server topology, the one and only server (the Administration Server) runs in a single JVM. In an All-In-One Managed Server topology, the Managed Server runs in one JVM and the Administration Server runs in a separate JVM.

16.3 Oracle WebLogic Communication Services Enterprise Deployment Topology

This section describes the Oracle WebLogic Communication Services (OWLCS) high-availability topology: Enterprise Deployment.

16.3.1 Introduction to OWLCS Enterprise Deployment Topology

OWLCS supports a multi-node deployment topology. A SIP Container cluster is defined as a set of SIP Container instances that share state related to the applications. A cluster consists of multiple nodes, with each node running one instance of OWLCS.

A highly available OWLCS cluster provides the following:

  • Replication of objects and values contained in a SIP Application Session

  • Database-backed location service data

  • Load balancing of incoming requests across OWLCS SIP nodes

  • Overload protection, which protects the server from malfunctioning during overload by rejecting traffic that cannot be handled properly

  • Transparent failover across applications within the cluster. If an instance of an application fails or becomes unresponsive, the session can fail over to another instance of the application on another node in the cluster.

Note:

For more information on OWLCS as a scalable Presence deployment, see Appendix G, "Deploying a Scalable Presence Deployment"

Note:

For information on installing and configuring OWLCS Enterprise Deployment topology, see Oracle WebLogic Communication Services Installation Guide.

16.3.1.1 Runtime Processes

Oracle WebLogic Communication Services start scripts use default values for many JVM parameters that affect performance. For example, JVM garbage collection and heap size parameters may be omitted, or may use values that are acceptable only for evaluation or development purposes. In a production system, you must rigorously profile your applications with different heap size and garbage collection settings in order to realize adequate performance. See Oracle WebLogic Communication Services Administrator's Guide for suggestions about maximizing JVM performance in a production domain.
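
For example, heap and garbage collection settings can be passed to the start scripts through the environment. The following is a sketch only, assuming the standard WebLogic start scripts (which honor the USER_MEM_ARGS environment variable); the values shown are placeholders, not tuning recommendations:

# Illustrative only: override the default JVM memory arguments before starting.
# Replace the heap sizes and collector with values derived from your own profiling.
export USER_MEM_ARGS="-Xms2g -Xmx2g -XX:+UseConcMarkSweepGC"
./startWebLogic.sh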

Because a typical Oracle WebLogic Communication Services domain contains numerous engine and SIP state tier servers, with dependencies between the different server types, you should generally follow this sequence when starting up a domain:

  1. Start the Administration Server for the domain. The Administration Server provides the initial configuration and can also be used to monitor the startup/shutdown status of each Managed Server. You generally start the Administration Server by using either the startWebLogic.sh/cmd script installed with the Configuration Wizard, or a custom startup script.

  2. Start the Managed Servers. Next, start each Managed Server using the startManagedWebLogic.sh/cmd script or a custom startup script, as shown in the sketch after this list.
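
A minimal sketch of this startup order, assuming the default domain scripts in $DOMAIN_HOME/bin; the Managed Server name (wlcs_server1) and the Administration Server URL are placeholders:

# On the Administration Server machine: start the Administration Server first.
cd $DOMAIN_HOME/bin
./startWebLogic.sh

# On each Managed Server machine: start the Managed Server, passing the server
# name and the Administration Server URL (placeholders shown).
cd $DOMAIN_HOME/bin
./startManagedWebLogic.sh wlcs_server1 http://adminhost:7001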

16.3.1.2 Request Flow

Request flow is described in JSR 289, which is an enhancement to the SIP Servlet specification.

For details on JSR 289, see: http://www.oracle.com/technology/tech/java/standards/jsr289/index.html

16.3.1.3 Client Connections

The default HTTP network configuration for each Oracle WebLogic Communication Services instance is determined from the Listen Address and Listen Port setting for each server. However, Oracle WebLogic Communication Services does not support the SIP protocol over HTTP. The SIP protocol is supported over the UDP and TCP transport protocols. SIPS is also supported using the TLS transport protocol.

To enable UDP, TCP, or TLS transports, you configure one or more network channels for an Oracle WebLogic Communication Services instance. A network channel is a configurable WebLogic Server resource that defines the attributes of a specific network connection to the server instance. Basic channel attributes include:

  • Protocols supported by the connection

  • Listen address (DNS name or IP address) of the connection

  • Port number used by the connection

  • (Optional) Port number used by outgoing UDP packets

  • Public listen address (load balancer address) to embed in SIP headers when the channel is used for an outbound connection

You can assign multiple channels to a single Oracle WebLogic Communication Services instance to support multiple protocols or to utilize multiple interfaces available with multi-homed server hardware. You cannot assign the same channel to multiple server instances.

When you configure a new network channel for the SIP protocol, both the UDP and TCP transport protocols are enabled on the specified port. You cannot create a SIP channel that supports only UDP transport or only TCP transport. When you configure a network channel for the SIPS protocol, the server uses the TLS transport protocol for the connection.

As you configure a new SIP Server domain, you will generally create multiple SIP channels for communication with each engine tier server in your system. Engine tier servers can communicate with SIP state tier replicas using the configured Listen Address attributes of the replicas. Note, however, that replicas must use unique Listen Addresses in order to communicate with one another.
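
For example, a SIP channel appears as a network-access-point element in the server's config.xml (channels are normally created through the Administration Console or a scripting tool). The following is a sketch only; the channel name, addresses, and ports are placeholders, and the element names should be verified against your installation's configuration schema:

<network-access-point>
  <name>sip-channel</name>
  <protocol>sip</protocol>
  <listen-address>10.0.0.5</listen-address>
  <listen-port>5060</listen-port>
  <public-address>sip-lb.example.com</public-address>
  <public-port>5060</public-port>
</network-access-point>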

16.3.1.4 Artifacts

Installation, configuration and deployment create the following artifacts for OWLCS.

16.3.1.4.1 .ear Files

.ear files for deploying your applications are found in your middleware home directory (for example, $MW_HOME/as11gr1wlcs1/communications/applications). They are typically easy to identify by their names (for example, sdpmessagingdriver-smpp.ear for deploying the SMPP driver).

16.3.1.4.2 Configuration Files

Components are configured through various .xml files, found in your domain home directory (for example, $DOMAIN_HOME/config/communications). They are typically easy to identify by their names (for example, usermessagingdriver-smpp_Plan.xml for configuring SMPP).

16.3.1.4.3 Log Files

Log activities are stored in log files. There are many log files; of special note are the install, activity, error, and diagnostic logs. They are found in your domain home directory (for example, $DOMAIN_HOME/servers/wlcs_server1/logs).

16.3.1.5 Topology Components

Components of a highly available OWLCS topology are detailed below. Figure 16-4 shows details of the OWLCS hosts that support the topology.

Figure 16-4 Host details for OWLCS


Note:

For more information on configuring these components, see Oracle Fusion Middleware Enterprise Deployment Guide for Oracle WebCenter.

Figure 16-5 OWLCS High Availability detail


16.3.1.5.1 SIP Containers

OWLCS extends the core WebLogic Server platform with a SIP Container compliant with JSR 289. This enables the development of J2EE applications that process SIP in addition to HTTP for any advanced communications application. The platform enables the development of complementary communications services that integrate with SIP-based IP-PBXs as well as other SIP elements such as standard SIP clients. For more information on the SIP Container, see Oracle WebLogic Communication Services Administrator's Guide.

Two or more SIP Containers are linked to one another through OWLCS SIP state replication. The OWLCS SIP state on each computer is replicated to other nodes in the SIP State Tier so that if one SIP Container node fails, another container node takes over, using the replicated state of the failed node.

16.3.1.5.2 Third-Party Load Balancer

A third-party load balancer balances the load of incoming traffic. It also redirects traffic from failed nodes to operational ones. Your Load Balancer must be capable of routing both HTTP and SIP traffic, and must be configured to do so. For more information about Load Balancers, see Oracle Fusion Middleware Enterprise Deployment Guide for Oracle WebCenter.

16.3.1.5.3 Proxy Registrar

The OWLCS Proxy Registrar combines the functionality of a SIP Proxy Server and Registrar. Its main tasks include registering subscribers, looking up subscriber locations, and proxying requests onward. The Proxy Registrar stores user location and registration data on the Oracle database. This is an optional component. For more information on Proxy Registrar, see Section 14.1, "Proxy Registrar".

16.3.1.5.4 Presence Services

OWLCS includes a Presence Service that acts as the aggregator of presence information and provides a subscribe/notify interface for applications and end users to consume presence information. An application can integrate either by using Web Services or by using a compliant SIP-based end-user client.

16.3.1.5.5 Presence Web Services

OWLCS enables Web Service clients to access presence services through its support of the Parlay X Presence Web Service as defined in Open Service Access, Parlay X Web Services, Part 14, Presence ETSI ES 202 391-14. A Parlay X Web Service enables an HTTP Web Service client to access such presence services as publishing and subscribing to presence information. The Parlay X Presence Web Service does not require developers to be familiar with the SIP protocol to build such a Web-based client; instead, Parlay X enables Web developers to build this client using their knowledge of Web Services.

16.3.1.5.6 Third-Party Call Control Web Services

The Third Party Call Parlay X 2.1 communication services implement the Parlay X 2.1 Third Party Call interface (Standards reference: ETSI ES 202 391-2 V1.2.1 (2006-12), Open Service Access (OSA); Parlay X Web Services; Part 2: Third Party Call (Parlay X 2)). For more information on Third-Party Call Control Web Services, see Oracle WebLogic Communication Services Developer's Guide.

16.3.1.5.7 Aggregation Proxy

The Aggregation Proxy authorizes web service calls and authenticates XCAP traffic. The Aggregation Proxy then proxies this traffic to the Parlay X Web Service and XDMS. This is an optional component.

16.3.1.5.8 Authentication Proxy

The Authentication Proxy is a SIP application that, upon successful authentication, populates a P-Asserted-Identity header in the request. The authentication itself is not performed by the Authentication Proxy application; SIP digest-based authentication is configured in the standard way, as for all SIP applications. The P-Asserted-Identity header value is the externally stored (for example, in Oracle Identity Management or a database) preferred SIP and/or TEL URI for the authenticated user.

16.3.1.5.9 File Transfer Service

OWLCS includes an Oracle-proprietary file transfer service enabling users to transfer files to one another.

16.3.1.5.10 Messaging Services

Messaging Services implements support for a subset of the operations in the SendMessage, ReceiveMessage, and MessageNotificationManager interfaces, as they are defined in ETSI ES 202 391-5 V1.2.1 (2006-12), Open Service Access (OSA), Parlay X Web Services Part 5: Multimedia Messaging (Parlay X 2).

16.3.1.5.11 STUN Service

The OWLCS STUN Service implements STUN (Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs)). For more information on STUN Service, see Section 14.2, "STUN Service".

16.3.1.5.12 DAR Configuration

The Application Router is a SIP application that routes incoming SIP requests to the correct application. It routes requests by placing route headers in each SIP request it processes. A number of route headers can be placed in a request, each representing a different destination URI. The SIP request is either sent through the chain of destination URIs, or proxied to a new URI upon arriving at its first destination.

16.3.1.5.13 Oracle Communicator Client

Oracle Communicator is a client communication application that enables users to keep in touch with others in their organization. Users can see their contacts' presence (that is, their availability) and communicate with them by sending instant messages and sharing files.

16.3.1.5.14 State Tier

The Oracle WebLogic Communication Services SIP state tier node manages the application call state for concurrent SIP calls. The SIP state tier may manage a single copy of the call state or multiple copies across the cluster as needed to ensure that call state data is not lost if a server machine fails or network connections are interrupted.

16.3.1.5.15 User Dispatcher

The User Dispatcher enables the Presence and XDMS applications to scale. The User Dispatcher is a proxy that dispatches SIP and XCAP (over HTTP) requests to their appropriate destinations on a consistent basis.

Because the Presence application maintains the states for all users in a deployment, User Dispatcher enables scaling (distribution) of the Presence application. The User Dispatcher supports request dispatching to the Presence Server and XDMS sub-applications, which use the SIP and XCAP (over HTTP) protocols.

16.3.1.5.16 Oracle RAC Database

Oracle RDBMS should be installed and operational in an Oracle RAC environment. The supported versions are 10.2.0.4 and 11.1.0.7. Refer to Oracle Clusterware Installation Guide 11g Release 1 (11.1), Oracle Real Application Clusters for 11g Release 1 (11.1), and Oracle Real Application Clusters Installation Guide 11g Release 1.

16.3.1.6 Overview of SIP State Tier Configuration

The Oracle WebLogic Communication Services SIP state tier is a cluster of server instances that manages the application call state for concurrent SIP calls. The SIP state tier may manage a single copy of the call state or multiple copies as needed to ensure that call state data is not lost if a server machine fails or network connections are interrupted.

The SIP state tier cluster is arranged into one or more partitions. A partition consists of one or more SIP state tier server instances that manage the same portion of concurrent call state data. In a single-server Oracle WebLogic Communication Services installation, or in a two-server installation where one server resides in the engine tier and one resides in the SIP state tier, all call state data is maintained in a single partition. Multiple partitions are required when the size of the concurrent call state exceeds the maximum size that can be managed by a single server instance. When more than one partition is used, the concurrent call state is split among the partitions, and each partition manages a separate portion of the data. For example, with a two-partition SIP state tier, one partition manages the call state for half of the concurrent calls (for example, calls A through M) while the second partition manages the remaining calls (N through Z).

In most cases, the maximum call state size that can be managed by an individual server is limited by the heap size in the Java Virtual Machine.

Additional servers can be added within the same partition to manage copies of the call state data. When multiple servers are members of the same partition, each server manages a copy of the same portion of the call data, referred to as a replica of the call state. If a server in a partition fails or cannot be contacted due to a network failure, another replica in the partition supplies the call state data to the engine tier. Oracle recommends configuring two servers in each partition for production installations, to guard against machine or network failures. A partition can have a maximum of three replicas, providing additional redundancy.

16.3.1.7 Example SIP State Tier Configurations and Configuration Files

The sections that follow describe some common Oracle WebLogic Communication Services installations that utilize a separate SIP state tier.

16.3.1.7.1 SIP State Tier with One Partition

A single-partition, two server SIP state tier represents the simplest state tier configuration.

For example, the datatier.xml configuration file shown in Example 16-1 creates a two-replica configuration.

Example 16-1 SIP State Tier Configuration for Small Deployment with Replication

<?xml version="1.0" encoding="UTF-8"?>
  <data-tier xmlns="http:....">
    <partition>
      <name>Partition0</name>
      <server-name>DataNode0-0</server-name>
      <server-name>DataNode0-1</server-name>
    </partition>
  </data-tier>
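
Because a partition supports a maximum of three replicas, a third server-name entry can be added to the same partition for additional redundancy. The following is a sketch only; the server names are placeholders:

<?xml version="1.0" encoding="UTF-8"?>
  <data-tier xmlns="http:....">
    <partition>
      <name>Partition0</name>
      <server-name>DataNode0-0</server-name>
      <server-name>DataNode0-1</server-name>
      <server-name>DataNode0-2</server-name>
    </partition>
  </data-tier>
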
16.3.1.7.2 SIP State Tier with Two Partitions

Multiple partitions can be created by defining multiple partition entries in datatier.xml, as shown in Example 16-2.

Example 16-2 Two-Partition SIP State Tier Configuration

<?xml version="1.0" encoding="UTF-8"?>
  <data-tier xmlns="http:....">
    <partition>
      <name>Partition0</name>
      <server-name>DataNode0-0</server-name>
    </partition>
    <partition>
      <name>Partition1</name>
      <server-name>DataNode1-0</server-name>
    </partition>
  </data-tier>
16.3.1.7.3 SIP State Tier with Two Partitions and Two Replicas

Replicas of the call state can be added by defining multiple SIP state tier servers in each partition. Example 16-3 shows the datatier.xml configuration file used to define a system having two partitions with two servers (replicas) in each partition.

Example 16-3 SIP State Tier Configuration for Small Deployment

<?xml version="1.0" encoding="UTF-8"?>
  <data-tier xmlns="http:....">
    <partition>
      <name>Partition0</name>
      <server-name>DataNode0-0</server-name>
      <server-name>DataNode0-1</server-name>
    </partition>
    <partition>
      <name>Partition1</name>
      <server-name>DataNode1-0</server-name>
      <server-name>DataNode1-1</server-name>
    </partition>
  </data-tier>

16.3.1.8 Storing Long-Lived Call State Data in an RDBMS

Oracle WebLogic Communication Services enables you to store long-lived call state data in an Oracle RDBMS in order to conserve RAM. When you enable RDBMS persistence, by default the SIP state tier persists a call state's data to the RDBMS after the call dialog has been established, and at subsequent dialog boundaries, retrieving or deleting the persisted call state data as necessary to modify or remove the call state.

Oracle also provides an API for application designers to provide hints as to when the SIP state tier should persist call state data. These hints can be used to persist call state data to the RDBMS more frequently, or to disable persistence for certain calls.

Oracle WebLogic Communication Services can use the RDBMS to supplement the SIP state tier's in-memory replication functionality. To improve latency performance when using an RDBMS, the SIP state tier maintains SIP timers in memory, along with call states being actively modified (for example, in response to a new call being set up). Call states are automatically persisted only after a dialog has been established and a call is in progress, at subsequent dialog boundaries, or in response to persistence hints added by the application developer.

When used in conjunction with an RDBMS, the SIP state tier (the term state tier is used interchangeably with data tier in this document) selects one replica server instance to process all call state writes (or deletes) to the database. Any available replica can be used to retrieve call states from the persistent store as necessary for subsequent reads.

RDBMS call state storage can be used in combination with an engine tier cache, if your domain uses a SIP-aware load balancer to manage connections to the engine tier.

16.3.2 Geographic Redundancy

Geographic Redundancy ensures uninterrupted transactions and communications for providers, using geographically-separated SIP server deployments.

A primary site can process various SIP transactions and communications and, upon determining a transaction boundary, replicate the state data associated with the transaction being processed to a secondary site. Upon failure of the primary site, calls are routed from the failed primary site to a secondary site for processing. Similarly, upon recovery, the calls are re-routed back to the primary site. See Oracle WebLogic Communication Services Administrator's Guide for more information.

Figure 16-6 Geo-Redundancy


The preceding figure portrays Geo-Redundancy. The process proceeds as follows:

  1. A call is initiated on the primary cluster site; call setup and processing occur normally.

  2. The call is replicated as usual to the site's SIP State Tier, and becomes eligible for replication to a secondary site.

  3. A single replica in the SIP State Tier then places the call state data to be replicated on a JMS queue configured on the replica site.

  4. The call state data is transmitted to one of the available engines using JMS over the WAN.

  5. Engines at the secondary site monitor their local queue for new messages. Upon receiving a message, an engine at the secondary site persists the call state data and assigns it the site ID value of the primary site.

16.3.3 Failover

Oracle's High Availability solutions include failover support for all levels of your system. During failover, the functions of a component that is malfunctioning or non-operational are addressed by a standby or replacement component. This is done without manual intervention, and in most cases end users will not be able to detect that the system is taking failover actions. There are two main types of failover: Session Failover and Service Failover.

Session Failover occurs when the session, connection, or power is interrupted. During Session Failover, ongoing calls or requests are handled by backup nodes, and users do not detect that the condition arose. The SIP Container provides services to upper levels to help recover from session failure. The SIP protocol rebuilds state at certain pre-defined times (for example, every hour); this is designed to protect against unreliable networks.

When Service Failover occurs (such as when a node in a cluster of nodes goes down), the load is handled by other nodes in the cluster. If an individual request is being processed at the time, it will fail, but subsequent requests will be picked up by the functioning nodes.

In its High Availability configuration, services running in the wlcs-services domain such as Proxy Registrar, Third-Party Call Control, and other distributable applications support Session Failover, while services running in the wlcs_presence domain such as Presence and other non-distributable applications support Service Failover.

16.3.3.1 OWLCS Presence Failover

When multiple nodes are used in a High Availability deployment, OWLCS Presence distributes its services across the nodes (servers). When a request for the presence information of an entity (E1) is made, the request goes to node 1 (N1). If N1 goes down during the request, the user sees a failure. When the request is resubmitted, it is handled by node 2 (N2). No failure occurs during the re-request, or during any other requests made after the initial failure.

Fail-over is a technique that the User Dispatcher can use to provide a higher level of availability for the Presence Server. Because the Presence Server does not replicate any state (such as established subscriptions), the state has to be recreated by the clients on the new server node by setting up new subscriptions. Also, because a subscription is a SIP dialog and the User Dispatcher does not record-route, the User Dispatcher cannot fail over a subscription from one node to another; all subsequent requests follow the route set and end up on the “old” node.

This is not a problem when failing over from a “failing” server, since that node is not processing the traffic anyway; any request within a dialog will eventually get a failure response or time out, and the dialog will be terminated. However, when migrating a user back from the backup node to the original node after it has been repaired (which has to be done to maintain an even distribution after the failure), this is a problem that can lead to broken presence functionality. The only way to migrate a subscription from one running server to another is to restart either the client or the server.

However, the server that holds the subscription can actively terminate it by sending out a terminating NOTIFY and discarding the subscription state. This forces the client to issue a new initial SUBSCRIBE to establish a new dialog. For a subscription to migrate from one live node to another, the User Dispatcher must fail over the traffic (which affects only initial requests) and instruct the current server to terminate the subscriptions.

16.3.3.2 Presentity Migration

Presentities must be migrated when the set of nodes has changed. This involves having the Presence application terminate some or all subscriptions to make the migration happen.

16.3.3.2.1 Stateless User Dispatcher and Even Distribution

The most basic approach is to contact the Presence application on all nodes and terminate all of its subscriptions. The problem with this is that a burst of traffic is generated, even if it is spread out over a period of time. This period results in incorrect presence states: the longer the termination period, the longer it takes until all users get a correct presence state.

To optimize this, you could terminate only those subscriptions that actually need to be terminated (the ones that have been migrated). The problem is that the User Dispatcher does not know which users these are (since it does stateless distribution based on an algorithm), and neither does the Presence application (since it only knows which users it has). However, if the Presence application could iterate over all its subscriptions and, for each of them, ask the User Dispatcher whether that user would be dispatched to this Presence node, then the Presence server could terminate only those subscriptions that would not come back to itself. This may be a heavy operation, but under the constraint that each Presence server is collocated with a User Dispatcher, each such callback would be within the same JVM.

16.3.3.2.2 Presence Application Broadcast

Another solution is to have the Presence servers guarantee that a user exists on only one Presence node at any given time. This can be done by having the Presence application broadcast a message to all its neighbors when it receives a PUBLISH or SUBSCRIBE for a new presentity (a presentity for which it does not already have state). If any other Presence node that receives this broadcast message already has active subscriptions for this presentity, that server must terminate those subscriptions so that the client can establish a new subscription with the new server.

With this functionality in the Presence application, the User Dispatcher would not have to perform additional steps to migrate a user from one live node to another.

16.3.3.3 Standby Server Pool

Another approach is to have a standby pool of servers that are idle, ready to take over traffic from a failing node. When an active node fails, the User Dispatcher redistributes all of its traffic to one server from the standby pool. This node then becomes active, and when the failing node is eventually repaired it is added to the standby pool. This eliminates the need to migrate users “back” from a live node when a failing node resumes.

This approach requires more hardware, and the utilization of hardware resources is not optimal.

16.3.3.4 Failure Types

There are several types of failures that can occur in a Presence server and different types of failures may require different actions from the User Dispatcher.

16.3.3.4.1 Fatal Failures

If the failure is fatal, all state information is lost and established sessions will fail. However, depending on the failure response, subscriptions (presence subscribe sessions) can survive using a new SIP dialog. If the response code is a 481, the presence client must, according to RFC 3265, establish a new SUBSCRIBE dialog; this is not considered a failure from a presence perspective. All other failure responses may (depending on the client implementation) be handled as an error by the client and should therefore be considered a failure.

After a fatal failure, the server does not have any dialog state from before the failure, which means that all subsequent requests that arrive at this point will receive a 481 response. During the failure period, all transactions (both initial and subsequent) will be terminated with a non-481 error code, most likely a 500, or an internal 503 or 408 (depending on whether there is a proxy in the route path and on the nature of the failure).

Typically a fatal failure will result in the server process or the entire machine being restarted.

16.3.3.4.2 Temporary Failures

A temporary failure is one in which little or no data is lost, so that session states remain in the server after the failure. This means that a subsequent request arriving after the server has recovered from the failure will be processed with the same result as it would have been before the failure.

All requests that arrive during the failure period will receive a non-481 failure response, such as 503.

In general, a temporary failure has a shorter duration; a typical example is an overload situation, in which case the server responds with 503 to some or all requests.

16.3.3.5 Failover Actions

The User Dispatcher can take several actions when it has detected a failure in a Presence server node. The goal of the action is to minimize the impact of the failure in terms of the number of failed subscriptions and publications and the time it takes to recover. In addition, the User Dispatcher needs to keep the distribution as even as possible over the active servers.

The fail-over action used in this version of the User Dispatcher is to disable the node in the pool. This approach is better than removing the node because, when the ResizableBucketServerPool is used, the add and remove operations are not deterministic: the result of adding a node depends on the sequence of earlier add and delete operations, whereas the disable operation always results in the same change in distribution given the set of active and disabled nodes.

16.3.3.6 Overload Policy

An activated overload policy can indicate several types of failures, but its main purpose is to protect the system from a traffic load that is too big to handle. If such a situation is treated as a failure, fail-over actions can bring down the whole cluster: if the distribution of traffic is fairly even, all the nodes will be in or near an overload situation, and if the dispatchers remove one node from the cluster and redistribute that node's traffic over the remaining nodes, those nodes will certainly enter an overload situation, causing a chain reaction.

Since it is difficult to distinguish this overload situation from a software failure that triggers the overload policy even though the system is not under load, it might still be better to take the fail-over action unless the Overload Policy is disabled. If the system really is in an overload situation, it is probably underdimensioned, and then fail-over should be disabled.

The User Dispatcher will not fail over when it detects a 503 response (which indicates that the overload policy is activated). However, if a server is in the highest overload policy state, where it drops messages instead of responding with 503, the User Dispatcher monitor will receive an internal 408, which can never be distinguished from a dead server, and failover will occur.

16.3.3.7 Synchronization of Failover Events

Depending on the failure detection mechanism, there may be a need to synchronize the fail-over events (or the resulting state) between the different dispatcher instances. This is required if the detection mechanism is not guaranteed to be consistent across the cluster, such as an error response. For instance, one server node might send a 503 response on one request but after that work just fine (this can be due to a glitch in the overload policy). If only one 503 was sent, then only one dispatcher instance will receive it, and if that event triggers a fail-over, that dispatcher instance will be out of sync with the rest of the cluster. Further, even if a grace period is implemented so that it takes several 503 responses over a time period to trigger the fail-over, there is still a risk of a race condition if the failure duration is the same as the grace period.

The following methods can be used to assure that the state after fail-over is synchronized across the cluster of dispatcher instances:

16.3.3.7.1 Broadcasting Fail-Over Events

In this approach, each dispatcher instance has to send a notification to all other instances (typically using JGroups or some other multicast technique) when it has decided to take a fail-over action and change the set of servers. This method can still lead to race conditions, since two instances may fail over and send notifications at the same time for two different server nodes.

16.3.3.7.2 Shared State

If all dispatcher nodes in the cluster share the same state from a “single source of truth”, then when the state is changed (due to a fail-over action) by any instance, all other instances will see the change.

16.3.3.8 Expanding the Cluster

Since the Presence application can generate an exponentially increasing load, because every user subscribes to multiple (and potentially a growing number of) other users, there is a need for a way to dynamically expand the cluster without too much disturbance. Compared to, for instance, a classic telecom application, where it may be acceptable to bring all servers down to upgrade the cluster during low-traffic hours, a Presence system may have higher availability requirements.

Expanding the cluster may involve both adding Presence nodes and User Dispatcher nodes.

When a new Presence server is added to a cluster, some presentities must be migrated from the old nodes to the new node in order to keep a fairly even distribution. This migration needs to be minimized to avoid too big a flood of traffic on the system when the cluster changes.

When a new User Dispatcher is added to the cluster, that User Dispatcher node must reach the same dispatching state as the other dispatcher nodes. Depending on the pool implementation, this may require state to be synchronized with the other dispatcher nodes (for instance, when using the bucket pool implementation with persistence).

Note:

Each User Dispatcher within the Presence Cluster must be configured to include all the Presence Server instances in the cluster in its list of presence servers to which it dispatches.

16.3.3.8.1 Updating the Node Set

Depending on the algorithm used to find the server node for a given presentity, a different number of presentities will be “migrated” to another node when a node is added or removed. An optimal pool implementation will minimize this number.

16.3.3.8.2 Migrating Presentities

When the node set has been updated, some presentities may have to be migrated to maintain an even distribution. The different ways to do this are described in "Presentity Migration".

16.3.3.9 Failover Use Cases

These use cases illustrate how the User Dispatcher reacts to different failure situations in one or several Presence server nodes.

16.3.3.9.1 One Presence Server Overloaded for 60 Seconds

The cluster consists of four Presence servers, each node consisting of one OWLCS instance with a User Dispatcher and a Presence application deployed. 100,000 users are distributed evenly over the four servers (25,000 on each node). Due to an abnormally long GC pause on one of the servers, message processing is blocked by the Garbage Collector, which causes the SIP queues to fill up and the overload policy to be activated. Sixty seconds later, processing resumes and the server continues to process messages.

The User Dispatcher will not fail over, but will keep sending traffic to the failing node. In this case no sessions will be migrated to another node, since all PUBLISH and initial SUBSCRIBE requests will continue to be sent to the failing node. Initial SUBSCRIBE requests that arrive during the failure period will fail with a non-481 error (likely 503). It is up to the client to try to set up a new subscription when the failing one expires, or to report a failure. All PUBLISH requests and initial SUBSCRIBE requests will generate a failure.

When the failing node resumes normal operation, all traffic will be processed again and no requests should fail. The time it takes until all presence states are “correct” again will be minimal, since no sessions were failed over.

If the monitoring feature is implemented in a way that detects the node as “down” in this case, then some users will be migrated to another node, and when this node comes back they will be migrated back again. This will generate some increased load for a period of time. If the overload policy was activated because of too high a traffic load, this migration is bad, since it will most likely happen again and the other servers will most likely also be close to overload. This could lead to a chain reaction resulting in the whole cluster going down and a complete loss of service.

16.3.3.9.2 One Presence Server Overloaded Multiple Times for Five Seconds

This use case describes a Presence server that goes in and out of overload for short periods, such as 5 seconds. This is common if the system is underdimensioned and can barely cope with the traffic load, but it could also be caused by some other disturbance on that particular node. The User Dispatcher behaves exactly as in "One Presence Server Overloaded for 60 Seconds", and the result is the same except that the number of failed sessions and failed-over sessions will be smaller due to the shorter failure period.

16.3.3.9.3 Overload Policy Triggered by an OWLCS Software Failure

A failure in the OWLCS software, or in an application deployed on top of it, causes all threads to be locked (deadlock). This eventually causes the input queue to fill up and the overload policy to be activated, even though the system is not actually overloaded. This is a permanent error that can only be solved by restarting the server.

Depending on whether and how the monitor function is implemented, the number of affected users can be minimized. However, this situation cannot be distinguished from a “real” overload situation, in which case a fail-over may not be the best thing to do.

16.3.3.9.4 A Presence Server Hardware Failure

The cluster consists of four Presence servers, each node consisting of one OWLCS instance with a User Dispatcher and a Presence application deployed. 100,000 users are distributed evenly over the four servers (25,000 on each node). One of the Presence servers crashes due to a hardware failure. A manual operation is required to replace the broken server with a new one, and only after two hours is the server up and running again. Depending on the type of failure, the response code sent back on transactions proxied to the failed node will be 408 or 503.

In this case, all sessions on this node will fail, since the failure duration is (most likely) longer than the expiration time of the subscriptions. If a monitor server is implemented with fail-over, the failure time will be minimized to the detection time (seconds). The users will be migrated by the migration feature, which will create an increased load for a period of time.

Because the User Dispatcher was also running on the failed node, all persisted data for that User Dispatcher will be lost when the server is replaced with a new machine.

16.3.3.9.5 Expanding the Cluster with One Presence Node

The cluster consists of three Presence servers, each node consisting of one OWLCS instance with a User Dispatcher and a Presence application deployed. 100,000 users are distributed evenly over the three servers (approximately 33,000 on each node). A new node is installed and added to the cluster. The following sequence of operations is performed to add the new node:

  1. The User Dispatcher and the Presence application on the new node are configured with the same settings as the rest of the cluster. This includes synchronizing the distribution state to the new User Dispatcher in case of a pool implementation with persistence.

  2. The addServer JMX operation is invoked with the new node on the cluster User Dispatcher MBean. This will invoke the addServer operation on all User Dispatcher nodes (including the new node).

  3. The Load Balancer is reconfigured with the new node so that initial requests are sent to the new User Dispatcher node.

  4. Depending on the migration approach an additional JMX operation may be invoked on the Presence application (using the cluster MBean server).

The result is that the new distribution of users is 25,000 on each node, after approximately 8,000 users have been migrated from each of the original nodes. Depending on the migration method, this will generate an increased load of traffic on the system over a period of time.

16.3.3.9.6 Removing a Node from the Cluster

The cluster consists of four Presence servers, each node consisting of one OWLCS instance with a User Dispatcher and a Presence application deployed. 100,000 users are distributed evenly over the four servers (25,000 on each node). One Presence node is removed from the cluster. The following sequence of operations is performed to remove the node:

  1. The Load Balancer is reconfigured so that it does not include the node to be removed.

  2. The removeNode JMX operation is invoked to remove the node from all the User Dispatchers in the cluster. The cluster MBean is used to delegate the operation.

  3. Depending on the migration approach an additional JMX operation may be invoked on the node to be removed.

  4. When all users have been migrated from the node to be removed (the duration of this depends on the migration method), the node is finally stopped and removed from the cluster.

The result is that the new distribution of users is approximately 33,000 on each remaining node, after approximately 8,000 users have been migrated to each of them.

16.3.3.9.7 OPMN Restart After a Presence Server Crash

Consider a four-node cluster with a User Dispatcher and a Presence application deployed on each node. The Presence server JVM on one of the nodes crashes and OPMN restarts the process. The restart takes one minute.

16.3.3.9.8 503 Responses from an Application

Due to a software bug or misbehavior in the application, 503 responses are sent for all incoming traffic. The SIP server itself is not under a significant load and the Overload Policy has not been activated. This may or may not be a permanent error condition.