19 Converged Application Server Monitoring and Overload Protection

This chapter describes how to monitor Oracle Communications Converged Application Server and how to configure overload protection.

About Monitoring and Overload Protection

Converged Application Server provides two interrelated systems that you can use together to ensure your environments remain within functional boundaries:

  • SIP Server and Application Monitoring Console

  • SIP Overload Protection

The first system, SIP Server and Application Monitoring Console, provides you with a window into the performance of your SIP servers and deployed SIP applications. Using the console, you can review the real-time performance of your servers and applications, and spot possible bottlenecks and impending failure conditions.

The second system, SIP Overload Protection, enables you to act upon the data you see in the SIP Server and Application Monitoring Console. Using the SIP Overload Protection interface, you can set flexible traps, thresholds, and statistical algorithms to gracefully handle many types of performance issues before they endanger the health of your environment.

SIP Server and Application Monitoring

Converged Application Server provides a console interface for monitoring your Session Initiation Protocol (SIP) servers and SIP applications.

To access the monitoring interface, do the following:

  1. Use your browser to access the URL http://address:port/console where address is the Administration Server's listen address and port is the listen port.

    Note:

    The default administration console port for Converged Application Server is 7001.
  2. Select the SipServer node in the left pane, and select the Monitoring tab in the right pane.

  3. In the Monitoring tab, you can select the following subtabs:

    • General: Provides general monitoring data on configured SIP servers.

    • SIP Performance: Provides per server performance information.

    • SIP Applications: Provides performance information on deployed SIP applications.

    • Call State Storage: Provides state and statistics information for SIP call state.

The following sections provide details on each monitoring subtab.

General

The General subtab of the Monitoring tab provides a variety of general runtime information on messages and sessions for each configured SIP server. Active SIP and Application sessions are also totaled at the bottom of the pane.

Table 19-1 describes the monitored data.

Table 19-1 General Monitoring Data

| Datum | Description |
|---|---|
| Name | The name of the SIP server instance. |
| Start Time | The time at which the SIP server instance was started. |
| Application Session Count | The number of active SIP application sessions. |
| SIP Session Count | The number of active SIP sessions. |
| Destroyed Application Session Count | The number of destroyed application sessions. |
| Destroyed SIP Session Count | The number of destroyed SIP sessions. |
| Messages Received | The number of SIP messages received. |
| Messages Rejected | The number of rejected SIP messages. |
| Messages Processed | The total number of SIP messages processed. |
| Cluster Id | The Converged Application Server cluster ID. |


The final row of the table provides domain-wide totals for all of the data in Table 19-1.

SIP Performance

The SIP Performance subtab of the Monitoring tab provides runtime performance statistics over a period of time for each configured SIP server. The period (default 60 seconds) and sample frequency (default 10 seconds) are noted at the bottom of the pane.

Table 19-2 describes the monitored data.

Table 19-2 SIP Performance Monitoring Data

| Datum | Description |
|---|---|
| Name | The name of the SIP server instance. |
| SIP Throughput | The SIP message throughput. |
| Succeeded SIP Trans | The number of successful SIP transactions. |
| Failed SIP Trans | The number of failed SIP transactions. |


SIP Applications

The SIP Applications subtab of the Monitoring tab provides runtime session information for SIP applications deployed on each configured SIP server.

Table 19-3 describes the monitored data.

Table 19-3 SIP Applications Data

| Datum | Description |
|---|---|
| Engine | The Converged Application Server engine on which the SIP application is deployed. |
| Name | The name of the SIP application. |
| SIP Session Count | The number of active SIP sessions. |
| Application Session Count | The number of active application sessions. |
| Destroyed SIP Session Count | The number of destroyed SIP sessions. |
| Destroyed Application Session Count | The number of destroyed application sessions. |


Call State Storage

The Call State Storage subtab of the Monitoring tab provides monitoring data in four additional subtabs:

  • Call State Service

  • Call State Cache

  • Call State Metadata Cache

  • Call State Index Cache

The data monitored in each subtab is covered in the following sections.

Call State Service

The Call State Service subtab of the Call State Storage subtab describes state and statistics about the call state Coherence cache service for the entire Converged Application Server domain.

For more details on Coherence statistics and monitoring, see "Introduction to Coherence Management" in Coherence Management Guide.

Table 19-4 describes the monitored data.

Table 19-4 Call State Service Monitoring Data

| Datum | Description |
|---|---|
| Server | This is a static label, Total/Average (domainwide). |
| Local Messages | The number of messages pending processing. |
| Received Messages | The total number of messages received by the host since the statistics were last reset. |
| Sent Messages | The total number of messages sent by the host since the statistics were last reset. |
| Owned Backup Partitions | The number of partitions that this domain backs up (responsible for the backup storage). |
| Owned Primary Partitions | The number of partitions that this domain owns (responsible for the primary storage). |
| Endangered Partitions | The number of partitions that are not currently backed up. |
| Unbalanced Partitions | The number of primary and backup partitions which remain to be transferred until the partition distribution across the storage enabled service members is fully balanced. |
| Vulnerable Partitions | The number of partitions that are backed up on the same computer where the primary partition owner resides. |
| Average Request Duration | The average duration (in milliseconds) of an individual synchronous request issued by the service since the last time the statistics were reset. |
| Max Request Duration | The maximum duration (in milliseconds) of a synchronous request issued by the service since the last time the statistics were reset. |
| Pending Request Duration | The duration (in milliseconds) of the oldest pending synchronous request issued by the service. |
| Average Task Duration | The average duration (in milliseconds) of an individual task execution. |
| Task Backlog | The size of the backlog queue that holds tasks scheduled to be executed by a service thread. |
| Max Task Backlog | The maximum size of the backlog queue since the last time the statistics were reset. |
| Idle Thread Count | The number of currently idle threads in the service thread pool. |


Call State Cache

The Call State Cache subtab of the Call State Storage subtab describes state and statistics about the call state Coherence cache for the entire Converged Application Server domain.

For more details on Coherence statistics and monitoring, see "Introduction to Coherence Management" in Coherence Management Guide.

Table 19-5 describes the monitored data.

Table 19-5 Call State Cache Monitoring Data

| Datum | Description |
|---|---|
| Server | This is a static label, Total/Average (domainwide). |
| Entry Count | The number of entries in the Coherence call state cache. |
| Data Size | The total data size of the Coherence call state cache. |


Call State Metadata Cache

The Call State Metadata Cache subtab of the Call State Storage subtab describes state and statistics about the call state metadata Coherence cache for the entire Converged Application Server domain.

For more details on Coherence statistics and monitoring, see "Introduction to Coherence Management" in Coherence Management Guide.

Table 19-6 describes the monitored data.

Table 19-6 Call State Metadata Cache Monitoring Data

| Datum | Description |
|---|---|
| Server | This is a static label, Total/Average (domainwide). |
| Entry Count | The number of entries in the Coherence call state metadata cache. |
| Data Size | The total data size of the Coherence call state metadata cache. |


Call State Index Cache

The Call State Index Cache subtab of the Call State Storage subtab describes state and statistics about the call state index Coherence cache for the entire Converged Application Server domain.

For more details on Coherence statistics and monitoring, see "Introduction to Coherence Management" in Coherence Management Guide.

Table 19-7 describes the monitored data.

Table 19-7 Call State Index Cache Monitoring Data

| Datum | Description |
|---|---|
| Server | This is a static label, Total/Average (domainwide). |
| Entry Count | The number of entries in the Coherence call state index cache. |
| Data Size | The total data size of the Coherence call state index cache. |


Other Ways to Monitor Converged Application Server

In addition to using the monitoring functionality in the WebLogic console, you can monitor Converged Application Server using the WebLogic Scripting Tool (WLST), Java Management Extensions (JMX), and the WebLogic Diagnostic Framework (WLDF). The following sections provide additional details.

Monitoring Applications with the WebLogic Scripting Tool

The WebLogic Scripting Tool (WLST) is a command-line scripting environment that you can use to create, manage, and monitor WebLogic domains. It is based on the Java scripting interpreter, Jython. In addition to supporting standard Jython features such as local variables, conditional variables, and flow control statements, WLST provides a set of scripting functions (commands) that are specific to WebLogic Server.

You can use WLST to retrieve information that WebLogic Server instances produce to describe their run-time state. For more information, see "Getting Runtime Information" in Understanding the WebLogic Scripting Tool.

Developing Custom Management Utilities with JMX

To integrate third-party management systems with the WebLogic Server management system, WebLogic Server provides standards-based interfaces that are fully compliant with the Java Management Extensions (JMX) specification. You can use these interfaces to monitor WebLogic Server MBeans, to change the configuration of a WebLogic Server domain, and to monitor the distribution (activation) of those changes to all server instances in the domain.

To get started creating custom JMX management utilities, see "Introduction and Roadmap" in Developing Custom Management Utilities Using JMX for Oracle WebLogic Server.

WebLogic Server Diagnostic Framework

The WebLogic Diagnostic Framework (WLDF) consists of a number of components that work together to collect, archive, and access diagnostic information about a WebLogic Server instance and its applications. Converged Application Server integrates with several components of the WLDF to monitor and diagnose the operation of engines and deployed SIP Servlets. For details, see Chapter 20, "Using the WebLogic Server Diagnostic Framework (WLDF)".

About Converged Application Server Overload Protection

Converged Application Server implements an overload framework which supports plug-in statistics collectors, plug-in event handlers, as well as multiple threshold settings and statistics collection algorithms.

About the Overload Protection Framework

Converged Application Server overload protection statistics collectors and event handlers are installed as Statistics Provider Interface (SPI) plug-ins. Only a single instance of each statistics collector and event handler can be instantiated as utility functions in the SPI.

You can configure multiple thresholds for each statistics collector. When collection is activated by an incoming SIP session, samples are collected at a user-configurable interval, and statistics are calculated from those samples according to a user-configurable algorithm. The framework then compares the calculated result with a user-configurable threshold value and executes the configured actions depending on the outcome of that comparison.

Configuring Overload Protection

This section describes using the WebLogic Administration console to configure event handlers and statistics collectors.

Perform the following steps in order, because later configurations depend on earlier ones.

Using the WebLogic Administration Console, you:

  1. Configure a new event handler. See "About Event Handlers".

  2. Configure actions for the event handler. See "About Actions".

  3. Configure a statistics collector. See "About Statistics Collectors".

  4. Configure a threshold. A threshold includes a threshold statistics value, a sampling interval, the number of samples to collect at each interval (or real-time sampling), an algorithm to apply to the collected samples, and actions for upward and downward breaches of the threshold. See "About Thresholds".
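The ordering requirement follows from the references between the configured objects: an action names an event handler, and a threshold names a statistics collector and one or more actions. The following Python sketch illustrates that dependency chain. The dictionary layout and the check_references helper are hypothetical and exist only for illustration; they are not a Converged Application Server configuration format or API.

```python
# Hypothetical in-memory model of the four configuration steps.
# The layout is illustrative only; the product stores its configuration differently.
config = {
    "event_handlers": {"com.oracle.trafficControl"},
    "actions": {
        "TrafficReject": {"event_handler": "com.oracle.trafficControl", "type": "reject-traffic"},
        "TrafficAccept": {"event_handler": "com.oracle.trafficControl", "type": "accept-traffic"},
    },
    "collectors": {"MBeanStatsCollector": {"type": "mbean-stats"}},
    "thresholds": {
        "SessionRate": {
            "collector": "MBeanStatsCollector",
            "up_actions": ["TrafficReject"],
            "down_actions": ["TrafficAccept"],
        },
    },
}


def check_references(cfg):
    """Return a list of dangling references; empty if every reference resolves."""
    problems = []
    for name, action in cfg["actions"].items():
        if action["event_handler"] not in cfg["event_handlers"]:
            problems.append("action %s: unknown event handler %s"
                            % (name, action["event_handler"]))
    for name, threshold in cfg["thresholds"].items():
        if threshold["collector"] not in cfg["collectors"]:
            problems.append("threshold %s: unknown collector %s"
                            % (name, threshold["collector"]))
        for action_name in threshold["up_actions"] + threshold["down_actions"]:
            if action_name not in cfg["actions"]:
                problems.append("threshold %s: unknown action %s" % (name, action_name))
    return problems
```

Removing the event handler from this model leaves both actions with a dangling reference, which is why the event handler is configured first.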

About Event Handlers

A Converged Application Server overload protection event handler plugs in to the SPI, and is discovered when the overload protection framework is initialized. When a particular event handler is discovered, only one instance is created and managed by the framework. Each event handler must implement one or more actions. When a threshold-breaching event occurs, the framework executes the actions defined for the event handler.

Each event handler can accept an optional event-handler scoped set of user configurable key/value pairs, which are passed to the event handler's activate() method as parameters.
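The contract described above (one managed instance per handler, with key/value pairs handed to activate()) can be sketched in Python as follows. This is only an illustration: the product's SPI plug-ins are not written in Python, and the class names, the fire() method, and the parse_attributes helper are hypothetical. The sketch assumes the semicolon-separated key=value format that the console's Attributes field uses, and it does not handle values that themselves contain semicolons.

```python
def parse_attributes(text):
    """Parse 'key1=value1;key2=value2' into a dict (simple, unquoted values only)."""
    pairs = {}
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        if sep:
            pairs[key.strip()] = value.strip()
    return pairs


class SnmpTrapHandler:
    """Hypothetical stand-in for a handler such as com.oracle.sendSnmpTrap."""

    def __init__(self):
        self.sent = []

    def activate(self, params):
        # Fall back to the documented default value of snmp-trap-message.
        self.sent.append(params.get("snmp-trap-message", "overloadControlActivated"))


class HandlerRegistry:
    """Creates each discovered handler once and reuses that single instance."""

    def __init__(self, handler_classes):
        self._instances = {name: cls() for name, cls in handler_classes.items()}

    def fire(self, name, attributes=""):
        handler = self._instances[name]
        handler.activate(parse_attributes(attributes))
        return handler
```

Calling fire() twice for the same handler name returns the same object both times, which mirrors the single-instance behavior described above.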

Configuring an Event Handler

To configure an overload protection event handler:

  1. Open the Administration Console for your domain.

  2. If your domain is running in Production mode, click Lock & Edit.

  3. Click the SipServer link in the Domain Structure pane.

    The right pane of the console provides two levels of tabbed pages that are used for configuring and monitoring Converged Application Server. By default, the General configuration subtab is selected.

  4. Click the Overload Protections subtab and then click the Event Handlers subtab.

  5. In the Event Handlers table, click New.

    Enter the following information:

    • Event Handler Name: Required. Enter a name for the event handler, for example:

      com.oracle.sendSnmpTrap
      

      Table 19-8 lists the event handlers provided with Converged Application Server.

      Table 19-8 Default Event Handlers

      | Event Handler | Description |
      |---|---|
      | com.oracle.trafficControl | Used during new call setup on a SIP container to either reject or accept call traffic. |
      | com.oracle.sendSnmpTrap | Used to send SNMP traps. |


    • Attributes: Optional. Specify key/value attribute pairs separated by semicolons, for example:

      attribute1=21;attribute2=64
      

      Attributes are passed to the event handler as parameters when the event is triggered.

      Note:

      The com.oracle.sendSnmpTrap event handler supports a snmp-trap-message attribute. Its default value is overloadControlActivated. No attributes are supported for the com.oracle.trafficControl event.
  6. Click Save to save your configuration changes.

  7. If your domain is running in Production mode, click Activate Changes to apply your changes to the engine servers.

About Actions

Once you have defined an event handler, you must define one or more actions for the event handler to take when a threshold-breaching event occurs. Like event handlers, actions are plugged into the overload protection framework using the SPI and are discovered when the framework is initialized. When an action is discovered, only one instance is created and managed by the framework.

Each action can accept an optional action-scoped set of user-configurable key/value pairs, which are passed to the action's activate() method as parameters.

The action types supplied with Converged Application Server are listed in Table 19-9.

Configuring an Action

To configure an overload protection action:

  1. Open the Administration Console for your domain.

  2. If your domain is running in Production mode, click Lock & Edit.

  3. Click the SipServer link in the Domain Structure pane.

    The right pane of the console provides two levels of tabbed pages that are used for configuring and monitoring Converged Application Server. By default, the General configuration subtab is selected.

  4. Click the Overload Protections subtab and then click the Actions subtab.

  5. In the Actions table, click New.

    Enter the following information:

    • Action Name: Required. Enter a name for the action, for example:

      TrafficReject
      
    • Event Handler: Required. Choose the name of an event handler you have created from the drop down list. For information on configuring an event handler, see "About Event Handlers".

    • Action Type: Required. Enter an Action Type supported by the Event Handler, for example:

      reject-traffic
      

      Table 19-9 lists the Action Types supplied with Converged Application Server.

      Table 19-9 Default Action Types

      | Action Type | Description |
      |---|---|
      | accept-traffic | Used by the event handler, com.oracle.trafficControl. After an overload condition has cleared, accepts SIP session traffic. |
      | reject-traffic | Used by the event handler, com.oracle.trafficControl. When an overload condition occurs, rejects SIP session traffic. SIP session traffic will continue to be rejected until an accept-traffic action is triggered. |
      | default | Used by the event handler, com.oracle.sendSnmpTrap. |


    • Attributes: Optional. Specify key/value attribute pairs separated by semicolons, for example:

      attribute1=21;attribute2=64
      

      Attributes are passed when the action is triggered.

      Note:

      Support for attributes is dependent upon the implementation of the particular action. None of the default Action Types support any attributes.
  6. Click Save to save your configuration changes.

  7. If your domain is running in Production mode, click Activate Changes to apply your changes to the engine servers.
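The reject-traffic and accept-traffic action types in Table 19-9 behave like a latch: once rejection starts, it persists until an accept-traffic action is triggered. The following Python sketch models only that documented behavior. The TrafficControl class and its method names are hypothetical, and the sketch makes no claim about how the product implements traffic control internally.

```python
class TrafficControl:
    """Hypothetical model of the latch behavior described for com.oracle.trafficControl."""

    def __init__(self):
        self.rejecting = False  # traffic is accepted until a reject-traffic action fires

    def activate(self, action_type):
        if action_type == "reject-traffic":
            self.rejecting = True
        elif action_type == "accept-traffic":
            self.rejecting = False
        else:
            raise ValueError("unsupported action type: %s" % action_type)

    def admit_new_session(self):
        """Return True if a new SIP session should be accepted."""
        return not self.rejecting
```

Triggering reject-traffic more than once has no additional effect in this model; only accept-traffic clears the rejecting state.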

About Statistics Collectors

Statistics collectors are also plugged into the overload protection framework using the SPI, and are discovered when the framework is initialized. When a particular statistics collector is discovered, only one instance is created and managed by the framework.

Each statistics collector consists of a name, a type and optional attributes. The collector name is referred to when defining a threshold as described in "Configuring a Threshold". The overload protection framework retrieves statistics samples using the statistics collector's getStats() method to which the optional attributes are passed as parameters.

The statistics collectors supplied with Converged Application Server are described in Table 19-10.

Configuring a Statistics Collector

To configure an overload protection statistics collector:

  1. Open the Administration Console for your domain.

  2. If your domain is running in Production mode, click Lock & Edit.

  3. Click the SipServer link in the Domain Structure pane.

    The right pane of the console provides two levels of tabbed pages that are used for configuring and monitoring Converged Application Server. By default, the General configuration subtab is selected.

  4. Click the Overload Protections subtab and then click the Statistics Collector subtab.

  5. In the Statistics Collector table, click New.

    Enter the following information:

    • Statistics Collector Name: Required. Enter a name for the statistics collector, for example:

      MBeanStatsCollector
      
    • Statistics Collector Type: Required. Enter a supported Statistics Collector Type, for example:

      mbean-stats
      

      Table 19-10 lists the Statistics Collector Types supplied with Converged Application Server.

      Table 19-10 Default Statistics Collector Types

      | Statistics Collector Type | Description |
      |---|---|
      | queue-length | Uses the sum of the transport and timer work manager queue lengths. |
      | mbean-stats | Uses an MBean counter as a statistics sample. |
      | memory-usage | Returns the call state memory usage from Coherence. |
      | active-diameter-session | Returns the number of active Diameter sessions. |


    • Attributes: Optional except for the mbean-stats collector type. Specify key/value attribute pairs separated by semicolons, for example:

      attribute1=21;attribute2=64
      

      Attributes are passed to the statistics collector's getStats() method when samples are retrieved.

      Note:

      The mbean-stats collector lets you use an MBean counter for statistics samples. When configuring the collector, the attributes object-name and attribute-name must be set so that the collector can find the attribute value of the particular MBean.

      For the object-name attribute, you can use the variable ${server_name}, which is replaced with the name of the managed server on which the statistics collector is running.

      The following example shows a configuration retrieving the ServerAppSessionCount from the SipServerRuntime MBean on the current server.

      object-name="com.bea:ServerRuntime=${server_name},Name=${server_name},Type=SipServerRuntime;";attribute-name=ServerAppSessionCount
      

      For a complete list of Converged Application Server MBeans, see the Oracle Communications Converged Application Server Java API Reference.

  6. Click Save to save your configuration changes.

  7. If your domain is running in Production mode, click Activate Changes to apply your changes to the engine servers.
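The mbean-stats attribute string shown above has two features worth illustrating: the object-name value is quoted and contains a semicolon, and it contains the ${server_name} variable. The following Python sketch shows one way such a string could be split and resolved. It is an assumption about the format, not the product's parser: the parse_quoted_attributes function and the server name engine1 are hypothetical. Python's string.Template happens to use the same ${name} syntax, so it is used here for the substitution.

```python
from string import Template


def parse_quoted_attributes(text):
    """Split on ';' outside double quotes, then split each part on the first '='."""
    parts, current, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes  # toggle quoting and drop the quote character
        elif ch == ";" and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))

    attributes = {}
    for part in parts:
        key, sep, value = part.partition("=")
        if sep:
            attributes[key.strip()] = value.strip()
    return attributes


raw = ('object-name="com.bea:ServerRuntime=${server_name},Name=${server_name},'
       'Type=SipServerRuntime;";attribute-name=ServerAppSessionCount')
attrs = parse_quoted_attributes(raw)

# "engine1" is a hypothetical managed server name.
object_name = Template(attrs["object-name"]).substitute(server_name="engine1")
```

A naive split on every semicolon would break the quoted object-name value, which is why the sketch tracks whether it is inside quotes.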

About Thresholds

An overload protection threshold consists of a threshold value, a collector, sampling settings, and two lists of overload protection actions defined for an event handler.

Thresholds work in two modes: a sampling mode with a configurable interval and number of samples, and a real-time mode. For both modes, statistics samples are collected and calculated according to a selectable algorithm and compared to the threshold value. Each threshold has two events, UP_EVENT and DOWN_EVENT. When the threshold is breached upwards, the UP_EVENT event is triggered, and when it is breached downwards, the DOWN_EVENT event is triggered.

For each event, you can configure a list of event handler actions. When an event is triggered, the overload protection framework will execute each action associated with the threshold event.
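The UP_EVENT and DOWN_EVENT behavior can be pictured as edge detection on the calculated statistic. The following Python sketch is a minimal illustration under two assumptions that the text above does not confirm: that an event fires only when the statistic crosses the threshold (not on every evaluation while it stays above or below), and that "above" means strictly greater than the threshold value. The ThresholdMonitor class is hypothetical.

```python
UP_EVENT = "UP_EVENT"
DOWN_EVENT = "DOWN_EVENT"


class ThresholdMonitor:
    """Hypothetical edge detector for upward and downward threshold breaches."""

    def __init__(self, threshold_value):
        self.threshold_value = threshold_value
        self.above = False  # assume the statistic starts below the threshold

    def evaluate(self, statistic):
        """Return UP_EVENT, DOWN_EVENT, or None if no crossing occurred."""
        now_above = statistic > self.threshold_value
        event = None
        if now_above and not self.above:
            event = UP_EVENT
        elif not now_above and self.above:
            event = DOWN_EVENT
        self.above = now_above
        return event
```

With a threshold value of 10.0, the statistic sequence 5, 12, 15, 8, 3 produces one UP_EVENT (at 12) and one DOWN_EVENT (at 8) in this model.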

Configuring a Threshold

To configure an overload protection threshold:

  1. Open the Administration Console for your domain.

  2. If your domain is running in Production mode, click Lock & Edit.

  3. Click the SipServer link in the Domain Structure pane.

    The right pane of the console provides two levels of tabbed pages that are used for configuring and monitoring Converged Application Server. By default, the General configuration subtab is selected.

  4. Click the Overload Protections subtab and then click the Thresholds subtab.

  5. In the Thresholds table, click New.

    Enter the following information:

    • Threshold Name: Required. Enter a name for the threshold, for example:

      queueLengthThreshold
      
    • Threshold Value: Required. Enter the level of the threshold. This is the value that the calculated statistic must cross to trigger an event, for example:

      10.0
      

      Note:

      The Threshold Value cannot be greater than 100.
    • Sampling Mode: Required. Choose either realtime or sampling from the drop down list. In realtime mode, statistics are compared against the Threshold Value each time an initial SIP message is received. No calculations are supported.

    • Sampling Interval: Required when sampling mode is selected. Enter the interval at which samples should be taken in milliseconds, for example:

      1000
      
    • Sampling Number: Required when sampling mode is selected. Enter the number of samples to be taken at each Sampling Interval, for example:

      5
      
    • Algorithm Name: Required. Choose an appropriate algorithm to calculate samples. Table 19-11 lists the available algorithms.

      Table 19-11 Algorithm Types

      | Algorithm Name | Description |
      |---|---|
      | PERCNTILE | Calculates the Pth percentile value of the samples. When PERCNTILE is selected, an Algorithm Parameter value must be provided. |
      | AVERAGE | Calculates the average of the samples (sum of samples divided by number of samples). |
      | VALUE | The straight value of the last sample. |
      | RATE | The sample rate calculated as (last sample - first sample)/(sampling interval). |


    • Algorithm Parameter: Required when the PERCNTILE algorithm is selected. Enter a percentile value that the threshold must match, for example:

      65
      
    • Enable: Optional. Check Enable to enable the Threshold.

  6. Click Next.

  7. Choose the Actions to be executed when a threshold is breached upwards (if any) by moving an Action from the Available list to the Chosen list.

  8. Click Next.

  9. Choose the Actions to be executed when a threshold is breached downwards (if any) by moving an Action from the Available list to the Chosen list.

  10. Click Finish to save your configuration changes.

  11. If your domain is running in Production mode, click Activate Changes to apply your changes to the engine servers.
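The calculations in Table 19-11 can be written out as short Python functions. AVERAGE, VALUE, and RATE follow the formulas given in the table. The percentile function uses the nearest-rank method, which is an assumption: the table does not say which percentile method the product uses. All function names are hypothetical.

```python
import math


def average(samples):
    """AVERAGE: sum of samples divided by number of samples."""
    return sum(samples) / len(samples)


def value(samples):
    """VALUE: the last sample, unchanged."""
    return samples[-1]


def rate(samples, sampling_interval):
    """RATE: (last sample - first sample) / (sampling interval)."""
    return (samples[-1] - samples[0]) / sampling_interval


def percentile(samples, p):
    """Pth percentile by nearest rank (assumed method, see lead-in)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


samples = [4, 8, 6, 10, 12]
```

Note that rate() divides by the interval exactly as it is passed in, so with an interval entered in milliseconds the result is a change per millisecond.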

Example: Configuring Overload Protection Based upon Session Rate

In the following example, you create an overload protection scheme based upon the session rate. You begin by creating an event handler of the type com.oracle.trafficControl to react to traffic control events. Next, you create two actions that the event handler will initiate, one to reject SIP session traffic and another to accept SIP session traffic. You then create a statistics collector that reads counter information from the SipServerRuntime MBean. Finally, you create a threshold that takes 5 samples every 1000 milliseconds and reacts to an upward or downward breach of the threshold value you set.

Once configured, when your threshold value is breached upwards, SIP traffic will be rejected until the threshold value is again breached downwards.

To configure a session rate overload protection scheme:

  1. Open the Administration Console for your domain.

  2. If your domain is running in Production mode, click Lock & Edit.

  3. Click the SipServer link in the Domain Structure pane.

    The right pane of the console provides two levels of tabbed pages that are used for configuring and monitoring Converged Application Server. By default, the General configuration subtab is selected.

  4. Click the Overload Protections subtab and then click the Event Handlers subtab.

  5. In the Event Handlers table, click New, and enter com.oracle.trafficControl for the Event Handler Name.

  6. Click Save to save your configuration changes.

  7. Click the SipServer link in the Domain Structure pane.

  8. Click the Overload Protections subtab and then click the Actions subtab.

  9. In the Actions table, click New and enter the following information:

    • Action Name: Enter TrafficReject.

    • Event Handler: Select com.oracle.trafficControl from the drop down list.

    • Action Type: Enter reject-traffic.

  10. Click Save to save your configuration changes.

  11. In the Actions table, click New and enter the following information:

    • Action Name: Enter TrafficAccept.

    • Event Handler: Select com.oracle.trafficControl from the drop down list.

    • Action Type: Enter accept-traffic.

    Click Save to save your configuration changes.

  12. Click the SipServer link in the Domain Structure pane.

  13. Click the Overload Protections subtab and then click the Statistics Collector subtab.

  14. In the Statistics Collector table, click New and enter the following information:

    • Statistics Collector Name: Enter com.oracle.mbeanStatsCollector.

    • Statistics Collector Type: Enter mbean-stats.

    • Attributes: Enter:

      object-name="com.bea:ServerRuntime=${server_name},Name=${server_name},Type=SipServerRuntime;";attribute-name=ServerAppSessionCount
      
  15. Click Save to save your configuration changes.

  16. Click the SipServer link in the Domain Structure pane.

  17. Click the Overload Protections subtab and then click the Thresholds subtab.

  18. In the Thresholds table, click New and enter the following information:

    • Threshold Name: Enter SessionRate.

    • Threshold Value: Enter the threshold value you wish to use for the maximum session rate.

      Note:

      The Threshold Value cannot be greater than 100.
    • Sampling Mode: Select sampling from the drop down list.

    • Sampling Interval: Enter 1000 to take a sample every 1000 milliseconds.

    • Sampling Number: Enter 5 to take 5 samples at each sampling interval.

    • Algorithm Name: Select RATE from the drop down list.

    • Statistics Collector: Select com.oracle.mbeanStatsCollector from the drop down list.

    • Check Enable.

  19. Click Next.

  20. For Up Actions, move TrafficReject from the Available list to the Chosen list.

  21. Click Next.

  22. For Down Actions, move TrafficAccept from the Available list to the Chosen list.

  23. Click Finish.

  24. If your domain is running in Production mode, click Activate Changes to apply your changes to the engine servers.
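To see how the configured pieces interact, the following Python sketch replays a series of session-count samples through the scheme configured above. It is a simulation under stated assumptions, not product code: it assumes one sample per 1000 millisecond interval, a sliding window of the last 5 samples, the RATE formula (last sample - first sample)/(sampling interval), and a strictly-greater-than comparison. The simulate function and the sample values are hypothetical.

```python
from collections import deque

SAMPLING_INTERVAL_MS = 1000  # Sampling Interval from the example
SAMPLING_NUMBER = 5          # Sampling Number from the example


def simulate(session_counts, threshold_value):
    """Replay session-count samples and return (sample_index, action_name) pairs."""
    window = deque(maxlen=SAMPLING_NUMBER)
    rejecting = False
    fired = []
    for index, count in enumerate(session_counts):
        window.append(count)
        if len(window) < SAMPLING_NUMBER:
            continue  # wait until a full window of samples is available
        rate = (window[-1] - window[0]) / SAMPLING_INTERVAL_MS
        if rate > threshold_value and not rejecting:
            rejecting = True
            fired.append((index, "TrafficReject"))   # Up Action
        elif rate <= threshold_value and rejecting:
            rejecting = False
            fired.append((index, "TrafficAccept"))   # Down Action
    return fired


# Hypothetical ServerAppSessionCount samples: a burst, then a plateau.
counts = [0, 10, 20, 30, 40, 2000, 4000, 4000, 4000, 4000, 4000, 4000]
actions = simulate(counts, threshold_value=1.0)
```

In this run, TrafficReject fires when the burst first pushes the rate above the threshold, and TrafficAccept fires only after the session count has been flat for a full window.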