8 Monitoring and Troubleshooting

This chapter describes how to monitor and troubleshoot Oracle WebLogic Communication Services. It covers recovering from server failures, using the "echo server" process to improve SIP data tier failover performance when a server becomes physically disconnected from the network, and configuring and responding to SNMP traps.

8.1 Avoiding and Recovering from Server Failures

A variety of events can lead to the failure of a server instance. Often one failure condition leads to another. Loss of power, hardware malfunction, operating system crashes, network partitions, or unexpected application behavior may each contribute to the failure of a server instance.

Oracle WebLogic Communication Services uses a highly clustered architecture as the basis for minimizing the impact of failure events. However, even in a clustered environment it is important to prepare for a sound recovery process in the event that an individual server or server machine fails.

The following sections summarize Oracle WebLogic Communication Services failure prevention and recovery features, and describe the configuration artifacts that are required to restore different portions of an Oracle WebLogic Communication Services domain.

8.1.1 Failure Prevention and Automatic Recovery Features

Oracle WebLogic Communication Services, and the underlying WebLogic Server platform, provide many features that protect against server failures. In a production system, all available features should be used in order to ensure uninterrupted service.

8.1.1.1 Overload Protection

Oracle WebLogic Communication Services detects increases in system load that could affect the performance and stability of deployed SIP Servlets, and automatically throttles message processing at predefined load thresholds.

Using overload protection helps you avoid failures that could result from unanticipated levels of application traffic or resource utilization.

Oracle WebLogic Communication Services throttles message processing to avoid failure when either of the following conditions occurs:

  • The rate at which SIP sessions are created reaches a configured value, or

  • The size of the SIP timer and SIP request-processing execute queues reaches a configured length.

The underlying WebLogic Server platform also detects increases in system load that can affect deployed application performance and stability. WebLogic Server allows administrators to configure failure prevention actions that occur automatically at predefined load thresholds. Automatic overload protection helps you avoid failures that result from unanticipated levels of application traffic or resource utilization as indicated by:

  • A workload manager's capacity being exceeded

  • The HTTP session count increasing to a predefined threshold value

  • Impending out of memory conditions

8.1.1.2 Redundancy and Failover for Clustered Services

You can increase the reliability and availability of your applications by using multiple engine tier servers in a dedicated cluster, as well as multiple SIP data tier servers (replicas) in a dedicated SIP data tier cluster. Because engine tier clusters maintain no stateful information about SIP dialogs (calls), the failure of an engine tier server does not result in any data loss or dropped calls. Multiple replicas in a SIP data tier partition store redundant copies of call state information, and automatically fail over to one another should a replica fail.

8.1.1.3 Automatic Restart for Failed Server Instances

WebLogic Server self-health monitoring features improve the reliability and availability of server instances in a domain. Selected subsystems within each server instance monitor their health status based on criteria specific to the subsystem. (For example, the JMS subsystem monitors the condition of the JMS thread pool while the core server subsystem monitors default and user-defined execute queue statistics.) If an individual subsystem determines that it can no longer operate in a consistent and reliable manner, it registers its health state as "failed" with the host server.

Each WebLogic Server instance, in turn, checks the health state of its registered subsystems to determine its overall viability. If one or more of its critical subsystems have reached the FAILED state, the server instance marks its own health state FAILED to indicate that it cannot reliably host an application.

When used in combination with Node Manager, server self-health monitoring enables you to automatically reboot servers that have failed. This improves the overall reliability of a domain, and requires no direct intervention from an administrator.

8.1.1.4 Managed Server Independence Mode

Managed Servers maintain a local copy of the domain configuration. When a Managed Server starts, it contacts its Administration Server to retrieve any changes to the domain configuration that were made since the Managed Server was last shut down. If a Managed Server cannot connect to the Administration Server during startup, it can use its locally-cached configuration information—this is the configuration that was current at the time of the Managed Server's most recent shutdown. A Managed Server that starts up without contacting its Administration Server to check for configuration updates is running in Managed Server Independence (MSI) mode. By default, MSI mode is enabled.

8.1.1.5 Automatic Migration of Failed Managed Servers

When using Linux or UNIX operating systems, you can use WebLogic Server's server migration feature to automatically start a candidate (backup) server if a Network tier server's machine fails or becomes partitioned from the network. The server migration feature uses Node Manager, in conjunction with the wlsifconfig.sh script, to automatically boot candidate servers using a floating IP address. Candidate servers are booted only if the primary server hosting a Network tier instance becomes unreachable. See "Whole Server Migration" in Oracle Fusion Middleware Using Clusters for Oracle WebLogic Server for more information about using the server migration feature.

8.1.1.6 Geographic Redundancy for Regional Site Failures

In addition to server-level redundancy and failover capabilities, Oracle WebLogic Communication Services enables you to configure peer sites to protect against catastrophic failures, such as power outages, that can affect an entire domain. This enables you to failover from one geographical site to another, avoiding complete service outages.

8.1.2 Directory and File Backups for Failure Recovery

Recovery from the failure of a server instance requires access to the domain's configuration data. By default, the Administration Server stores a domain's primary configuration data in a file called domain_name/config/config.xml, where domain_name is the root directory of the domain. The primary configuration file may reference additional configuration files for specific WebLogic Server services, such as JDBC and JMS, and for Oracle WebLogic Communication Services services, such as SIP container properties and SIP data tier configuration. The configuration for specific services is stored in additional XML files in subdirectories of the domain_name/config directory, such as domain_name/config/jms, domain_name/config/jdbc, and domain_name/config/custom (for Oracle WebLogic Communication Services configuration files).

The Administration Server can automatically archive multiple versions of the domain configuration (the entire domain_name/config directory). The configuration archives can be used for system restoration in cases where accidental configuration changes need to be reversed. For example, if an administrator accidentally removes a configured resource, the prior configuration can be restored by using the last automated backup.

The Administration Server stores only a finite number of automated backups, and it stores them locally in the domain directory. For this reason, automated domain backups are limited in their ability to guard against data loss, such as a failed hard disk. Automated backups also do not preserve certain configuration data that is required for full domain restoration, such as LDAP repository data and server start-up scripts. Oracle recommends that you also maintain multiple backup copies of the configuration and security data offline, in a source control system.

This section describes file backups that Oracle WebLogic Communication Services performs automatically, as well as manual backup procedures that an administrator should perform periodically.

8.1.2.1 Enabling Automatic Configuration Backups

Follow these steps to enable automatic domain configuration backups on the Administration Server for your domain:

  1. Access the Administration Console for your domain.

  2. In the left pane of the Administration Console, select the name of the domain.

  3. In the right pane, click the Configuration > General tab.

  4. Select Advanced to display advanced options.

  5. Select Configuration Archive Enabled.

  6. In the Archive Configuration Count box, enter the maximum number of configuration file revisions to save.

  7. Click Save.

When you enable configuration archiving, the Administration Server automatically creates a configuration JAR file archive. The JAR file contains a complete copy of the previous configuration (the complete contents of the domain_name/config directory). JAR file archives are stored in the domain_name/configArchive directory. The files use the naming convention config-number.jar, where number is the sequential number of the archive.

When you save a change to a domain's configuration, the Administration Server saves the previous configuration in domain_name/configArchive/config.xml#n. Each time the Administration Server saves a file in the configArchive directory, it increments the value of the #n suffix, up to a configurable number of copies (5 by default). Thereafter, each time you change the domain configuration:

  • The archived files are rotated so that the newest file has a suffix with the highest number,

  • The previous archived files are renamed with a lower number, and

  • The oldest file is deleted.
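To make the rotation rule above concrete, the following Bash function models it on an ordinary directory. It is a hypothetical illustration only: the function name rotate_archive is invented for this example, it assumes the suffixes #1 through #max are contiguous, and the Administration Server performs its own archiving without any such script.

```bash
#!/usr/bin/env bash
# Hypothetical model of the archive rotation described above. It is NOT the
# Administration Server's implementation. Archives are named <base>#1..#<max>;
# #<max> always holds the newest copy and #1 the oldest.
rotate_archive() {
  local dir=$1 src=$2 max=${3:-5}   # 5 copies by default, as in the text
  local base n count=0
  base=$(basename "$src")

  # Find how many archives already exist (assumes no gaps in the numbering).
  for ((n = 1; n <= max; n++)); do
    if [ -e "$dir/$base#$n" ]; then
      count=$n
    fi
  done

  if ((count < max)); then
    # Still below the limit: save the copy under the next number.
    cp "$src" "$dir/$base#$((count + 1))"
    return
  fi

  # At the limit: delete the oldest file, shift every other file down by
  # one, then save the new copy with the highest number.
  rm -f "$dir/$base#1"
  for ((n = 2; n <= max; n++)); do
    mv "$dir/$base#$n" "$dir/$base#$((n - 1))"
  done
  cp "$src" "$dir/$base#$max"
}
```

After seven saves with the default limit of 5, the directory holds versions 3 through 7, with version 7 in config.xml#5.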

Keep in mind that configuration archives are stored locally within the domain directory, and they may be overwritten according to the maximum number of revisions you selected. For these reasons, you must also create your own offline archives of the domain configuration, as described in Section 8.1.2.2, "Storing the Domain Configuration Offline".

8.1.2.2 Storing the Domain Configuration Offline

Although automatic backups protect against accidental configuration changes, they do not protect against data loss caused by a failure of the hard disk that stores the domain configuration, or accidental deletion of the domain directory. To protect against these failures, you must also store a complete copy of the domain configuration offline, preferably in a source control system.

Oracle recommends storing a copy of the domain configuration at regular intervals. For example, back up a new revision of the configuration when:

  • you first deploy the production system

  • you add or remove deployed applications

  • the configuration is tuned for performance

  • any other permanent change is made.

The domain configuration backup should contain the complete contents of the domain_name/config directory. For example:

cd ~/user_projects/domains/mydomain
tar cvf domain-backup-06-17-2007.tar config

Store the new archive in a source control system, preserving earlier versions should you need to restore the domain configuration to an earlier point in time.
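The backup can be scripted so that each run produces a new, date-stamped archive instead of overwriting an earlier one. This is a sketch under assumptions: the function name backup_domain_config and the file-name pattern are illustrative, and committing the resulting file to your source control system is left to your own tooling.

```bash
#!/usr/bin/env bash
# Sketch: archive DOMAIN_DIR/config into a date-stamped tar file in
# BACKUP_DIR, refusing to overwrite an existing archive.
# The function name and naming pattern are illustrative assumptions.
backup_domain_config() {
  local domain_dir=$1 backup_dir=$2
  local stamp archive
  stamp=$(date +%Y-%m-%d-%H%M%S)
  archive="$backup_dir/domain-backup-$stamp.tar"

  if [ ! -d "$domain_dir/config" ]; then
    echo "no config directory under $domain_dir" >&2
    return 1
  fi
  if [ -e "$archive" ]; then
    echo "refusing to overwrite $archive" >&2
    return 1
  fi

  mkdir -p "$backup_dir"
  # -C stores paths relative to the domain directory (config/...), so the
  # archive can be unpacked directly into a domain directory on restore.
  tar -cf "$archive" -C "$domain_dir" config
  echo "$archive"
}
```

For example, backup_domain_config ~/user_projects/domains/mydomain ~/domain-backups prints the path of the new archive, which you can then add to source control.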

8.1.2.3 Backing Up Server Start Scripts

In an Oracle WebLogic Communication Services deployment, the start scripts used to boot engine and SIP data tier servers are generally customized to include domain-specific configuration information such as:

  • JVM Garbage Collection parameters required to achieve throughput targets for SIP message processing (see Section 8.8, "Tuning JVM Garbage Collection for Production Deployments"). Different parameters (and therefore, different start scripts) are generally used to boot engine and SIP data tier servers.

  • Configuration parameters and startup information for the Oracle WebLogic Communication Services heartbeat mechanism. If you use the heartbeat mechanism, engine tier server start scripts should include startup options to enable and configure the heartbeat mechanism. SIP data tier server start scripts should include startup options to enable heartbeats and start the WlssEchoServer process.

Back up each distinct start script used to boot engine tier, SIP data tier, or Diameter relay servers in your domain.

8.1.2.4 Backing Up Logging Servlet Applications

If you use Oracle WebLogic Communication Services logging Servlets (see Section 8.7, "Logging SIP Requests and Responses") to perform regular logging or auditing of SIP messages, back up the complete application source files so that you can easily redeploy the applications should the staging server fail or the original deployment directory become corrupted.

8.1.2.5 Backing Up Security Data

The WebLogic Security service stores its configuration data in the config.xml file, and also in an LDAP repository and other files.

8.1.2.5.1 Backing Up SerializedSystemIni.dat and Security Certificates

All servers create a file named SerializedSystemIni.dat and place it in the server's root directory. This file contains encrypted security data that must be present to boot the server. You must back up this file.

If you configured a server to use SSL, also back up the security certificates and keys. The location of these files is user-configurable.
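The following is a sketch of collecting every SerializedSystemIni.dat file in a domain into one archive, assuming that all server root directories live below the domain directory. The function name backup_security_files is hypothetical. SSL certificates and keys are deliberately not included, because their location is user-configurable; add those paths to your own backup list.

```bash
#!/usr/bin/env bash
# Hypothetical helper: archive every SerializedSystemIni.dat found below a
# domain directory, keeping paths relative to that directory.
backup_security_files() {
  local domain_dir=$1 archive=$2 list
  # tar runs from inside domain_dir, so make the archive path absolute.
  case $archive in
    /*) ;;
    *) archive="$PWD/$archive" ;;
  esac
  list=$(cd "$domain_dir" 2>/dev/null && find . -type f -name SerializedSystemIni.dat) || true
  if [ -z "$list" ]; then
    echo "no SerializedSystemIni.dat found under $domain_dir" >&2
    return 1
  fi
  # -T - makes tar read the list of files (one per line) from stdin.
  (cd "$domain_dir" && printf '%s\n' "$list" | tar -cf "$archive" -T -)
}
```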

8.1.2.5.2 Backing Up the WebLogic LDAP Repository

The default Authentication, Authorization, Role Mapper, and Credential Mapper providers that are installed with Oracle WebLogic Communication Services store their data in an LDAP server. Each Oracle WebLogic Communication Services server instance contains an embedded LDAP server. The Administration Server contains the master LDAP server, which is replicated on all Managed Servers. If any of your security realms use these installed providers, you should maintain an up-to-date backup of the following directory tree:

domain_name\adminServer\ldap

where domain_name is the domain's root directory and adminServer is the directory in which the Administration Server stores runtime and security data.

Each server instance has an LDAP directory, but you only need to back up the LDAP data on the Administration Server, because the master LDAP server replicates the LDAP data to each Managed Server when updates to security data are made. WebLogic security providers cannot modify security data while the domain's Administration Server is unavailable. The LDAP repositories on Managed Servers are replicas and cannot be modified.

The ldap/ldapfiles subdirectory contains the data files for the LDAP server. The files in this directory contain user, group, group membership, policies, and role information. Other subdirectories under the ldap directory contain LDAP server message logs and data about replicated LDAP servers.

Do not update the configuration of a security provider while a backup of LDAP data is in progress. If a change is made (for instance, if an administrator adds a user) while you are backing up the ldap directory tree, the backups in the ldapfiles subdirectory could become inconsistent. If this does occur, you can fall back on the consistent, but potentially out-of-date, backups that the server creates automatically, as described next.

Once a day, a server suspends write operations and creates its own backup of the LDAP data. It archives this backup in a ZIP file below the ldap\backup directory and then resumes write operations. This backup is guaranteed to be consistent, but it might not contain the latest security data.
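The following sketch archives the ldap tree and lists the server's own daily ZIP backups, which serve as the fallback described above. Because the location of the adminServer directory depends on your installation, it is passed in as an argument; both function names are hypothetical. Run the backup only when no security provider changes are in progress.

```bash
#!/usr/bin/env bash
# Sketch: back up the Administration Server's ldap directory tree.
# ADMIN_DIR is the directory called domain_name/adminServer in the text.
backup_ldap() {
  local admin_dir=$1 archive=$2
  if [ ! -d "$admin_dir/ldap/ldapfiles" ]; then
    echo "no ldap/ldapfiles directory under $admin_dir" >&2
    return 1
  fi
  # A concurrent security update (for example, adding a user) can leave
  # the copied ldapfiles data inconsistent, so avoid changes meanwhile.
  tar -cf "$archive" -C "$admin_dir" ldap
}

# List the daily ZIP backups the server keeps below ldap/backup. They are
# consistent but may not contain the latest security data.
list_ldap_daily_backups() {
  local admin_dir=$1
  find "$admin_dir/ldap/backup" -type f -name '*.zip' 2>/dev/null | sort
}
```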

8.1.2.6 Backing Up Additional Operating System Configuration Files

Certain files maintained at the operating system level are also critical in helping you recover from system failures. Consider backing up the following information as necessary for your system:

  • Load Balancer configuration scripts. For example, any automated scripts used to configure load balancer pools and virtual IP addresses for the engine tier cluster, as well as NAT configuration settings.

  • NTP client configuration scripts used to synchronize the system clocks of engine and SIP data tier servers.

  • Host configuration files for each Oracle WebLogic Communication Services machine (host names, virtual and real IP addresses for multi-homed machines, IP routing table information).

8.1.3 Restarting a Failed Administration Server

When you restart a failed Administration Server, no special steps are required. Start the Administration Server as you normally would.

If the Administration Server shuts down while Managed Servers continue to run, you do not need to restart the Managed Servers that are already running in order to recover management of the domain. The procedure for recovering management of an active domain depends upon whether you can restart the Administration Server on the same machine it was running on when the domain was started.

8.1.3.1 Restarting an Administration Server on the Same Machine

If you restart the WebLogic Administration Server while Managed Servers continue to run, by default the Administration Server can discover the presence of the running Managed Servers.

Note:

Make sure that the startup command or startup script does not include -Dweblogic.management.discover=false, which disables an Administration Server from discovering its running Managed Servers.

The root directory for the domain contains a file, running-managed-servers.xml, which contains a list of the Managed Servers in the domain and describes whether they are running or not. When the Administration Server restarts, it checks this file to determine which Managed Servers were under its control before it stopped running.

When a Managed Server is gracefully or forcefully shut down, its status in running-managed-servers.xml is updated to "not-running". When an Administration Server restarts, it does not try to discover Managed Servers with the "not-running" status. A Managed Server that stops running because of a system crash, or that was stopped by killing the JVM or the command prompt (shell) in which it was running, still has the status "running" in running-managed-servers.xml. The Administration Server attempts to discover such a server, and throws an exception when it determines that the Managed Server is no longer running.

Restarting the Administration Server does not cause Managed Servers to update the configuration of static attributes. Static attributes are those that a server refers to only during its startup process. Server instances must be restarted to pick up changes to static configuration attributes. Discovery of the Managed Servers only enables the Administration Server to monitor the Managed Servers or make runtime changes in attributes that can be configured while a server is running (dynamic attributes).
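Because discovery works only if the startup command does not include -Dweblogic.management.discover=false, it can be worth scanning your start scripts for that option before restarting an Administration Server, on the same machine or on a new one. A minimal sketch; the function name and the choice to return a nonzero status on a match are assumptions of this example.

```bash
#!/usr/bin/env bash
# Sketch: warn about start scripts that disable Managed Server discovery.
# Returns 0 if no script contains the option, 1 otherwise.
check_discovery_enabled() {
  local script found=0
  for script in "$@"; do
    # -F matches the option as a fixed string; -- ends grep's own options.
    if grep -qF -- '-Dweblogic.management.discover=false' "$script"; then
      echo "WARNING: $script disables Managed Server discovery" >&2
      found=1
    fi
  done
  return "$found"
}
```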

8.1.3.2 Restarting an Administration Server on Another Machine

If a machine crash prevents you from restarting the Administration Server on the same machine, you can recover management of the running Managed Servers as follows:

  1. Install the Oracle WebLogic Communication Services software on the new administration machine (if this has not already been done).

  2. Make your application files available to the new Administration Server by copying them from backups or by using a shared disk. Your application files should be available in the same relative location on the new file system as on the file system of the original Administration Server.

  3. Make your configuration and security data available to the new administration machine by copying them from backups or by using a shared disk. For more information, refer to Section 8.1.2.2, "Storing the Domain Configuration Offline" and Section 8.1.2.5, "Backing Up Security Data".

  4. Restart the Administration Server on the new machine.

    Make sure that the startup command or startup script does not include -Dweblogic.management.discover=false, which disables an Administration Server from discovering its running Managed Servers.

When the Administration Server starts, it communicates with the Managed Servers and informs them that the Administration Server is now running on a different IP address.

8.1.4 Restarting Failed Managed Servers

If the machine on which the failed Managed Server runs can contact the Administration Server for the domain, simply restart the Managed Server manually or automatically using Node Manager. Note that you must configure Node Manager and the Managed Server to support automated restarts.

If the Managed Server cannot connect to the Administration Server during startup, it can retrieve its configuration by reading locally-cached configuration data. A Managed Server that starts in this way is running in Managed Server Independence (MSI) mode.

To start up a Managed Server in MSI mode:

  1. Ensure that the following files are available in the Managed Server's root directory:

    • msi-config.xml

    • SerializedSystemIni.dat

    • boot.properties

    If these files are not in the Managed Server's root directory:

    1. Copy the config.xml and SerializedSystemIni.dat files from the Administration Server's root directory (or from a backup) to the Managed Server's root directory.

    2. Rename the configuration file to msi-config.xml. When you start the server, it will use the copied configuration files.

      Note:

      Alternatively, use the -Dweblogic.RootDirectory=path startup option to specify a root directory that already contains these files.
  2. Start the Managed Server at the command line or using a script.

    The Managed Server will run in MSI mode until it is contacted by its Administration Server. For information about restarting the Administration Server in this scenario, see Section 8.1.3, "Restarting a Failed Administration Server".
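Step 1 can be scripted as follows. This sketch assumes you pass the paths of the Administration Server's config.xml and SerializedSystemIni.dat (or backup copies) explicitly; the function name prepare_msi_root is hypothetical. boot.properties is only checked, not created, because the steps above do not describe copying it.

```bash
#!/usr/bin/env bash
# Sketch: stage the files a Managed Server needs to start in MSI mode.
#   CONFIG_SRC  the Administration Server's config.xml (or a backup copy)
#   INI_SRC     the matching SerializedSystemIni.dat
#   MS_ROOT     the Managed Server's root directory
prepare_msi_root() {
  local config_src=$1 ini_src=$2 ms_root=$3
  mkdir -p "$ms_root"
  # Copy config.xml under the name msi-config.xml, keeping existing files.
  [ -f "$ms_root/msi-config.xml" ] || cp "$config_src" "$ms_root/msi-config.xml"
  [ -f "$ms_root/SerializedSystemIni.dat" ] || cp "$ini_src" "$ms_root/SerializedSystemIni.dat"
  # The server also needs boot.properties; report it if it is missing.
  if [ ! -f "$ms_root/boot.properties" ]; then
    echo "WARNING: $ms_root/boot.properties is missing" >&2
    return 1
  fi
}
```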

8.2 Overview of Failover Detection

In a production system, engine tier servers continually access SIP data tier replicas in order to retrieve and write call state data. The Oracle WebLogic Communication Services architecture depends on engine tier nodes to detect when a SIP data tier server has failed or become disconnected. When an engine cannot access or write call state data because a replica is unavailable, the engine connects to another replica in the same partition and reports the offline server. The replica updates the current view of the SIP data tier to account for the offline server, and other engines are then notified of the updated view as they access and retrieve call state data.

By default, an engine tier server uses its RMI connection to the replica to determine if the replica has failed or become disconnected. The algorithms used to determine a failure of an RMI connection are reliable, but ultimately they depend on the TCP protocol's retransmission timers to diagnose a disconnection (for example, if the network cable to the replica is removed). Because the TCP retransmission timer generally lasts a full minute or longer, Oracle WebLogic Communication Services provides an alternate method of detecting failures that can diagnose a disconnected replica in a matter of a few seconds.

8.2.1 WlssEchoServer Failure Detection

WlssEchoServer is a separate process that you can run on the same server hardware as a SIP data tier replica. The purpose of WlssEchoServer is to provide a simple UDP echo service to engine tier nodes to be used for determining when a SIP data tier server goes offline, for example in the event that the network cable is disconnected. The algorithm for detecting failures with WlssEchoServer is as follows:

  1. For all normal traffic, engine tier servers communicate with SIP data tier replicas using TCP. TCP is used as the basic transport between the engine tier and SIP data tier regardless of whether or not WlssEchoServer is used.

  2. Engine tier servers send a periodic heartbeat message to each configured WlssEchoServer over UDP. During normal operation, WlssEchoServer responds to the heartbeats so that the connection between the engine node and replica is verified.

  3. Should there be a complete failure of the SIP data tier stack, or should the network cable be disconnected, the heartbeat messages are not returned to the engine node. In this case, the engine node can mark the replica as being offline without having to wait for the normal TCP connection timeout.

  4. After identifying the offline server, the engine node reports the failure to an available SIP data tier replica, and the SIP data tier view is updated as described in the previous section.
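The decision in steps 2 and 3 can be modeled as a counter of consecutive missed heartbeats that resets whenever an echo arrives, in line with the description of -Dwlss.ha.heartbeat.count in Table 8-2. The function below is a hypothetical model for reasoning about that rule, not the engine tier's actual implementation; each argument after the threshold is one heartbeat result (1 for an echo, 0 for a miss).

```bash
#!/usr/bin/env bash
# Hypothetical model: mark a replica offline after COUNT consecutive
# missed heartbeats. Prints the 1-based heartbeat number at which the
# replica would be marked offline, or nothing if it never is.
detect_offline() {
  local count=$1
  shift
  local missed=0 i=0 result
  for result in "$@"; do
    i=$((i + 1))
    if [ "$result" -eq 1 ]; then
      missed=0            # an echo arrived: the replica is reachable
    else
      missed=$((missed + 1))
      if [ "$missed" -ge "$count" ]; then
        echo "$i"
        return 0
      fi
    fi
  done
}
```

For example, detect_offline 3 1 1 0 0 0 prints 5. With the default one-second interval, that corresponds to roughly three seconds of silence, compared with the minute or more that TCP retransmission timers can take.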

Also, should a SIP data tier server notice that its local WlssEchoServer process has died, it automatically shuts down. This behavior ensures even quicker failover because it avoids the time it takes engine nodes to notice and report the failure as described in Section 8.2, "Overview of Failover Detection".

You can configure the heartbeat mechanism on engine tier servers to increase the performance of failover detection as necessary. You can also configure the listen port and log file that WlssEchoServer uses on SIP data tier servers.

8.2.2 Forced Shutdown for Failed Replicas

If any engine tier server cannot communicate with a particular replica, the engine accesses another available replica in the SIP data tier to report the offline server. The replica updates its view of the affected partition to remove the offline server. The updated view is then distributed to all engine tier servers that later access the partition. Propagating the view in this manner helps to ensure that engine servers do not attempt to access the offline replica.

The replica that updates the view also issues a one-time request to the offline replica to ask it to shut down. This is done to try to shut-down running replica servers that cannot be accessed by one or more engine servers due to a network outage. If an active replica can reach the replica marked as "offline," the offline replica shuts down.

8.3 Improving Failover Performance for Physical Network Failures

Note:

Using WlssEchoServer is not required in all Oracle WebLogic Communication Services installations. Enable the echo server only when your system requires detection of a network or replica failure faster than the configured TCP timeout interval.

Observe the following requirements and restrictions when using WlssEchoServer to detect replica failures:

  • If you use the heartbeat mechanism to detect failures, you must ensure that the WlssEchoServer process is always running on each replica server machine. If the WlssEchoServer process fails or is stopped, the replica will be treated as being "offline" even if the server process is unaffected.

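  • By default, WlssEchoServer listens on all IP addresses available on the server machine. Use the -Dwlss.ha.echoserver.ipaddress option (see Table 8-1) to restrict it to a single address.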

  • WlssEchoServer requires a dedicated port number to listen for heartbeat messages.

8.3.1 Starting WlssEchoServer on SIP Data Tier Server Machines

WlssEchoServer is a Java program that you can start directly from a shell or command prompt. The basic syntax for starting WlssEchoServer is:

java -classpath WLSS_HOME/server/lib/wlss/wlssechosvr.jar options com.bea.wcp.util.WlssEchoServer

Where WLSS_HOME is the path to the Oracle WebLogic Communication Services installation and options may include one or more of the options described in Table 8-1.

Table 8-1 WlssEchoServer Options

Option Description
-Dwlss.ha.echoserver.ipaddress

Specifies the IP address on which the WlssEchoServer instance listens for heartbeat messages. If you do not specify an IP address, the instance listens on any available IP address (0.0.0.0).

-Dwlss.ha.echoserver.port

Specifies the port number used to listen for heartbeat messages. Ensure that the port number you specify is not used by any other process on the server machine. By default WlssEchoServer uses port 6734.

-Dwlss.ha.echoserver.logfile

Specifies the log file location and name. By default, log messages are written to ./echo_servertime.log where time is the time expressed in milliseconds.
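The options in Table 8-1 can also be assembled programmatically, for example to validate the port before a start script launches the process. This is a sketch: the function name echo_server_cmd and its argument order are hypothetical, and it prints the command for review rather than running it.

```bash
#!/usr/bin/env bash
# Sketch: build a WlssEchoServer command line from the Table 8-1 options.
#   $1 WLSS_HOME  installation path
#   $2 PORT       listen port (default 6734)
#   $3 IP         optional listen address (omitted means 0.0.0.0)
#   $4 LOGFILE    optional log file path
echo_server_cmd() {
  local wlss_home=$1 port=${2:-6734} ip=${3:-} logfile=${4:-}
  case $port in
    *[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
  esac
  # 10# forces base-10 so that a leading zero is not read as octal.
  if ((10#$port < 1 || 10#$port > 65535)); then
    echo "port out of range: $port" >&2
    return 1
  fi
  local args=(java -classpath "$wlss_home/server/lib/wlss/wlssechosvr.jar")
  if [ -n "$ip" ]; then
    args+=("-Dwlss.ha.echoserver.ipaddress=$ip")
  fi
  args+=("-Dwlss.ha.echoserver.port=$port")
  if [ -n "$logfile" ]; then
    args+=("-Dwlss.ha.echoserver.logfile=$logfile")
  fi
  args+=(com.bea.wcp.util.WlssEchoServer)
  printf '%s\n' "${args[*]}"
}
```

The printed line is meant for review; paths containing spaces would need quoting before you paste it into a script.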


Oracle recommends that you include the command to start WlssEchoServer in the same script you use to start each Oracle WebLogic Communication Services SIP data tier instance. If you use the startManagedWebLogic.sh script to start a SIP data tier server instance, add a command that starts WlssEchoServer in the background before the final command used to start the server. For example, change the lines:

"$JAVA_HOME/bin/java" ${JAVA_VM} ${MEM_ARGS} ${JAVA_OPTIONS}     \
  -Dweblogic.Name=${SERVER_NAME}                                 \
  -Dweblogic.management.username=${WLS_USER}                     \
  -Dweblogic.management.password=${WLS_PW}                       \
  -Dweblogic.management.server=${ADMIN_URL}                      \
  -Djava.security.policy="${WL_HOME}/server/lib/weblogic.policy" \
   weblogic.Server

to read:

# WLSS_HOME must be set to the Oracle WebLogic Communication Services installation path.
"$JAVA_HOME/bin/java" -classpath "${WLSS_HOME}/server/lib/wlss/wlssechosvr.jar" \
  -Dwlss.ha.echoserver.ipaddress=192.168.1.4                                 \
  -Dwlss.ha.echoserver.port=6734                                             \
  com.bea.wcp.util.WlssEchoServer &
"$JAVA_HOME/bin/java" ${JAVA_VM} ${MEM_ARGS} ${JAVA_OPTIONS}     \
  -Dweblogic.Name=${SERVER_NAME}                                 \
  -Dweblogic.management.username=${WLS_USER}                     \
  -Dweblogic.management.password=${WLS_PW}                       \
  -Dweblogic.management.server=${ADMIN_URL}                      \
  -Djava.security.policy="${WL_HOME}/server/lib/weblogic.policy" \
   weblogic.Server

8.3.2 Enabling and Configuring the Heartbeat Mechanism on Servers

To enable the WlssEchoServer heartbeat mechanism, you must include the -Dreplica.host.monitor.enabled JVM argument in the command you use to start all engine and SIP data tier servers. Oracle recommends adding this option directly to the script used to start Managed Servers in your system. For example, in the startManagedWebLogic.sh script, change the line:

# JAVA_OPTIONS="-Dweblogic.attribute=value -Djava.attribute=value"

to read:

JAVA_OPTIONS="-Dreplica.host.monitor.enabled=true"

Several additional JVM options configure the functioning of the heartbeat mechanism. Table 8-2 describes the options used to configure failure detection.

Table 8-2 Heartbeat Mechanism Options

Option Description
-Dreplica.host.monitor.enabled

This system property is required on both engine and SIP data tier servers to enable the heartbeat mechanism.

-Dwlss.ha.heartbeat.interval

Specifies the number of milliseconds between heartbeat messages. By default heartbeats are sent every 1,000 milliseconds.

-Dwlss.ha.heartbeat.count

Specifies the number of consecutive, missed heartbeats that are permitted before a replica is determined to be offline. By default, a replica is marked offline if the WlssEchoServer process on the server fails to respond to 3 heartbeat messages.

-Dwlss.ha.heartbeat.SoTimeout

Specifies the UDP socket timeout value.
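The following sketch shows the startManagedWebLogic.sh change with the timing options made explicit. The values are the defaults from Table 8-2; appending to JAVA_OPTIONS, instead of replacing it, is a choice of this example so that any other options are kept. The helper detection_time_ms is a hypothetical, rough estimate (interval multiplied by count) that ignores the UDP socket timeout and any scheduling delay. Remember that -Dreplica.host.monitor.enabled is required on both engine and SIP data tier servers.

```bash
#!/usr/bin/env bash
# Sketch: enable the heartbeat mechanism with explicit timing options.
# The values below are the documented defaults.
HEARTBEAT_INTERVAL_MS=1000   # -Dwlss.ha.heartbeat.interval
HEARTBEAT_COUNT=3            # -Dwlss.ha.heartbeat.count
JAVA_OPTIONS="${JAVA_OPTIONS:-} -Dreplica.host.monitor.enabled=true"
JAVA_OPTIONS="${JAVA_OPTIONS} -Dwlss.ha.heartbeat.interval=${HEARTBEAT_INTERVAL_MS}"
JAVA_OPTIONS="${JAVA_OPTIONS} -Dwlss.ha.heartbeat.count=${HEARTBEAT_COUNT}"
export JAVA_OPTIONS

# Hypothetical estimate of how long a silent replica goes unnoticed:
# COUNT missed heartbeats sent once every INTERVAL milliseconds.
detection_time_ms() {
  echo $(( $1 * $2 ))
}
```

With the defaults, detection_time_ms "$HEARTBEAT_INTERVAL_MS" "$HEARTBEAT_COUNT" prints 3000. Lowering either value speeds detection at the cost of more heartbeat traffic and a higher chance of marking a briefly delayed replica offline.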


8.4 Configuring SNMP

Oracle WebLogic Communication Services includes a dedicated SNMP MIB to monitor activity on engine tier and SIP data tier server instances. The Oracle WebLogic Communication Services MIB is available on both Managed Servers and the Administration Server of a domain. However, Oracle WebLogic Communication Services engine and SIP data tier traps are generated only by the Managed Server instances that make up each tier. If your Administration Server is not a target for the sipserver custom resource, it will generate only WebLogic Server SNMP traps (for example, when a server in a cluster fails). Administrators should monitor both WebLogic Server and Oracle WebLogic Communication Services traps to evaluate the behavior of the entire domain.

Note:

Oracle WebLogic Communication Services MIB objects are read-only. You cannot modify an Oracle WebLogic Communication Services configuration using SNMP.

8.4.1 Browsing the MIB

The Oracle WebLogic Communication Services MIB file is installed in WLSS_HOME/server/lib/wlss/BEA-WLSS-MIB.asn1. Use an available SNMP management tool or MIB browser to view the contents of this file. See also Section 8.5.2, "Trap Descriptions" for a description of common SNMP traps.

8.4.2 Steps for Configuring SNMP

To enable SNMP monitoring for the entire Oracle WebLogic Communication Services domain, follow these steps:

  1. Log in to the Administration Console for the Oracle WebLogic Communication Services domain.

  2. In the left pane, select the Diagnostics > SNMP node.

  3. In the Server SNMP Agents table, click the New button to create a new agent.

    Note:

    Ensure that you create a new Server SNMP agent, rather than a Domain-Scoped agent.
  4. Enter a unique name for the new SNMP agent (for example, "engine1snmp") and click OK.

  5. Select the newly-created SNMP agent from the Server SNMP Agents table.

  6. On the Configuration > General tab:

    1. Select the Enabled check box to enable the agent.

    2. Enter an unused port number in the SNMP UDP Port field.

      Note:

      If you run multiple Managed Server instances on the same machine, each server instance must use a dedicated SNMP agent with a unique SNMP port number.
    3. Click Save.

  7. Repeat the above steps to create a dedicated SNMP agent for each server in your deployment (SIP data tier servers, engine tier servers, and the Administration Server).
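Each Managed Server on a machine needs its own SNMP agent and UDP port, so it can help to plan the ports before you create the agents. This Python sketch assigns sequential ports on each machine and reports collisions. The base port 1610 and the server and machine names are arbitrary, hypothetical values.

```python
from collections import defaultdict


def assign_snmp_ports(servers, base_port=1610):
    """Give every server a dedicated SNMP UDP port, unique per machine.

    `servers` is a list of (server_name, machine_name) tuples. Ports are
    handed out sequentially from `base_port` on each machine, so servers on
    different machines may reuse the same number.
    """
    next_port = defaultdict(lambda: base_port)
    plan = {}
    for server, machine in servers:
        plan[server] = (machine, next_port[machine])
        next_port[machine] += 1
    return plan


def find_port_conflicts(plan):
    """Return {(machine, port): [servers]} for ports claimed more than once."""
    seen = defaultdict(list)
    for server, (machine, port) in plan.items():
        seen[(machine, port)].append(server)
    return {key: names for key, names in seen.items() if len(names) > 1}
```

For example, assigning AdminServer and engine1 on host1 and replica1 on host2 yields ports 1610 and 1611 on host1 and 1610 on host2. Port 1610 is reused only because the servers run on different machines.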

8.5 Understanding and Responding to SNMP Traps

The following sections describe the Oracle WebLogic Communication Services SNMP traps in more detail. Recovery procedures for responding to individual traps are also included where applicable.

8.5.1 Files for Troubleshooting

The following Oracle WebLogic Communication Services log and configuration files are frequently helpful for troubleshooting problems, and may be required by your technical support contact:

  • $DOMAIN_DIR/config/config.xml

  • $DOMAIN_DIR/config/custom/sipserver.xml

  • $DOMAIN_DIR/servername/*.log (server and message logs)

  • sip.xml (in the /WEB-INF subdirectory of the application)

  • web.xml (in the /WEB-INF subdirectory of the application)

General information that can help the technical support team includes:

  • The specific versions of:

    • Oracle WebLogic Communication Services

    • Java SDK

    • Operating System

  • Thread dumps for hung Oracle WebLogic Communication Services processes

  • Network analyzer logs
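You can script the collection of these files before opening a support ticket. The Python sketch below gathers whichever of the listed files exist and writes them to a compressed archive. The directory layout follows the list above. The app_dirs argument (exploded application directories) and both function names are hypothetical, so adjust the paths to your installation.

```python
import glob
import os
import tarfile


def troubleshooting_files(domain_dir, server_name, app_dirs=()):
    """Return the files listed in Section 8.5.1 that exist on this machine."""
    candidates = [
        os.path.join(domain_dir, "config", "config.xml"),
        os.path.join(domain_dir, "config", "custom", "sipserver.xml"),
    ]
    # Server and message logs for one server instance.
    candidates += sorted(glob.glob(os.path.join(domain_dir, server_name, "*.log")))
    # Deployment descriptors of exploded applications (hypothetical input).
    for app in app_dirs:
        candidates.append(os.path.join(app, "WEB-INF", "sip.xml"))
        candidates.append(os.path.join(app, "WEB-INF", "web.xml"))
    return [path for path in candidates if os.path.isfile(path)]


def bundle(paths, archive_path):
    """Write the collected files to a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            # Member names mirror the file paths; tarfile drops a leading '/'.
            tar.add(path)
    return archive_path
```

Version information, thread dumps, and network analyzer logs still have to be gathered separately.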

8.5.2 Trap Descriptions

Table 8-3 lists the Oracle WebLogic Communication Services SNMP traps and indicates whether the trap is generated by servers in the engine tier or SIP data tier. Each trap is described in the sections that follow.

8.5.2.1 connectionLostToPeer

This trap is generated by an engine tier server instance when it loses its connection to a replica in the SIP data tier. It may indicate a network connection problem between the engine and SIP data tiers, or may be generated with additional traps if a SIP data tier server fails.

8.5.2.1.1 Recovery Procedure

If this trap occurs in isolation from other traps indicating a server failure, it generally indicates a network failure. Verify or repair the network connection between the affected engine tier server and the SIP data tier server.

If the trap is accompanied by additional traps indicating a SIP data tier server failure (for example, dataTierServerStopped), follow the recovery procedures for the associated traps.

8.5.2.2 connectionReestablishedToPeer

This trap is generated by an engine tier server instance when it successfully reconnects to a SIP data tier server after a prior failure (after a connectionLostToPeer trap was generated). Repeated instances of this trap may indicate an intermittent network failure between the engine and SIP data tiers.

8.5.2.2.1 Recovery Procedure

See Section 8.5.2.1, "connectionLostToPeer".
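The recovery logic for the two connection traps can be summarized as a small decision function. This Python sketch is a hypothetical triage helper, not part of the product. The set of server-failure traps and the threshold for "repeated" reconnections are assumptions you would tune locally.

```python
def triage_connection_traps(traps, failure_traps=("dataTierServerStopped",), flap_threshold=3):
    """Suggest a next step for connection traps from one engine tier server.

    `traps` lists the trap names received during one observation window.
    `failure_traps` and `flap_threshold` are assumptions to tune locally.
    """
    received = set(traps)
    # Accompanied by a server failure: the failure traps drive recovery.
    if "connectionLostToPeer" in received and received & set(failure_traps):
        return "follow the recovery procedures for the server-failure traps"
    # Many reconnects in one window suggest an intermittent network problem.
    if traps.count("connectionReestablishedToPeer") >= flap_threshold:
        return "investigate an intermittent engine/SIP data tier network failure"
    # Isolated loss of connection: most likely a network failure.
    if "connectionLostToPeer" in received:
        return "verify or repair the engine/SIP data tier network connection"
    return "no action"
```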

8.5.2.3 dataTierServerStopped

Oracle WebLogic Communication Services SIP data tier nodes generate this alarm when an unrecoverable error occurs in a WebLogic Server instance that is part of the SIP data tier. Note that this trap may be generated by the server that is shutting down, by another replica in the same partition, or in some cases by both servers (network outages can sometimes trigger both servers to generate the same trap).

8.5.2.3.1 Recovery Procedure

See the Recovery Procedure for Section 8.5.2.8, "serverStopped".

8.5.2.4 overloadControlActivated, overloadControlDeactivated

Oracle WebLogic Communication Services engine tier nodes use a configurable throttling mechanism that helps you control the number of new SIP requests that are processed. After a configured overload condition is observed, Oracle WebLogic Communication Services rejects new SIP requests by responding to the caller with "503 Service Unavailable". The server continues to reject new requests until the overload condition is resolved according to a configured threshold control value. The overloadControlActivated alarm is generated when the throttling mechanism is activated, and overloadControlDeactivated is generated when it is deactivated. The throttling behavior should eventually return the server to a non-overloaded state, and further action may be unnecessary.
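The activate/deactivate behavior is a form of hysteresis. Requests are rejected from the moment the overload value is reached until the load falls back to a lower release value, which keeps the server from flapping in and out of throttling. The following Python sketch models that behavior. The class, its attribute names, and the sample numbers are illustrative and are not actual sipserver.xml settings.

```python
class OverloadThrottle:
    """Hysteresis throttle modeled on the description in Section 8.5.2.4.

    Throttling starts when the monitored value (for example, queue length)
    reaches `overload_value` and stops only once it falls to `release_value`
    or below. Names and numbers are illustrative only.
    """

    def __init__(self, overload_value, release_value):
        if release_value >= overload_value:
            raise ValueError("release_value must be below overload_value")
        self.overload_value = overload_value
        self.release_value = release_value
        self.active = False

    def update(self, current):
        """Feed a new reading; return the trap name if the state changes."""
        if not self.active and current >= self.overload_value:
            self.active = True
            return "overloadControlActivated"
        if self.active and current <= self.release_value:
            self.active = False
            return "overloadControlDeactivated"
        return None

    def handle_new_request(self):
        """Return 503 while throttling; None means process the request normally."""
        return 503 if self.active else None
```

With an overload value of 200 and a release value of 150, a reading of 180 after activation keeps the throttle active. Only a reading of 150 or less deactivates it.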

8.5.2.4.1 Recovery Procedure

Follow this recovery procedure:

  1. Check other servers to see if they are nearly overloaded.

  2. Check to see if the load balancer is correctly balancing load across the application servers, or if it is overloading one or more servers. If additional servers are nearly overloaded, notify Tier 4 support immediately.

  3. If the issue is limited to one server, notify Tier 4 support within one hour.

8.5.2.4.2 Additional Overload Information

If you set the queue length as an incoming call overload control, you can monitor the length of the queue using the Administration Console. If you specify a session rate control, you cannot monitor the session rate using the Administration Console. (The Administration Console only displays the current number of SIP sessions, not the rate of new sessions generated.)

8.5.2.5 replicaAddedToPartition

Oracle WebLogic Communication Services SIP data tier nodes generate this alarm when a server instance is added to a partition in the SIP data tier.

8.5.2.5.1 Recovery Procedure

This trap is generated during normal startup procedures when SIP data tier servers are booted.

8.5.2.6 replicaRemovedEnginesRegistration

SIP data tier nodes generate this alarm if an engine server client that was not registered (or was removed from the list of registered engines) attempts to communicate with the SIP data tier. This trap is generally followed by a serverStopped trap indicating that the engine tier server was shut down to preserve SIP data tier consistency.

8.5.2.6.1 Recovery Procedure

Restart the engine tier server. Repeated occurrences of this trap may indicate a network problem between the engine tier server and one or more replicas.

8.5.2.7 replicaRemovedFromPartition

Oracle WebLogic Communication Services SIP data tier nodes generate this alarm when a server is removed from the SIP data tier, either as a result of a normal shutdown operation or because of a failure. There must be at least one replica remaining in a partition to generate this trap; if a partition has only a single replica and that replica fails, the trap cannot be generated. In addition, because engine tier nodes determine when a replica has failed, an engine tier node must be running in order for this trap to be generated.

8.5.2.7.1 Recovery Procedure

If this trap is generated as a result of a server instance failure, additional traps will be generated to indicate the exception. See the recovery procedures for traps generated in addition to replicaRemovedFromPartition.

8.5.2.8 serverStopped

This trap indicates that the WebLogic Server instance is now down. This trap applies to both engine tier and SIP data tier server instances, but only when the servers are members of a named WebLogic Server cluster. If this trap is received spontaneously and not as a result of a controlled shutdown, follow the steps below.

8.5.2.8.1 Recovery Procedure

Follow this recovery procedure:

  1. Use the following command to identify the hung process:

    ps -ef | grep java
    

    There should be only one PID for each WebLogic Server instance running on the machine.

  2. After identifying the affected PID, use the following command to send the process a SIGQUIT signal:

    kill -3 [pid]
    
  3. This command makes the JVM write a thread dump to its standard output; it does not terminate the process. Repeat the command several times, spaced 5-10 seconds apart, to help diagnose potential deadlock problems. After collecting the thread dumps, terminate the process if it is still running (for example, with kill -9 [pid]).

  4. Attempt to restart Oracle WebLogic Communication Services immediately.

  5. Make a backup copy of all SIP logs on the affected server to aid in troubleshooting. The location of the logs varies based on the server configuration.

  6. Copy each log to assist Tier 4 support with troubleshooting the problem.

    Note:

    Oracle WebLogic Communication Services logs are truncated according to your system configuration. Make backup logs immediately to avoid losing critical troubleshooting information.
  7. Notify Tier 4 support and include the log files with the trouble ticket.

  8. Monitor the server closely over the next 24 hours. If the source of the problem cannot be identified in the log files, there may be a hardware or network issue that will reappear over time.
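Scripting the thread dump steps keeps the dumps evenly spaced. This Python sketch sends SIGQUIT, the signal that kill -3 sends, several times with a fixed delay. It works only on POSIX systems, and the function name and parameters are hypothetical. You can replace the send and sleep arguments with stubs for a dry run.

```python
import os
import signal
import time


def capture_thread_dumps(pid, dumps=3, spacing_s=7.0, send=os.kill, sleep=time.sleep):
    """Send SIGQUIT (what `kill -3` sends) to a JVM `dumps` times.

    Each signal asks the JVM to print a thread dump to its standard output
    without stopping it. Spacing the dumps 5-10 seconds apart makes it easier
    to spot threads that stay blocked on the same locks across dumps.
    """
    for i in range(dumps):
        send(pid, signal.SIGQUIT)
        if i < dumps - 1:  # no need to wait after the last dump
            sleep(spacing_s)
    return dumps
```

For example, capture_thread_dumps(12345) sends three signals seven seconds apart to process 12345 (a placeholder PID). The sketch does not terminate the process; that remains a separate step.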

8.5.2.8.2 Additional Shutdown Information

The Administration Console generates SNMP messages for managed WebLogic Server instances only until the ServerShutDown message is received. Afterwards, no additional messages are generated.

8.5.2.9 sipAppDeployed

Oracle WebLogic Communication Services engine tier nodes generate this alarm when a SIP Servlet is deployed to the container.

8.5.2.9.1 Recovery Procedure

This trap is generated during normal deployment operations and does not indicate an exception.

8.5.2.10 sipAppUndeployed

Oracle WebLogic Communication Services engine tier nodes generate this alarm when a SIP application shuts down, or if a SIP application is undeployed. This generally occurs when Oracle WebLogic Communication Services is shut down while active requests still exist.

8.5.2.10.1 Recovery Procedure

During normal shutdown procedures this alarm should be filtered out and should not reach operations. If the alarm occurs during the course of normal operations, it indicates that someone has shut down the application or server unexpectedly, or there is a problem with the application. Notify Tier 4 support immediately.

8.5.2.11 sipAppFailedToDeploy

Oracle WebLogic Communication Services engine tier nodes generate this trap when an application deploys successfully as a Web Application but fails to deploy as a SIP application.

8.5.2.11.1 Recovery Procedure

The typical failure is caused by an invalid sip.xml configuration file and should occur only during software installation or upgrade procedures. When it occurs, undeploy the application, validate the sip.xml file, and retry the deployment.

Note:

This alarm should never occur during normal operations. If it does, contact Tier 4 support immediately.

8.6 Using the WebLogic Diagnostics Framework (WLDF)

The WebLogic Diagnostic Framework (WLDF) consists of a number of components that work together to collect, archive, and access diagnostic information about a WebLogic Server instance and its applications. Oracle WebLogic Communication Services integrates with several components of the WLDF in order to monitor and diagnose the operation of engine and SIP data tier nodes, as well as deployed SIP Servlets:

  • Data Collectors—Oracle WebLogic Communication Services integrates with the Harvester service to collect information from runtime MBeans, and with the Logger service to archive SIP requests and responses.

  • Watches and Notifications—Administrators can use the Watches and Notifications component to create complex rules, based on Oracle WebLogic Communication Services runtime MBean attributes, that trigger automatic notifications using JMS, JMX, SNMP, SMTP, and so forth.

  • Image Capture—Oracle WebLogic Communication Services instances can collect certain diagnostic data and write the data to an image file when requested by an Administrator. This data can then be used to diagnose problems in a running server.

  • Instrumentation—Oracle WebLogic Communication Services instruments the server and application code with monitors to help you configure diagnostic actions that are performed on SIP messages (requests and responses) that match certain criteria.

The sections that follow provide more details about how Oracle WebLogic Communication Services integrates with each of the above WLDF components.

8.6.1 Data Collection and Logging

Oracle WebLogic Communication Services uses the WLDF Harvester service to collect data from the attributes of these runtime MBeans:

  • ReplicaRuntimeMBean

  • SipApplicationRuntimeMBean

  • SipServerRuntimeMBean

You can add charts and graphs of this data to your own custom views using the WLDF console extension. To do so, first enable the WLDF console extension by copying the JAR file into the console-ext subdirectory of your domain directory:

cp ~/bea/wlserver_10.3/server/lib/console-ext/diagnostics-console-extension.jar ~/bea/user_projects/domains/mydomain/console-ext

When accessing the WLDF console extension, the Oracle WebLogic Communication Services runtime MBean attributes are available in the Metrics tab of the extension.

Oracle WebLogic Communication Services also uses the WLDF Logger service to archive SIP and Diameter messages to independent, dedicated log files (by default, domain_home/logs/server_name/sipMessages.log). You can configure the name and location of the log file, as well as log rotation policies, using the Configuration > Message Debug tab in the SIP Server Administration Console extension. Note that a server restart is necessary in order to initiate independent logging and log rotation.

8.6.2 Watches and Notifications

The data collected from Oracle WebLogic Communication Services runtime MBeans can be used to create automated monitors, or "watches," that observe a server's diagnostic state. One or more notifications can then be configured for use by a watch, in order to generate a message using SMTP, SNMP, JMX, or JMS when your configured watch conditions and rules occur.

To use watches and notifications, select the Diagnostics > Diagnostic Modules node in the left pane of the Administration Console and create a new module with the watch rules and notifications required for monitoring your servers. The watch rules can use the metrics collected from Oracle WebLogic Communication Services runtime MBeans, messages written to the log file, or events generated by the diagnostic framework.

8.6.3 Image Capture

Oracle WebLogic Communication Services adds its own image capture information to the diagnostic image generated by the WLDF. You can generate diagnostic images either on demand, or automatically by configuring watch rules.

The information contained in diagnostic images is intended for use by Oracle technical support personnel when troubleshooting a potential server problem and includes:

  • SIP data tier partition and replica configuration

  • Call state and timer statistics

  • Work manager statistics

8.6.4 Instrumentation

The WLDF instrumentation system creates diagnostic monitors and inserts them into Oracle WebLogic Communication Services or application code at specific points in the flow of execution. Oracle WebLogic Communication Services integrates with the instrumentation service to provide a built-in DyeInjection monitor. When enabled, this monitor injects dye flags into the diagnostic context when certain SIP messages enter or exit the system. Dye flags are injected based on the monitor's configuration properties, and on certain request attributes.

Oracle WebLogic Communication Services adds the dye flags described in Table 8-4 below, as well as the WebLogic Server dye flags USER and ADDR. See Oracle Fusion Middleware Configuring and Using the Diagnostics Framework for Oracle WebLogic Server for more information.

Table 8-4 Oracle WebLogic Communication Services DyeInjection Flags

Dye Flag Description

PROTOCOL_SIP

Set in the diagnostic context of all SIP protocol messages.

SIP_REQ

Set in the diagnostic context for all SIP requests that match the value of the property SIP_REQ.

SIP_RES

This flag is set in the diagnostic context for all SIP responses that match the value of the property SIP_RES.

SIP_REQURI

This flag is set if a SIP request's request URI matches the value of property SIP_REQURI.

SIP_ANY_HEADER

This flag is set if a SIP request contains a header matching the value of the property SIP_ANY_HEADER. The value of SIP_ANY_HEADER is specified using the format messageType.headerName=headerValue where headerValue is either a value or regular expression. For example, you can specify the property as SIP_ANY_HEADER=request.Contact=sip:sipp@localhost:5061 or SIP_ANY_HEADER=response.Contact=sip:findme@172.17.30.50:5060.
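To make the messageType.headerName=headerValue format concrete, the following Python sketch splits a SIP_ANY_HEADER value and tests it against a message's headers. It is a hypothetical illustration, not the DyeInjection monitor's code. It assumes that the value may match either exactly or as a regular expression covering the whole header value, and it compares header names case-insensitively, as SIP does.

```python
import re


def parse_any_header(spec):
    """Split messageType.headerName=headerValue into its three parts.

    Only the first '=' and the first '.' act as separators, so a value such
    as sip:findme@172.17.30.50:5060 keeps its own dots and colons.
    """
    left, sep, value = spec.partition("=")
    msg_type, dot, name = left.partition(".")
    if not sep or not dot or msg_type not in ("request", "response"):
        raise ValueError(f"not a valid SIP_ANY_HEADER value: {spec!r}")
    return msg_type, name, value


def _value_matches(pattern, value):
    """Exact match first, then a whole-string regular-expression match."""
    if value == pattern:
        return True
    try:
        return re.fullmatch(pattern, value) is not None
    except re.error:  # not a valid regex: only the exact comparison applies
        return False


def header_matches(spec, message_type, headers):
    """True if any value of the named header satisfies the spec.

    `headers` maps header names to lists of values; header names are
    compared case-insensitively.
    """
    want_type, name, pattern = parse_any_header(spec)
    if want_type != message_type:
        return False
    for header, values in headers.items():
        if header.lower() == name.lower():
            if any(_value_matches(pattern, v) for v in values):
                return True
    return False
```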


Dye flags can be applied to both incoming and outbound SIP messages. The flags are useful for dye filtering, and can be used by delegating monitors to trigger further diagnostic actions.
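A delegating monitor with dye filtering enabled compares its dye mask with the flags set in the diagnostic context. The Python sketch below models the flags from Table 8-4 as bits. It assumes that the monitor fires only when every flag in its mask is present in the context, and that a monitor without dye filtering fires unconditionally. Consult the WLDF documentation for the authoritative semantics.

```python
from enum import Flag, auto


class Dye(Flag):
    """The SIP dye flags from Table 8-4, modeled as bits."""
    PROTOCOL_SIP = auto()
    SIP_REQ = auto()
    SIP_RES = auto()
    SIP_REQURI = auto()
    SIP_ANY_HEADER = auto()


def should_fire(context_dyes, dye_mask, dye_filtering_enabled=True):
    """Decide whether a delegating monitor runs its actions.

    Assumption: with dye filtering enabled, every flag in the mask must be
    present in the diagnostic context.
    """
    if not dye_filtering_enabled:
        return True
    return (context_dyes & dye_mask) == dye_mask
```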

Oracle WebLogic Communication Services provides several delegating monitors that can be applied at the application and server scope, and which may examine dye flags set by the DyeInjection monitor. The delegating monitors are described in Table 8-5.

Table 8-5 Oracle WebLogic Communication Services Diagnostic Monitors

Monitor Name Monitor Type Scope Pointcuts

occas/Sip_Servlet_Before_Service

Before

Application

At entry of SipServlet.do* or SipServlet.service methods of all implementing subclasses.

occas/Sip_Servlet_After_Service

After

Application

At exit of SipServlet.do* or SipServlet.service methods of all implementing subclasses.

occas/Sip_Servlet_Around_Service

Around

Application

At entry and exit of SipServlet.do* or SipServlet.service methods of all implementing subclasses.

occas/Sip_Servlet_Before_Session

Before

Application

At entry of getAttribute, set, remove, and invalidate methods for both SipSession and SipApplicationSession.

occas/Sip_Servlet_After_Session

After

Application

At exit of getAttribute, set, remove, and invalidate methods for both SipSession and SipApplicationSession.

occas/Sip_Servlet_Around_Session

Around

Application

At entry and exit of getAttribute, set, remove, and invalidate methods for both SipSession and SipApplicationSession.

occas/SipSessionDebug

Around

Application

This is a built-in, application-scoped monitor having fixed pointcuts and a fixed debug action. Before and after a pointcut, the monitor performs the SipSessionDebug diagnostic action, which calculates the size of the SIP session after serializing the underlying object.

The pointcuts for this monitor are as follows:

  1. Before and after calls to getSession and getApplicationSession of the SipServletMessage class hierarchy.

  2. Before and after calls to getAttribute, setAttribute, and removeAttribute methods in the SipSession and SipApplicationSession classes.

Note: The occas/SessionDebugAction-Before event is not triggered for the req.getSession() or req.getApplicationSession() joinpoints. Only the occas/SessionDebugAction-After is triggered, because the Session is made available for inspection only after the joinpoints have executed.

Note: If you compile your application using Apache Ant, you must enable the debug attribute to embed necessary debug information into the generated class files.

occas/Sip_Servlet_Before_Message_Send_Internal

Before

Server

At entry of Oracle WebLogic Communication Services code that writes messages to the wire.

occas/Sip_Servlet_After_Message_Send_Internal

After

Server

At exit of Oracle WebLogic Communication Services code that writes messages to the wire.

occas/Sip_Servlet_Around_Message_Send_Internal

Around

Server

At entry and exit of Oracle WebLogic Communication Services code that writes messages to the wire.


8.6.4.1 Configuring Server-Scoped Monitors

To use the server-scoped monitors, you must create a new diagnostic module and create and configure one or more monitors in the module. For the built-in DyeInjection monitor, you then add monitor properties to define the specific dye flags. For delegating monitors such as occas/Sip_Servlet_Before_Message_Send_Internal, you add monitor properties to define diagnostic actions.

Follow these steps to configure the Oracle WebLogic Communication Services DyeInjection monitor, a delegate monitor, and enable dye filtering:

  1. Access the Administration Console for your domain.

  2. Select the Diagnostics > Diagnostic Modules node in the left pane of the console.

  3. Click New to create a new Diagnostic Module. Give the module a descriptive name, such as "instrumentationModule," and click OK.

  4. Select the new "instrumentationModule" from the list of modules in the table.

  5. Select the Targets tab.

  6. Select a server on which to target the module and click Save.

  7. Return to the Diagnostics > Diagnostic Modules node and select instrumentationModule from the list of modules.

  8. Select the Configuration > Instrumentation tab.

  9. Select Enabled to enable instrumentation at the server level, then click Save.

  10. Add the DyeInjection monitor to the module:

    1. Click Add/Remove.

    2. Select the name of a monitor from the Available list (for example, DyeInjection), and use the arrows to move it to the Chosen list.

    3. Click OK.

    4. Select the newly-created monitor from the list of available monitors.

    5. Ensure that the monitor is enabled, and edit the Properties field to add any required properties. For the DyeInjection monitor, sample properties include:

      SIP_RES=180
      SIP_REQ=INVITE
      SIP_ANY_HEADER=request.Contact=sip:sipp@localhost:5061
      
    6. Click Save.

  11. Add one or more delegate monitors to the module:

    1. Return to the Configuration > Instrumentation tab for the new module.

    2. Click Add/Remove.

    3. Select the name of a delegate monitor from the Available list (for example, occas/Sip_Servlet_Before_Message_Send_Internal), and use the arrows to move it to the Chosen list.

    4. Click OK.

    5. Select the newly-created monitor from the list of available monitors.

    6. Ensure that the monitor is enabled, then select one or more Actions from the available list, and use the arrows to move the actions to the Chosen list. For the occas/Sip_Servlet_Before_Message_Send_Internal monitor, sample actions include DisplayArgumentsAction, StackDumpAction, ThreadDumpAction, and TraceAction.

    7. Select the EnableDyeFiltering check box.

    8. Select one or more Dye Masks, such as SIP_REQ, from the Available list and use the arrows to move them to the Chosen list.

    9. Click Save.

      Note:

      You can repeat the above steps to create additional delegate monitors.
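The DyeInjection Properties field holds one KEY=VALUE pair per line. A SIP_ANY_HEADER value contains '=' itself, so only the first '=' on each line should separate the key from the value. This Python sketch parses the field that way so you can check an entry before saving it. It is a hypothetical helper that handles simple lines only and ignores the escape rules of java.util.Properties.

```python
def parse_monitor_properties(text):
    """Parse DyeInjection Properties text (one KEY=VALUE per line) into a dict.

    Only the first '=' on a line separates key from value, because values such
    as request.Contact=sip:sipp@localhost:5061 contain '=' themselves. Escapes
    and other java.util.Properties syntax are not handled.
    """
    props = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {line_no}: expected KEY=VALUE, got {raw!r}")
        props[key.strip()] = value.strip()
    return props
```

Applied to the sample properties from step 10, the parser yields SIP_RES=180, SIP_REQ=INVITE, and SIP_ANY_HEADER=request.Contact=sip:sipp@localhost:5061.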

8.6.4.2 Configuring Application-Scoped Monitors

You configure application-scoped monitors in an XML configuration file named weblogic-diagnostics.xml. You must store the weblogic-diagnostics.xml file in the SIP module's or enterprise application's META-INF directory.

The XML file enables instrumentation at the application level, defines pointcuts, and also defines delegate monitor dye masks and actions. Example 8-1 shows a sample configuration file that uses the occas/Sip_Servlet_Before_Service monitor.

Example 8-1 Sample weblogic-diagnostics.xml File

<wldf-resource xmlns="http://www.bea.com/ns/weblogic/90/diagnostics">
  <instrumentation>
    <enabled>true</enabled>
    <include>demo.ProxyServlet</include>
    <wldf-instrumentation-monitor>
      <name>occas/Sip_Servlet_Before_Service</name>
      <enabled>true</enabled>
      <dye-mask>SIP_ANY_HEADER</dye-mask>
      <dye-filtering-enabled>true</dye-filtering-enabled>
      <action>DisplayArgumentsAction</action>
    </wldf-instrumentation-monitor>  
   </instrumentation>
</wldf-resource>

In this example, if an incoming request's diagnostic context contains the SIP_ANY_HEADER dye flag, then the occas/Sip_Servlet_Before_Service monitor is triggered and the DisplayArgumentsAction is executed.
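Before packaging weblogic-diagnostics.xml, you may want to confirm what the file actually configures. The following Python sketch uses the standard library's ElementTree to list each monitor's enabled state, dye masks, and actions. It reads only the elements shown in Example 8-1, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the root element in Example 8-1.
NS = {"d": "http://www.bea.com/ns/weblogic/90/diagnostics"}


def summarize_monitors(xml_text):
    """List each instrumentation monitor with its dye masks and actions."""
    root = ET.fromstring(xml_text)
    inst = root.find("d:instrumentation", NS)
    if inst is None:
        return []
    summary = []
    for mon in inst.findall("d:wldf-instrumentation-monitor", NS):
        flag = lambda tag: mon.findtext(tag, default="false", namespaces=NS).strip() == "true"
        summary.append({
            "name": mon.findtext("d:name", default="", namespaces=NS).strip(),
            "enabled": flag("d:enabled"),
            "dye_filtering": flag("d:dye-filtering-enabled"),
            "dye_masks": [(e.text or "").strip() for e in mon.findall("d:dye-mask", NS)],
            "actions": [(e.text or "").strip() for e in mon.findall("d:action", NS)],
        })
    return summary
```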

8.7 Logging SIP Requests and Responses

Oracle WebLogic Communication Services enables you to perform Protocol Data Unit (PDU) logging for the SIP requests and responses it processes. Logged SIP messages are placed either in the domain-wide log file for Oracle WebLogic Communication Services, or in the log files for individual Managed Server instances. Because SIP messages share the same log files as Oracle WebLogic Communication Services instances, you can use advanced server logging features such as log rotation, domain log filtering, and maximum log size configuration when managing logged SIP messages.

Administrators configure SIP PDU logging by defining one or more SIP Servlets using the com.bea.wcp.sip.engine.tracing.listener.TraceMessageListenerImpl class. Logging criteria are then configured either as parameters to the defined servlet, or in separate XML files packaged with the application.

As SIP requests are processed or SIP responses generated, the logging Servlet compares the message with the filtering patterns defined in a standalone XML configuration file or Servlet parameter. SIP requests and responses that match the specified pattern are written to the log file along with the name of the logging servlet, the configured logging level, and other details. To avoid unnecessary pattern matching, the Servlet marks new SIP Sessions when an initial pattern is matched and then logs subsequent requests and responses for that session automatically.
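The "match once, then follow the session" behavior described above can be sketched in a few lines of Python. This is a hypothetical model, not the TraceMessageListenerImpl implementation. Messages are plain dicts with an assumed "session_id" key, and the matches argument stands in for whatever pattern the logging Servlet is configured with.

```python
class SessionTracer:
    """Log messages that match a pattern, then every later message in that session.

    `matches` is any predicate over a message (for example, one built from
    request-pattern.xml). Logged entries are appended to `self.log`.
    """

    def __init__(self, matches, servlet_name="msgTraceLogger", level="full"):
        self.matches = matches
        self.servlet_name = servlet_name
        self.level = level
        self.marked_sessions = set()
        self.log = []

    def on_message(self, message):
        sid = message["session_id"]
        if sid not in self.marked_sessions:
            if not self.matches(message):
                return False
            # First match: remember the session so later messages skip matching.
            self.marked_sessions.add(sid)
        self.log.append((self.servlet_name, self.level, message))
        return True
```

With a predicate that matches INVITE, an INVITE marks its session, so the BYE in the same session is logged even though it does not match the pattern itself. A BYE in an unmarked session is not logged.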

Logging criteria are defined either directly in sip.xml as parameters to a logging Servlet, or in external XML configuration files. See Section 8.7.3, "Specifying the Criteria for Logging Messages".

Note:

Engineers can implement PDU logging functionality in their Servlets either by creating a delegate with the TraceMessageListenerFactory in the Servlet's init() method, or by using the tracing class in deployed Java applications. Using the delegate enables you to perform custom logging or manipulate incoming SIP messages using the default trace message listener implementation. See Section 8.7.7, "Adding Tracing Functionality to SIP Servlet Code" for an example of using the factory in a Servlet's init() method.

8.7.1 Defining Logging Servlets in sip.xml

Logging Servlets for SIP messages are created by defining Servlets having the implementation class com.bea.wcp.sip.engine.tracing.listener.TraceMessageListenerImpl. The definition for a sample msgTraceLogger is shown in Example 8-2.

Example 8-2 Sample Logging Servlet

<servlet>
    <servlet-name>msgTraceLogger</servlet-name>
    <servlet-class>com.bea.wcp.sip.engine.tracing.listener.TraceMessageListenerImpl</servlet-class>
    <init-param>
      <param-name>domain</param-name>
      <param-value>true</param-value>
    </init-param>
    <init-param>
      <param-name>level</param-name>
      <param-value>full</param-value>
    </init-param>
    <load-on-startup/>
  </servlet>

8.7.2 Configuring the Logging Level and Destination

Logging attributes such as the level of logging detail and the destination log file for SIP messages are passed as initialization parameters to the logging Servlet. Table 8-5 lists the parameters and parameter values that you can specify as init-param entries. Example 8-2 shows the sample init-param entries for a Servlet that logs full SIP message information to the domain log file.

8.7.3 Specifying the Criteria for Logging Messages

The criteria for selecting SIP messages to log can be defined either in XML files that are packaged with the logging Servlet's application, or as initialization parameters in the Servlet's sip.xml deployment descriptor. The sections that follow describe each method.

8.7.3.1 Using XML Documents to Specify Logging Criteria

If you do not specify logging criteria as an initialization parameter to the logging Servlet, the Servlet looks for logging criteria in a pair of XML descriptor files in the top level of the logging application. These descriptor files, named request-pattern.xml and response-pattern.xml, define patterns that Oracle WebLogic Communication Services uses for selecting SIP requests and responses to place in the log file.

Note:

By default Oracle WebLogic Communication Services logs both requests and responses. If you do not want to log responses, you must define a response-pattern.xml file with empty matching criteria.

A typical pattern definition defines a condition for matching a particular value in a SIP message header. For example, the sample response-pattern.xml used by the msgTraceLogger Servlet matches all responses to MESSAGE requests. The contents of this descriptor are shown in Example 8-3.

Example 8-3 Sample response-pattern.xml for msgTraceLogger Servlet

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pattern
   PUBLIC "Registration//Organization//Type Label//Definition Language"
   "trace-pattern.dtd">
<pattern>
  <equal>
    <var>response.method</var>
    <value>MESSAGE</value>
  </equal>
</pattern>

Additional operators and conditions for matching SIP messages are described in Section 8.7.6, "trace-pattern.dtd Reference". Most conditions, such as the equal condition shown in Example 8-3, require a variable (var element) that identifies the portion of the SIP message to evaluate. Table 8-6 lists some common variables and sample values. For additional variable names and examples, see Section 16: Mapping Requests to Servlets in the SIP Servlet API 1.1 specification (http://jcp.org/en/jsr/detail?id=289); Oracle WebLogic Communication Services enables mapping of both request and response variables to logging Servlets.

Table 8-6 Pattern-matching Variables and Sample Values

Variable Sample Values

request.method, response.method

MESSAGE, INVITE, ACK, BYE, CANCEL

request.uri.user, response.uri.user

guest, admin, joe

request.to.host, response.to.host

server.mydomain.com


Both request-pattern.xml and response-pattern.xml use the same Document Type Definition (DTD). See Section 8.7.6, "trace-pattern.dtd Reference" for more information.
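To show how a pattern document selects messages, the following Python sketch evaluates a pattern against a dict of already-extracted variables such as request.method. The variables dict is a hypothetical input, not read from a live message. Only the equal operator is confirmed by the examples in this section. The sketch assumes that and, or, not, exists, and contains behave as in the SIP Servlet mapping rules, so check Section 8.7.6 before relying on them.

```python
import xml.etree.ElementTree as ET


def evaluate(node, variables):
    """Evaluate one pattern element against a dict of message variables."""
    tag = node.tag
    if tag == "pattern":
        return evaluate(node[0], variables)
    if tag == "and":
        return all(evaluate(child, variables) for child in node)
    if tag == "or":
        return any(evaluate(child, variables) for child in node)
    if tag == "not":
        return not evaluate(node[0], variables)
    actual = variables.get(node.findtext("var", default="").strip())
    if tag == "exists":
        return actual is not None
    expected = node.findtext("value", default="").strip()
    if actual is None:
        return False  # a missing variable never satisfies equal or contains
    if tag == "equal":
        return actual == expected
    if tag == "contains":
        return expected in actual
    raise ValueError(f"unsupported operator: {tag}")


def pattern_matches(pattern_xml, variables):
    """Parse a request-pattern.xml/response-pattern.xml document and evaluate it."""
    return evaluate(ET.fromstring(pattern_xml), variables)
```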

8.7.3.2 Using Servlet Parameters to Specify Logging Criteria

Pattern-matching criteria can also be specified as initialization parameters to the logging Servlet, rather than as separate XML documents. The parameter names used to specify matching criteria are request-pattern-string and response-pattern-string. They are defined along with the logging level and destination as described in Section 8.7.2, "Configuring the Logging Level and Destination".

The value of each pattern-matching parameter must consist of a valid XML document that adheres to the DTD for standalone pattern definition documents (see Section 8.7.3.1, "Using XML Documents to Specify Logging Criteria"). Because the XML documents that define the patterns and values must not be parsed as part of the sip.xml descriptor, you must enclose the contents within a CDATA section. Example 8-4 shows the full sip.xml entry for the sample logging Servlet, invTraceLogger. The final two init-param elements specify that the Servlet log only INVITE requests and responses to OPTIONS requests.

Example 8-4 Logging Criteria Specified as init-param Elements

<servlet>
      <servlet-name>invTraceLogger</servlet-name>
      <servlet-class>com.bea.wcp.sip.engine.tracing.listener.TraceMessageListenerImpl</servlet-class>
      <init-param>
        <param-name>domain</param-name>
        <param-value>true</param-value>
      </init-param>
      <init-param>
        <param-name>level</param-name>
        <param-value>full</param-value>
      </init-param>
      <init-param>
        <param-name>request-pattern-string</param-name>
        <param-value>
            <![CDATA[
                <?xml version="1.0" encoding="UTF-8"?>
                <!DOCTYPE pattern
                   PUBLIC "Registration//Organization//Type Label//Definition Language"
                   "trace-pattern.dtd">
                <pattern>
                  <equal>
                    <var>request.method</var>
                    <value>INVITE</value>
                  </equal>
                </pattern>
            ]]>
        </param-value>
      </init-param>
      <init-param>
        <param-name>response-pattern-string</param-name>
        <param-value>
            <![CDATA[
                <?xml version="1.0" encoding="UTF-8"?>
                <!DOCTYPE pattern
                   PUBLIC "Registration//Organization//Type Label//Definition Language"
                   "trace-pattern.dtd">
                <pattern>
                  <equal>
                    <var>response.method</var>
                    <value>OPTIONS</value>
                  </equal>
                </pattern>
            ]]>
        </param-value>
      </init-param>
      <load-on-startup/>
</servlet>
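Because each pattern document sits inside a CDATA section, the sip.xml parser treats it as plain text and does not check it. The following is a minimal sketch, not part of the product API, of checking a pattern string offline with the JDK's DOM parser before pasting it into sip.xml. The class name PatternStringCheck is hypothetical. The sketch only checks well-formedness and reports the top-level condition; it does not validate against trace-pattern.dtd, and it trims surrounding whitespace because an XML declaration must be the first thing in a document (whether the server itself trims the CDATA content is not stated here).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of the product API): checks that a pattern
// string is well-formed XML before it is pasted into a CDATA section in sip.xml.
final class PatternStringCheck {

  /** Parses the pattern and returns the name of its top-level condition, e.g. "equal". */
  static String topCondition(String patternString) throws Exception {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    // Resolve the external trace-pattern.dtd to empty content so the check runs
    // without the DTD file on disk. No DTD validation is performed.
    builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
    // An XML declaration must be the first thing in a document, so drop the
    // indentation that surrounds the pattern inside the CDATA section.
    Document doc = builder.parse(new InputSource(new StringReader(patternString.trim())));
    Element root = doc.getDocumentElement();
    if (!"pattern".equals(root.getTagName())) {
      throw new IllegalArgumentException("expected <pattern>, found <" + root.getTagName() + ">");
    }
    // Return the first element child, which the DTD requires to be a condition.
    for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
      if (n.getNodeType() == Node.ELEMENT_NODE) {
        return n.getNodeName();
      }
    }
    throw new IllegalArgumentException("<pattern> contains no condition");
  }
}
```

Applied to the request-pattern-string value from Example 8-4, topCondition returns "equal"; a string whose root element is not pattern is rejected with an IllegalArgumentException.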

8.7.4 Specifying Content Types for Unencrypted Logging

By default, Oracle WebLogic Communication Services uses String format (UTF-8 encoding) to log the content of SIP messages having a text/* or application/sdp Content-Type value. For all other Content-Type values, Oracle WebLogic Communication Services attempts to log the message content using the character set specified in the charset parameter of the message, if one is specified. If no charset parameter is specified, or if the charset value is invalid or unsupported, Oracle WebLogic Communication Services Base64-encodes the message content before logging the message.

To log such content in readable String form instead of Base64, specify a list of String-representable Content-Type values using the string-rep element in sipserver.xml. The string-rep element can contain one or more content-type elements to match. If a logged message matches one of the configured content-type elements, Oracle WebLogic Communication Services logs the content in String format using UTF-8 encoding, regardless of whether or not a charset parameter is included.

Note:

You do not need to specify text/* or application/sdp content types as these are logged in String format by default.

Example 8-5 shows a sample message-debug configuration that logs String content for three Content-Type values in addition to text/* and application/sdp content.

Example 8-5 Logging String Content for Additional Content Types

   <message-debug>
     <level>full</level>
     <string-rep>
       <content-type>application/msml+xml</content-type>
       <content-type>application/media_control+xml</content-type>
       <content-type>application/media_control</content-type>
     </string-rep>
   </message-debug>
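The content-logging rules described in this section amount to a small decision procedure. The sketch below is a hypothetical model written only to illustrate those rules; the class ContentLogFormat and its render method are not part of the product, and the server's own logging code may differ (for example, in how it parses the Content-Type header into a media type and charset).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;

// Hypothetical model of the content-logging rules described in this section;
// it is not the server's implementation.
final class ContentLogFormat {

  /**
   * @param mediaType media type without parameters, e.g. "application/msml+xml"
   * @param charset   value of the charset parameter, or null if none was sent
   * @param stringRep content-type values listed under string-rep in sipserver.xml
   */
  static String render(String mediaType, String charset, List<String> stringRep, byte[] body) {
    boolean textByDefault = mediaType.regionMatches(true, 0, "text/", 0, 5)
        || mediaType.equalsIgnoreCase("application/sdp");
    boolean configured = stringRep.stream().anyMatch(t -> t.equalsIgnoreCase(mediaType));
    // Default text types and configured string-rep types: UTF-8 string.
    if (textByDefault || configured) {
      return new String(body, StandardCharsets.UTF_8);
    }
    // Otherwise decode with the message's charset parameter, if it is usable.
    if (charset != null) {
      try {
        if (Charset.isSupported(charset)) {
          return new String(body, Charset.forName(charset));
        }
      } catch (IllegalArgumentException invalidName) {
        // Illegal charset name: fall through to Base64.
      }
    }
    // No usable charset: log the raw bytes as Base64.
    return Base64.getEncoder().encodeToString(body);
  }
}
```

With the configuration in Example 8-5, an application/msml+xml body is logged as UTF-8 text even without a charset parameter, while an unlisted type such as application/octet-stream without a charset is logged as Base64.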

8.7.5 Enabling Log Rotation and Viewing Log Files

The Oracle WebLogic Communication Services logging infrastructure enables you to automatically write to a new log file when the existing log file reaches a specified size. You can also view log contents using the Administration Console or configure additional server-level events that are written to the log.

8.7.6 trace-pattern.dtd Reference

trace-pattern.dtd defines the required contents of the request-pattern.xml and response-pattern.xml documents, as well as the values of the request-pattern-string and response-pattern-string Servlet init-param elements.

Example 8-6 trace-pattern.dtd

<!--
The different types of conditions supported.
-->

<!ENTITY % condition "and | or | not |
                      equal | contains | exists | subdomain-of">

<!--
A pattern is a condition: a predicate over the set of SIP requests.
-->

<!ELEMENT pattern (%condition;)>

<!--
An "and" condition is true if and only if all its constituent conditions
are true.
-->

<!ELEMENT and (%condition;)+>

<!--
An "or" condition is true if at least one of its constituent conditions
is true.
-->

<!ELEMENT or (%condition;)+>

<!--
Negates the value of the contained condition.
-->

<!ELEMENT not (%condition;)>

<!--
True if the value of the variable equals the specified literal value.
-->

<!ELEMENT equal (var, value)>

<!--
True if the value of the variable contains the specified literal value.
-->

<!ELEMENT contains (var, value)>

<!--
True if the specified variable exists.
-->

<!ELEMENT exists (var)>

<!--
True if the value of the variable is a subdomain of the domain name
specified in the value element.
-->

<!ELEMENT subdomain-of (var, value)>

<!--
Specifies a variable. Example:
  <var>request.uri.user</var>
-->

<!ELEMENT var (#PCDATA)>

<!--
Specifies a literal string value that is used to specify rules.
-->

<!ELEMENT value (#PCDATA)>

<!--
Specifies whether the "equal" test is case sensitive or not.
-->

<!ATTLIST equal ignore-case (true|false) "false">

<!--
Specifies whether the "contains" test is case sensitive or not.
-->

<!ATTLIST contains ignore-case (true|false) "false">

<!--
The ID mechanism is to allow tools to easily make tool-specific
references to the elements of the deployment descriptor. This allows
tools that produce additional deployment information (i.e information
beyond the standard deployment descriptor information) to store the
non-standard information in a separate file, and easily refer from
these tools-specific files to the information in the standard sip-app
deployment descriptor.
-->

<!ATTLIST pattern id ID #IMPLIED>
<!ATTLIST and id ID #IMPLIED>
<!ATTLIST or id ID #IMPLIED>
<!ATTLIST not id ID #IMPLIED>
<!ATTLIST equal id ID #IMPLIED>
<!ATTLIST contains id ID #IMPLIED>
<!ATTLIST exists id ID #IMPLIED>
<!ATTLIST subdomain-of id ID #IMPLIED>
<!ATTLIST var id ID #IMPLIED>
<!ATTLIST value id ID #IMPLIED>
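To make the condition semantics in the DTD concrete, the following sketch models each condition as a Java predicate over a map of variable values such as request.method. It is a hypothetical illustration (TraceCondition is not a product type) rather than the server's matcher. The ignore-case flag mirrors the DTD attribute and its default of false. The subdomain-of behavior shown (the host equals the domain or ends with "." plus the domain, compared case-insensitively) is an assumption; the DTD does not define it precisely.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory model of the trace-pattern.dtd conditions, written to
// illustrate their semantics; it is not the server's matcher.
interface TraceCondition {
  /** Evaluates the condition against variable values such as "request.method". */
  boolean test(Map<String, String> vars);

  static TraceCondition and(TraceCondition... cs) {
    return vars -> {
      for (TraceCondition c : cs) if (!c.test(vars)) return false;
      return true;
    };
  }

  static TraceCondition or(TraceCondition... cs) {
    return vars -> {
      for (TraceCondition c : cs) if (c.test(vars)) return true;
      return false;
    };
  }

  static TraceCondition not(TraceCondition c) {
    return vars -> !c.test(vars);
  }

  static TraceCondition exists(String name) {
    return vars -> vars.containsKey(name);
  }

  /** ignoreCase mirrors the ignore-case attribute, which defaults to false. */
  static TraceCondition equal(String name, String value, boolean ignoreCase) {
    return vars -> {
      String actual = vars.get(name);
      if (actual == null) return false;
      return ignoreCase ? actual.equalsIgnoreCase(value) : actual.equals(value);
    };
  }

  static TraceCondition contains(String name, String value, boolean ignoreCase) {
    return vars -> {
      String actual = vars.get(name);
      if (actual == null) return false;
      return ignoreCase
          ? actual.toLowerCase(Locale.ROOT).contains(value.toLowerCase(Locale.ROOT))
          : actual.contains(value);
    };
  }

  /** Assumed semantics: the host equals the domain or ends with "." + domain. */
  static TraceCondition subdomainOf(String name, String domain) {
    return vars -> {
      String actual = vars.get(name);
      if (actual == null) return false;
      String host = actual.toLowerCase(Locale.ROOT);
      String d = domain.toLowerCase(Locale.ROOT);
      return host.equals(d) || host.endsWith("." + d);
    };
  }
}
```

For example, TraceCondition.and(TraceCondition.equal("request.method", "INVITE", false), TraceCondition.subdomainOf("request.to.host", "mydomain.com")) corresponds to an and element that wraps an equal condition and a subdomain-of condition.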

8.7.7 Adding Tracing Functionality to SIP Servlet Code

Tracing functionality can be added to your own Servlets or to Java code by using the TraceMessageListenerFactory. TraceMessageListenerFactory enables clients to reuse the default trace message listener implementation behaviors by creating an instance and then delegating to it. The factory implementation instance can be found in the servlet context for SIP Servlets by looking up the value of the TraceMessageListenerFactory.TRACE_MESSAGE_LISTENER_FACTORY attribute.

Note:

Instances created by the factory are not registered with Oracle WebLogic Communication Services to receive callbacks upon SIP message arrival and departure.

To implement tracing in a Servlet, you use the factory class to create a delegate in the Servlet's init() method as shown in Example 8-7.

Example 8-7 Using the TraceMessageListenerFactory

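import javax.servlet.ServletContext;
import javax.servlet.ServletException;
import javax.servlet.sip.SipServlet;
import javax.servlet.sip.SipServletRequest;
import javax.servlet.sip.SipServletResponse;
// Also import MessageListener and TraceMessageListenerFactory from the Oracle
// WebLogic Communication Services tracing API.

public final class TraceMessageListenerImpl extends SipServlet implements MessageListener {
  private MessageListener delegate; // default trace listener that does the work

  public void init() throws ServletException {
    ServletContext sc = getServletContext();
    // The server publishes the factory instance as a servlet context attribute.
    TraceMessageListenerFactory factory = (TraceMessageListenerFactory)
        sc.getAttribute(TraceMessageListenerFactory.TRACE_MESSAGE_LISTENER_FACTORY);
    if (factory == null) {
      throw new ServletException("TraceMessageListenerFactory not found in servlet context");
    }
    // Create the delegate, passing this Servlet's configuration to the factory.
    delegate = factory.createTraceMessageListener(getServletConfig());
  }
  public final void onRequest(SipServletRequest req, boolean incoming) {
    delegate.onRequest(req, incoming);
  }
  public final void onResponse(SipServletResponse resp, boolean incoming) {
    delegate.onResponse(resp, incoming);
  }
}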

8.7.8 Order of Startup for Listeners and Logging Servlets

If you deploy both listeners and logging servlets, the listener classes are loaded first, followed by the Servlets. Logging Servlets are deployed in order according to the load order specified in their Web Application deployment descriptor.

8.8 Tuning JVM Garbage Collection for Production Deployments

Production installations of Oracle WebLogic Communication Services generally require extremely small response times (under 50 milliseconds) for clients at all times, even under peak server loads. A key factor in maintaining brief response times is the proper selection and tuning of the JVM's Garbage Collection (GC) algorithm for Oracle WebLogic Communication Services instances in the engine tier.

Whereas certain tuning strategies are designed to yield the lowest average garbage collection times or to minimize the frequency of full GCs, those strategies can sometimes result in one or more very long periods of garbage collection (often several seconds long) that are offset by shorter GC intervals. With a production SIP Server installation, all long GC intervals must be avoided in order to maintain response time goals.

The sections that follow describe GC tuning strategies for JRockit and Sun's JVM that generally result in best response time performance.

Note:

For more information on JRockit, see Oracle Fusion Middleware Introduction to Oracle WebLogic Server.

8.8.1 Modifying JVM Parameters in Server Start Scripts

If you use custom startup scripts to start Oracle WebLogic Communication Services engines and replicas, simply edit those scripts to include the recommended JVM options described in the sections that follow.

The Configuration Wizard also installs default startup scripts when you configure a new domain. These scripts are installed in the MIDDLEWARE_HOME/user_projects/domains/domain_name/bin directory by default, and include:

  • startWebLogic.cmd, startWebLogic.sh—These scripts start the Administration Server for the domain.

  • startManagedWebLogic.cmd, startManagedWebLogic.sh—These scripts start managed engines and replicas in the domain.

If you use the Oracle-installed scripts to start engines and replicas, you can override JVM memory arguments by first setting the USER_MEM_ARGS environment variable in your command shell.

Note:

Setting the USER_MEM_ARGS environment variable overrides all default JVM memory arguments specified in the Oracle-installed scripts. Always set USER_MEM_ARGS to the full list of JVM memory arguments you intend to use. For example, when using the Sun JVM, always add -XX:MaxPermSize=128m to the USER_MEM_ARGS value, even if you only intend to change the default heap space (-Xms, -Xmx) parameters.
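Because USER_MEM_ARGS replaces every default memory argument, it is easy to drop one by accident. The following hypothetical pre-flight check (UserMemArgsCheck is not an Oracle tool) reads USER_MEM_ARGS from the environment and reports which required option prefixes are missing. The required list shown reflects the Sun JVM note above and is an assumption you should adjust for your JVM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight check (not an Oracle tool): reports which JVM memory
// options are missing from a USER_MEM_ARGS value before you start a server.
final class UserMemArgsCheck {

  /** Returns each required option prefix that no token of userMemArgs starts with. */
  static List<String> missing(String userMemArgs, List<String> requiredPrefixes) {
    List<String> tokens = new ArrayList<>();
    if (userMemArgs != null) {
      tokens.addAll(Arrays.asList(userMemArgs.trim().split("\\s+")));
    }
    List<String> absent = new ArrayList<>();
    for (String prefix : requiredPrefixes) {
      if (tokens.stream().noneMatch(t -> t.startsWith(prefix))) {
        absent.add(prefix);
      }
    }
    return absent;
  }

  public static void main(String[] args) {
    // Required options for the Sun JVM, per the note above; adjust for your JVM.
    List<String> required = Arrays.asList("-Xms", "-Xmx", "-XX:MaxPermSize=");
    List<String> absent = missing(System.getenv("USER_MEM_ARGS"), required);
    System.out.println(absent.isEmpty()
        ? "USER_MEM_ARGS contains all required options"
        : "USER_MEM_ARGS is missing: " + absent);
  }
}
```

For example, a value of "-Xms1024m -Xmx1024m" is reported as missing -XX:MaxPermSize=, which is exactly the mistake the note warns about.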

8.8.2 Tuning Garbage Collection with JRockit

JRockit provides several monitoring tools that you can use to analyze the JVM heap at any given moment, including:

  • JRockit Runtime Analyzer—provides a view into the runtime behavior of garbage collection and pause times.

  • JRockit Stack Dumps—reveals applications' thread activity to help you troubleshoot and/or improve performance.

Use these and other tools in a controlled environment to determine the effects of JVM settings before you use the settings in a production deployment.
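JRockit's own tools are the primary way to inspect pause behavior. As a portable complement, the standard java.lang.management API exposes cumulative collection counts and times for each collector in the running JVM. The sketch below prints them (GcReport is a hypothetical name); to observe a server, call report() from code running in that server's JVM. Because these values are totals, an average computed from them can hide the occasional multi-second pause this section warns about, so treat the output as a trend indicator rather than a pause-time measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: summarizes cumulative GC activity of the current JVM.
final class GcReport {

  static String report() {
    StringBuilder sb = new StringBuilder();
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      long count = gc.getCollectionCount(); // -1 if the JVM does not report it
      long timeMs = gc.getCollectionTime(); // accumulated milliseconds, or -1
      sb.append(gc.getName())
        .append(": collections=").append(count)
        .append(", totalTimeMs=").append(timeMs);
      if (count > 0 && timeMs >= 0) {
        // Average only; individual long pauses are not visible here.
        sb.append(", avgMs=").append(timeMs / count);
      }
      sb.append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.print(report());
  }
}
```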

The following sections describe suggested starting JVM options for use with JRockit. If you use JRockit with the deterministic garbage collector (recommended), use the options described in Section 8.8.3, "Using Oracle JRockit Real Time (Deterministic Garbage Collection)".

8.8.3 Using Oracle JRockit Real Time (Deterministic Garbage Collection)

Very short response times are most easily achieved by using JRockit Real Time, which implements a deterministic garbage collector.

Oracle recommends using the following JVM arguments for engine tier servers in replicated cluster configurations:

-Xms1024m -Xmx1024m -XgcPrio:deterministic -XpauseTarget=30ms -XXtlasize:min=8k -XXnosystemgc

Note:

The above settings are configured by default in the $WLSS_HOME/common/bin/wlssCommenv.sh file when you use the Configuration Wizard to create a new domain with the JRockit JVM.

You may need to increase the -XpauseTarget value for allocation-intensive applications. The value can be decreased for smaller applications under light loads.

Adjust the heap size according to the amount of live data used by deployed applications. As a starting point, set the heap size from 2 to 3 times the amount required by your applications. A value closer to 3 times the required amount generally yields the best performance.
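The heap-sizing guidance above is simple arithmetic. The following hypothetical helper (HeapSizing is not part of any product) turns a measured live-data size into the suggested range and a fixed-heap option pair that sets -Xms equal to -Xmx, as the examples in this section do. How you measure live data (for example, heap occupancy after a full collection under representative load) is up to you.

```java
// Hypothetical helper illustrating the 2x to 3x live-data heap guideline.
final class HeapSizing {

  /** Returns {lowMb, highMb}: 2 and 3 times the measured live data, in MB. */
  static long[] range(long liveDataMb) {
    if (liveDataMb <= 0) {
      throw new IllegalArgumentException("live data size must be positive");
    }
    return new long[] {2 * liveDataMb, 3 * liveDataMb};
  }

  /** Formats a fixed-size heap: initial and maximum heap set to the same value. */
  static String options(long heapMb) {
    return "-Xms" + heapMb + "m -Xmx" + heapMb + "m";
  }
}
```

For 400 MB of live data, range(400) returns 800 and 1200; staying near the upper end, as recommended, gives options(1200), that is, -Xms1200m -Xmx1200m.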

For replica servers, increase the available memory:

-Xms3072m -Xmx3072m -XgcPrio:deterministic -XpauseTarget=30ms -XXtlasize:min=8k -XXnosystemgc

These settings fix the heap size and enable the dynamic garbage collector with deterministic garbage collection. -XpauseTarget sets the target maximum pause time, and -XXtlasize:min=8k sets the minimum thread-local area size. -XXnosystemgc prevents System.gc() application calls from forcing garbage collection.

8.8.4 Using Oracle JRockit without Deterministic Garbage Collection

When using Oracle's JRockit JVM without deterministic garbage collection (not recommended for production deployments), the best response time performance is obtained by using the generational concurrent garbage collector.

The full list of example startup options for an engine tier server is:

-Xms1024m -Xmx1024m -Xgc:gencon -XXnosystemgc -XXtlasize:min=3k -XXkeeparearatio=0 -Xns:48m 

Note:

Fine tune the heap size according to the amount of live data used by deployed applications.

The full list of example startup options for a replica server is:

-Xms3072m -Xmx3072m -Xgc:gencon -XXnosystemgc -XXtlasize:min=3k -XXkeeparearatio=0 -Xns:48m

8.8.5 Tuning Garbage Collection with Sun JDK

When using Sun's JDK, the goal in tuning garbage collection performance is to reduce the time required to perform a full garbage collection cycle. You should not attempt to tune the JVM to minimize the frequency of full garbage collections, because this generally results in an eventual forced garbage collection cycle that may take up to several full seconds to complete.

The simplest and most reliable way to achieve short garbage collection times over the lifetime of a production server is to use a fixed heap size with the concurrent mark-and-sweep (CMS) collector and the parallel young generation collector, restricting the new generation size to at most one third of the overall heap.

The following example JVM settings are recommended for most engine tier servers:

-server -Xms1024m -Xmx1024m -XX:MaxPermSize=128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseTLAB -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 -XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=256 -XX:CMSInitiatingOccupancyFraction=60 -XX:+DisableExplicitGC

For replica servers, use the example settings:

-server -Xms3072m -Xmx3072m -XX:MaxPermSize=128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseTLAB -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 -XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=256 -XX:CMSInitiatingOccupancyFraction=60 -XX:+DisableExplicitGC

The above options have the following effect:

  • -XX:+UseTLAB—Uses thread-local object allocation blocks. This improves concurrency by reducing contention on the shared heap lock.

  • -XX:+UseParNewGC—Uses a parallel version of the young generation copying collector alongside the concurrent mark-and-sweep collector. This minimizes pauses by using all available CPUs in parallel. The collector is compatible with both the default collector and the Concurrent Mark and Sweep (CMS) collector.

  • -Xms, -Xmx—Places boundaries on the heap size to increase the predictability of garbage collection. The heap size is limited in replica servers so that even Full GCs do not trigger SIP retransmissions. -Xms sets the starting size to prevent pauses caused by heap expansion.

  • -XX:MaxTenuringThreshold=0—Makes the full NewSize available to every NewGC cycle, and reduces the pause time by not evaluating tenured objects. Technically, this setting promotes all live objects to the older generation, rather than copying them.

  • -XX:SurvivorRatio=256—Specifies a high survivor ratio, which goes along with the zero tenuring threshold to ensure that little space is reserved for absent survivors.
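The one-third new generation guideline and the survivor ratio can be sanity-checked numerically. The sketch assumes HotSpot's definition of -XX:SurvivorRatio as the ratio of eden to one survivor space, so the young generation is eden plus two survivor spaces and each survivor space is young / (ratio + 2). YoungGenMath is a hypothetical name, and the sizes it computes are illustrations, not additional recommended settings.

```java
// Hypothetical helper: arithmetic behind the new generation and survivor sizing.
final class YoungGenMath {

  /** Largest new generation, in MB, that respects the one-third-of-heap guideline. */
  static long maxNewSizeMb(long heapMb) {
    return heapMb / 3;
  }

  /**
   * Size of ONE survivor space, in KB (rounded down), assuming
   * young = eden + 2 * survivor and eden = survivorRatio * survivor.
   */
  static long survivorKb(long youngMb, int survivorRatio) {
    return youngMb * 1024 / (survivorRatio + 2);
  }
}
```

For the 1024 MB engine tier heap, the new generation should be at most 341 MB; at that size, -XX:SurvivorRatio=256 leaves each survivor space at about 1353 KB, which shows how little room the zero tenuring threshold reserves.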

8.9 Avoiding JVM Delays Caused By Random Number Generation

The library used for random number generation in Sun's JVM relies on /dev/random by default for UNIX platforms. This can potentially block the Oracle WebLogic Communication Services process because on some operating systems /dev/random waits for a certain amount of "noise" to be generated on the host machine before returning a result. Although /dev/random is more secure, Oracle recommends using /dev/urandom if the default JVM configuration delays Oracle WebLogic Communication Services startup.

To determine if your operating system exhibits this behavior, try displaying a portion of the file from a shell prompt:

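head -n 1 /dev/random

If the command returns immediately, you can use /dev/random as the default generator for Sun's JVM. If the command does not return immediately, configure the JVM to use /dev/urandom:
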
  1. Open the $JAVA_HOME/jre/lib/security/java.security file in a text editor.

  2. Change the line:

    securerandom.source=file:/dev/random
    

    to read:

    securerandom.source=file:/dev/urandom
    
  3. Save your change and exit the text editor.
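After step 3, one way to confirm that the JVM you start the server with actually loaded the edited file is to read the property back through the standard java.security.Security API. EntropySourceCheck is a hypothetical name; run it with the same JAVA_HOME that the server uses. It only reports the configured value; it does not measure how a particular SecureRandom implementation gathers entropy.

```java
import java.security.Security;

// Hypothetical check: reports the securerandom.source value this JVM loaded.
final class EntropySourceCheck {

  /** Returns the configured securerandom.source value, or null if unset. */
  static String configuredSource() {
    return Security.getProperty("securerandom.source");
  }

  /** True if the value matches the edited line from step 2. */
  static boolean usesUrandom(String source) {
    return "file:/dev/urandom".equals(source);
  }

  public static void main(String[] args) {
    String source = configuredSource();
    System.out.println("securerandom.source=" + source
        + (usesUrandom(source) ? " (change applied)" : " (change not applied)"));
  }
}
```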