Operations Guide

Avoiding and Recovering From Server Failures

A variety of events can lead to the failure of a server instance. Often one failure condition leads to another. Loss of power, hardware malfunction, operating system crashes, network partitions, or unexpected application behavior may each contribute to the failure of a server instance.

WebLogic SIP Server uses a highly clustered architecture as the basis for minimizing the impact of failure events. However, even in a clustered environment it is important to prepare a sound recovery process in case an engine tier server, data tier server, or Diameter relay node suddenly fails.

The following sections provide information and procedures for recovering failed server instances:

 


Failure Prevention and Recovery Features

WebLogic SIP Server provides several features that facilitate recovery from and protection against server failure.

Overload Protection

WebLogic SIP Server detects increases in system load that could affect the performance and stability of deployed SIP Servlets, and automatically throttles message processing at predefined load thresholds.

Using overload protection helps you avoid failures that could result from unanticipated levels of application traffic or resource utilization.

WebLogic SIP Server attempts to avoid failure when certain conditions occur:

See overload in the Configuration Reference for more information.

Redundancy and Failover for Clustered Services

You can increase the reliability and availability of your applications by using multiple engine tier servers in a dedicated cluster, as well as multiple data tier servers (replicas) in a dedicated data tier cluster. Because engine tier clusters maintain no stateful information about applications, the failure of an engine tier server does not result in any data loss or dropped calls. Multiple replicas in a data tier partition store redundant copies of call state information, and automatically fail over to one another should a replica fail.

See Overview of the WebLogic SIP Server Architecture in the Configuration Manual for more information.

Automatic Restart for Failed Server Instances

When used with Node Manager, server self-health monitoring enables you to automatically restart servers that have failed. This improves the overall reliability of a domain, and requires no direct intervention from an administrator.

For more information, see Using Node Manager to Control Servers in the WebLogic Server 9.2 documentation.

Managed Server Independence Mode

Managed Servers maintain a local copy of the domain configuration. When a Managed Server starts, it contacts its Administration Server to retrieve any changes to the domain configuration that were made since the Managed Server was last shut down. If a Managed Server cannot connect to the Administration Server during startup, it can use its locally-cached configuration information—this is the configuration that was current at the time of the Managed Server's most recent shutdown. A Managed Server that starts up without contacting its Administration Server to check for configuration updates is running in Managed Server Independence (MSI) mode. By default, MSI mode is enabled. See Replicating domain config files for Managed Server Independence in the WebLogic Server 9.2 documentation.

 


Directory and File Backups for Failure Recovery

Recovery from the failure of a server instance requires access to the domain's configuration and security data. This section describes file backups that WebLogic SIP Server performs automatically, as well as manual backup procedures that an administrator should perform periodically.

Backing up config.xml

By default, an Administration Server stores a domain's configuration data in a file called domain_name/config.xml, where domain_name is the root directory of the domain.

Back up config.xml to a secure location in case a failure of the Administration Server renders the original copy unavailable. BEA recommends storing each new version of a config.xml file to a source control repository. If an Administration Server fails, you can copy the most recent backup version to a different machine and restart the Administration Server on that machine.
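The manual backup described above can be scripted. The following is a minimal sketch in Python, not a BEA-supplied tool: the function name `backup_config`, the timestamped file naming, and the backup directory layout are all assumptions chosen for illustration. Committing the copy to a source control repository would be a separate step.

```python
import shutil
import time
from pathlib import Path


def backup_config(domain_dir, backup_dir):
    """Copy domain_dir/config.xml into backup_dir under a timestamped name.

    Hypothetical helper: the naming scheme is an assumption, not a
    WebLogic convention.
    """
    source = Path(domain_dir) / "config.xml"
    if not source.is_file():
        raise FileNotFoundError(f"no config.xml found in {domain_dir}")

    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # One copy per second at most; a second call in the same second
    # overwrites the earlier copy.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"config-{stamp}.xml"

    # copy2 also preserves the file's modification time.
    shutil.copy2(source, target)
    return target
```

A scheduled job could call `backup_config("/path/to/domain", "/path/to/backups")` after each configuration change; both paths here are placeholders.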

Automated config.xml Archiving

By default, the Administration Server archives up to 5 previous versions of config.xml in the domain_name/configArchive directory.

When you save a change to a domain's configuration, the Administration Server saves the previous configuration in domain_name\configArchive\config.xml#n. Each time the Administration Server saves a file in the configArchive directory, it increments the value of the #n suffix, up to a configurable number of copies (5 by default). Thereafter, each time you change the domain configuration:

To configure the number of config.xml file versions that the server maintains:

  1. In the left pane of the Administration Console, click on the name of the domain.
  2. In the right pane, click the Configuration->General tab.
  3. In the Advanced Options bar, click Show.
  4. In the Archive Configuration Count box, enter the number of versions to save.
  5. Click Apply.

Automatic Backup of config.xml at Server Startup

In addition to the files in domain_name\configArchive, the Administration Server creates two other files that back up the domain's configuration at key points during the startup process:

Backing Up the sipserver Application

As with the config.xml file, the sipserver implementation application contains configuration information used by all engine and data tier servers deployed within a domain. The sipserver application also generally includes the diameter application for engine tier servers that act as Diameter client nodes.

By default, the sipserver application is stored in domain_name/sipserver. Back up the entire application directory, which includes the sipserver.xml, datatier.xml, and diameter.xml configuration files, as well as any additional patches you may have installed.
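Backing up a whole application directory can be done with any archiving tool. As one possible approach, the sketch below uses Python's standard `tarfile` module to write a compressed archive and then lists what was stored so the backup can be checked. The function name `archive_directory` and the archive location are assumptions for illustration only.

```python
import tarfile
from pathlib import Path


def archive_directory(app_dir, archive_path):
    """Write app_dir to a gzip-compressed tar file and return the stored names.

    Hypothetical helper: intended for a directory such as
    domain_name/sipserver, but it works for any directory.
    """
    app_dir = Path(app_dir)
    if not app_dir.is_dir():
        raise NotADirectoryError(f"{app_dir} is not a directory")

    archive_path = Path(archive_path)
    archive_path.parent.mkdir(parents=True, exist_ok=True)

    # Store entries relative to the directory name, not the full path.
    with tarfile.open(str(archive_path), "w:gz") as tar:
        tar.add(str(app_dir), arcname=app_dir.name)

    # Re-open the archive to confirm it is readable and list its contents.
    with tarfile.open(str(archive_path), "r:gz") as tar:
        return sorted(tar.getnames())
```

The returned list can be compared against the files you expect, for example sipserver.xml, datatier.xml, and diameter.xml. Write the archive to a location outside the directory being archived.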

Backing Up the Diameter Application

If you configure one or more WebLogic SIP Server instances to function as Diameter relay agent nodes, the Diameter Web Application is generally deployed as a standalone application (outside of the sipserver implementation application). Back up each Diameter application used to configure a relay agent node. This generally involves a separate Diameter application directory for each relay.

Backing Up Server Start Scripts

In a WebLogic SIP Server deployment, the start scripts used to boot engine and data tier servers are generally customized to include domain-specific configuration information such as:

Back up each distinct start script used to boot engine tier, data tier, or Diameter relay servers in your domain.

Backing Up Logging Servlet Applications

If you use WebLogic SIP Server logging Servlets (see Logging SIP Requests and Responses) to perform regular logging or auditing of SIP messages, back up the complete application source files so that you can easily redeploy the applications should the staging server fail or the original deployment directory become corrupted.

Backing Up Security Data

The WebLogic Security service stores its configuration data in the config.xml file, and also in an LDAP repository and other files.

Backing Up the WebLogic LDAP Repository

The default Authentication, Authorization, Role Mapper, and Credential Mapper providers that are installed with WebLogic SIP Server store their data in an LDAP server. Each WebLogic SIP Server contains an embedded LDAP server. The Administration Server contains the master LDAP server, which is replicated on all Managed Servers. If any of your security realms use these installed providers, you should maintain an up-to-date backup of the following directory tree:

domain_name\adminServer\ldap

where domain_name is the domain's root directory and adminServer is the directory in which the Administration Server stores runtime and security data.

Each WebLogic SIP Server has an LDAP directory, but you only need to back up the LDAP data on the Administration Server, because the master LDAP server replicates its LDAP data to each Managed Server when updates to security data are made. WebLogic security providers cannot modify security data while the domain's Administration Server is unavailable. The LDAP repositories on Managed Servers are replicas and cannot be modified.

The ldap/ldapfiles subdirectory contains the data files for the LDAP server. The files in this directory contain user, group, group membership, policies, and role information. Other subdirectories under the ldap directory contain LDAP server message logs and data about replicated LDAP servers.

Do not update the configuration of a security provider while a backup of LDAP data is in progress. If a change is made—for instance, if an administrator adds a user—while you are backing up the ldap directory tree, the backups in the ldapfiles subdirectory could become inconsistent. If this does occur, consistent, but potentially out-of-date, LDAP backups are available.

Once a day, a server suspends write operations and creates its own backup of the LDAP data. It archives this backup in a ZIP file below the ldap\backup directory and then resumes write operations. This backup is guaranteed to be consistent, but it might not contain the latest security data.
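When collecting the server-generated backup, you usually want the most recent ZIP file. The sketch below is one way to locate it in Python. It assumes only what is stated above, that the archives are ZIP files somewhere below the ldap\backup directory, and it selects by file modification time because the archive naming scheme is not described here. The function name `newest_ldap_backup` is hypothetical.

```python
from pathlib import Path


def newest_ldap_backup(ldap_dir):
    """Return the most recently modified .zip below ldap_dir/backup, or None.

    Hypothetical helper: selection is by modification time, since the
    archive file names are not assumed to follow any particular pattern.
    """
    backup_dir = Path(ldap_dir) / "backup"
    if not backup_dir.is_dir():
        return None

    # rglob searches subdirectories too, matching "below the backup directory".
    archives = [p for p in backup_dir.rglob("*.zip") if p.is_file()]
    if not archives:
        return None

    return max(archives, key=lambda p: p.stat().st_mtime)
```

The returned path can then be copied to your backup location together with the rest of the ldap directory tree.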

For information about configuring the LDAP backup, see Configuring Backups for the Embedded LDAP Server in the WebLogic Server 9.2 Documentation.

Backing Up SerializedSystemIni.dat and Security Certificates

All servers create a file named SerializedSystemIni.dat and place it in the server's root directory. This file contains encrypted security data that must be present to boot the server. You must back up this file.

If you configured a server to use SSL, also back up the security certificates and keys. The location of these files is user-configurable.

Backing Up Additional Operating System Configuration Files

Certain files maintained at the operating system level are also critical in helping you recover from system failures. Consider backing up the following information as necessary for your system:

 


Restarting a Failed Administration Server

If no Managed Servers in the domain are running when you restart a failed Administration Server, no special steps are required. Start the Administration Server as you normally would.

If the Administration Server shuts down while Managed Servers continue to run, you do not need to restart the Managed Servers that are already running in order to recover management of the domain. The procedure for recovering management of an active domain depends upon whether you can restart the Administration Server on the same machine it was running on when the domain was started.

Restarting an Administration Server on the Same Machine

If you restart the WebLogic Administration Server while Managed Servers continue to run, by default the Administration Server can discover the presence of the running Managed Servers.

Note: Make sure that the startup command or startup script does not include -Dweblogic.management.discover=false, which prevents an Administration Server from discovering its running Managed Servers.
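The check in the note above is easy to automate for start scripts. The following Python sketch scans script text for the exact option string named in the note and reports the lines that contain it. The function name `lines_disabling_discovery` is an assumption, and the match is a plain text search: it does not interpret the script, so a commented-out occurrence is reported as well.

```python
import re

# Matches the option exactly as written in the note; other spellings
# or capitalizations are not covered.
DISCOVER_OFF = re.compile(r"-Dweblogic\.management\.discover=false\b")


def lines_disabling_discovery(script_text):
    """Return (line_number, line) pairs that contain the discover=false option."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if DISCOVER_OFF.search(line):
            hits.append((number, line.strip()))
    return hits
```

An empty result means the option was not found in the text that was scanned. Options passed directly on the command line still need to be checked by hand.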

The root directory for the domain contains a file, running-managed-servers.xml, which contains a list of the Managed Servers in the domain and describes whether they are running or not. When the Administration Server restarts, it checks this file to determine which Managed Servers were under its control before it stopped running.

When a Managed Server is gracefully or forcefully shut down, its status in running-managed-servers.xml is updated to "not-running". When an Administration Server restarts, it does not try to discover Managed Servers with the "not-running" status. A Managed Server that stops running because of a system crash, or that was stopped by killing the JVM or the command prompt (shell) in which it was running, will still have the status "running" in running-managed-servers.xml. The Administration Server will attempt to discover such a server, and will throw an exception when it determines that the Managed Server is no longer running.

Restarting the Administration Server does not cause Managed Servers to update the configuration of static attributes. Static attributes are those that a server refers to only during its startup process. Server instances must be restarted to take account of changes to static configuration attributes. Discovery of the Managed Servers only enables the Administration Server to monitor the Managed Servers or make runtime changes in attributes that can be configured while a server is running (dynamic attributes).

Restarting an Administration Server on Another Machine

If a machine crash prevents you from restarting the Administration Server on the same machine, you can recover management of the running Managed Servers as follows:

  1. Install the WebLogic SIP Server software on the new administration machine (if this has not already been done).
  2. Make your application files available to the new Administration Server by copying them from backups or by using a shared disk. Your application files should be available in the same relative location on the new file system as on the file system of the original Administration Server.
  3. Make your configuration and security data available to the new administration machine by copying them from backups or by using a shared disk. For more information, refer to Backing Up Security Data.
  4. Restart the Administration Server on the new machine.
  5. Make sure that the startup command or startup script does not include -Dweblogic.management.discover=false, which prevents an Administration Server from discovering its running Managed Servers.

When the Administration Server starts, it communicates with the Managed Servers and informs them that the Administration Server is now running on a different IP address.

 


Restarting Failed Managed Servers

If the Administration Server is reachable by the Managed Server that failed, you can:

If a Managed Server cannot connect to the Administration Server during startup, it can retrieve its configuration by reading locally-cached configuration data. A Managed Server that starts in this way is running in Managed Server Independence (MSI) mode. For a description of MSI mode, and the files that a Managed Server must access to start up in MSI mode, see Replicating domain config files for Managed Server Independence in the WebLogic Server 9.2 documentation.

To start up a Managed Server in MSI mode:

  1. Ensure that the following files are available in the Managed Server's root directory:
    • msi-config.xml
    • SerializedSystemIni.dat
    • boot.properties

     If these files are not in the Managed Server's root directory:

    1. Copy the config.xml and SerializedSystemIni.dat files from the Administration Server's root directory (or from a backup) to the Managed Server's root directory.
    2. Rename the copied configuration file to msi-config.xml. When you start the server, it will use the copied configuration files.

     Note: Alternatively, use the -Dweblogic.RootDirectory=path startup option to specify a root directory that already contains these files.
  2. Start the Managed Server at the command line or using a script.

The Managed Server will run in MSI mode until it is contacted by its Administration Server. For information about restarting the Administration Server in this scenario, see Restarting a Failed Administration Server.
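The file preparation in the procedure above can be sketched in Python as follows. This is an illustration under stated assumptions, not a supported utility: the function names `missing_msi_files` and `stage_msi_files` are hypothetical, and the script copies only the two files the procedure names (config.xml, stored as msi-config.xml, and SerializedSystemIni.dat). It does not create boot.properties, so that file is reported as missing if it is not already present.

```python
import shutil
from pathlib import Path

# Files the procedure expects in the Managed Server's root directory.
REQUIRED = ("msi-config.xml", "SerializedSystemIni.dat", "boot.properties")


def missing_msi_files(managed_root):
    """Return the required file names not present in managed_root."""
    root = Path(managed_root)
    return [name for name in REQUIRED if not (root / name).is_file()]


def stage_msi_files(admin_root, managed_root):
    """Copy config.xml (as msi-config.xml) and SerializedSystemIni.dat.

    admin_root may be the Administration Server's root directory or a
    backup of it. Existing files in managed_root are left untouched.
    Returns whatever is still missing afterwards.
    """
    admin_root = Path(admin_root)
    managed_root = Path(managed_root)
    copies = {
        "config.xml": "msi-config.xml",
        "SerializedSystemIni.dat": "SerializedSystemIni.dat",
    }
    for source_name, target_name in copies.items():
        target = managed_root / target_name
        if not target.exists():
            shutil.copy2(admin_root / source_name, target)
    return missing_msi_files(managed_root)
```

If `stage_msi_files` returns an empty list, all three files are in place and the Managed Server can be started from the command line or a script as described in step 2.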

