System Backup and Restoration Guide



A variety of events can lead to the failure of a server instance. Often one failure condition leads to another. Loss of power, hardware malfunction, operating system crashes, network partitions, or unexpected application behavior may each contribute to the failure of a server instance.

Oracle Communications Services Gatekeeper uses a highly clustered architecture as the basis for minimizing the impact of failure events. However, even in a clustered environment it is important to prepare for a sound recovery process in the event that an individual server or server machine fails.

This chapter summarizes Oracle Communications Services Gatekeeper failure prevention and recovery features, and describes the configuration artifacts required to restore different portions of an Oracle Communications Services Gatekeeper domain. The remaining sections in this guide describe how to back up Oracle Communications Services Gatekeeper configuration artifacts, and how to use those artifacts to restore the system in the event of a server failure.


Failure Prevention and Automatic Recovery Features

Oracle Communications Services Gatekeeper, and the underlying WebLogic Server platform, provide many features that protect against server failures. In a production system, all available features should be used in order to ensure uninterrupted service.

Overload Alarms

Oracle Communications Services Gatekeeper’s underlying WebLogic Server platform detects increases in system load that can affect deployed application performance and stability. WebLogic Server also allows administrators to configure failure prevention actions that occur automatically at predefined load thresholds. Automatic overload protection helps you avoid failures that result from unanticipated levels of application traffic or resource utilization as indicated by:

See Oracle WebLogic Server Configuring Server Environments for more information on avoiding and managing overload.

Note: Each backward-compatible communication service in Oracle Communications Services Gatekeeper uses a pair of attributes, OverloadPercentage and SevereOverloadPercentage, that define the amount of load on the software module required to trigger an overload alarm. Always monitor for these alarms, and throttle the system as necessary to avoid failures that could result from unanticipated levels of application traffic or resource utilization. See the System Administrator’s Guide for more information on these communication services.
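To illustrate how the two thresholds relate, the following Python sketch classifies a load reading as normal, overloaded, or severely overloaded. It is a minimal sketch under stated assumptions: the function and enum names are hypothetical, load is assumed to be expressed as a percentage of capacity, and an alarm is assumed to fire when the load reaches (rather than strictly exceeds) a threshold. Consult the System Administrator’s Guide for the actual semantics of OverloadPercentage and SevereOverloadPercentage.

```python
from enum import Enum


class LoadState(Enum):
    NORMAL = "normal"
    OVERLOAD = "overload"
    SEVERE_OVERLOAD = "severe_overload"


def classify_load(load_percentage: float,
                  overload_percentage: float,
                  severe_overload_percentage: float) -> LoadState:
    """Hypothetical mapping of a load reading onto overload alarm levels.

    Assumes severe_overload_percentage >= overload_percentage and that an
    alarm is raised once the load reaches a threshold (>=, not >).
    """
    if severe_overload_percentage < overload_percentage:
        raise ValueError("severe threshold must not be below overload threshold")
    if load_percentage >= severe_overload_percentage:
        return LoadState.SEVERE_OVERLOAD
    if load_percentage >= overload_percentage:
        return LoadState.OVERLOAD
    return LoadState.NORMAL
```

For example, with OverloadPercentage at 80 and SevereOverloadPercentage at 95, a reading of 85 falls into the OVERLOAD band, which is the point at which throttling should begin.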

Redundancy and Failover for Clustered Services

Using multiple Access tier and Network tier servers in dedicated clusters increases the reliability and availability of your applications. Access tier clusters maintain no stateful information about applications, so the failure of a server does not result in any data loss. Oracle Communications Services Gatekeeper also performs automated failover for servers within the Network tier. Any production installation must use the tiered configuration to protect against individual server failures. See Concepts and Architectural Overview for more information.
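Because Access tier servers hold no application state, a request that fails on one Access tier server can simply be resent to another. The Python sketch below models that idea; the server names, the send callable, and the use of ConnectionError as the failure signal are assumptions for illustration and are not part of the Oracle Communications Services Gatekeeper API.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def send_with_failover(servers: Sequence[str], send: Callable[[str], T]) -> T:
    """Send a request to the first Access tier server that accepts it.

    Retrying on another server is safe here only because the Access tier
    keeps no per-application state on the server that failed.
    """
    last_error: Optional[Exception] = None
    for server in servers:
        try:
            return send(server)
        except ConnectionError as exc:
            # Treat a connection failure as a failed server and try the next one.
            last_error = exc
    raise RuntimeError("no Access tier server accepted the request") from last_error
```

A stateful tier could not use this simple retry, because the next server would lack the failed server's data; that is why the Network tier relies on the platform's own failover mechanisms instead.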

Automatic Restart for Managed Servers

WebLogic Server self-health monitoring features improve the reliability and availability of server instances in a domain. Selected subsystems within each server instance monitor their health status based on criteria specific to the subsystem. (For example, the JMS subsystem monitors the condition of the JMS thread pool while the core server subsystem monitors default and user-defined execute queue statistics.) If an individual subsystem determines that it can no longer operate in a consistent and reliable manner, it registers its health state as FAILED with the host server.

Each WebLogic Server instance, in turn, checks the health state of its registered subsystems to determine its overall viability. If one or more of its critical subsystems have reached the FAILED state, the server instance marks its own health state FAILED to indicate that it cannot reliably host an application.
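The aggregation rule described above, in which a server marks itself FAILED when any critical subsystem has failed, can be modeled in a few lines of Python. This is a sketch only: the subsystem names, the three-state enum, and the choice to report WARN when a non-critical subsystem fails are assumptions made for illustration, not WebLogic Server behavior confirmed by this guide.

```python
from enum import Enum
from typing import Mapping, Set


class Health(Enum):
    OK = "OK"
    WARN = "WARN"
    FAILED = "FAILED"


def server_health(subsystems: Mapping[str, Health], critical: Set[str]) -> Health:
    """Derive a server's overall health from its registered subsystems.

    The server is FAILED if any critical subsystem is FAILED. As an
    assumption of this sketch, any other non-OK subsystem state
    (including a FAILED non-critical subsystem) only degrades the result to WARN.
    """
    if any(subsystems.get(name) is Health.FAILED for name in critical):
        return Health.FAILED
    if any(state is not Health.OK for state in subsystems.values()):
        return Health.WARN
    return Health.OK
```

A server that reports FAILED in this model is the kind of server that Node Manager can restart automatically.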

When used in combination with Node Manager, server self-health monitoring enables you to automatically restart servers that have failed. This improves the overall reliability of a domain and requires no direct intervention from an administrator. For more information on how to use Node Manager to control servers, see Oracle WebLogic Server Node Manager Administrator’s Guide.
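Automatic restart is normally bounded so that a server that keeps failing is not restarted forever. The Python sketch below is loosely modeled on the AutoRestart, RestartMax, and RestartIntervalSeconds server attributes that Node Manager uses; the sliding-window counting and the default values shown are assumptions of this sketch, so check the Node Manager Administrator’s Guide for the exact semantics.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque


@dataclass
class RestartPolicy:
    """Hypothetical model of a Node Manager-style automatic restart policy.

    Allows at most restart_max restarts within any window of
    restart_interval_seconds (a sliding window is assumed here); failures
    beyond that are left for an administrator to investigate.
    """
    auto_restart: bool = True
    restart_max: int = 2
    restart_interval_seconds: float = 3600.0
    _restarts: Deque[float] = field(default_factory=deque, init=False, repr=False)

    def should_restart(self, now: float) -> bool:
        """Return True, and record the restart, if a FAILED server may be restarted at `now`."""
        if not self.auto_restart:
            return False
        # Forget restarts that fall outside the current interval window.
        while self._restarts and now - self._restarts[0] >= self.restart_interval_seconds:
            self._restarts.popleft()
        if len(self._restarts) >= self.restart_max:
            return False
        self._restarts.append(now)
        return True
```

Capping restarts this way keeps a server with a persistent fault (for example, a corrupt configuration) from cycling indefinitely and masking the underlying problem.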

Managed Server Independence Mode

Managed Servers maintain a local copy of the domain configuration. When a Managed Server starts, it tries to contact the Administration Server to retrieve any changes to the domain configuration that were made since the Managed Server was last shut down. If a Managed Server cannot connect to the Administration Server during startup, it can use its locally cached configuration information, which is the configuration that was current at the time of the Managed Server’s most recent shutdown. A Managed Server that starts up without contacting its Administration Server to check for configuration updates is running in Managed Server Independence (MSI) mode. By default, MSI mode is enabled. See the Oracle WebLogic Server Administration Console Online Help for information on replicating domain configuration files for Managed Server independence.
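The startup decision a Managed Server makes can be summarized as "use the Administration Server if reachable, otherwise fall back to the local cache when MSI mode is enabled." The Python sketch below expresses that rule; the function name, the fetch_from_admin callable, and the use of ConnectionError to signal an unreachable Administration Server are hypothetical stand-ins, not WebLogic Server APIs.

```python
from typing import Callable, Dict, Optional, Tuple

Config = Dict[str, object]


def load_startup_config(fetch_from_admin: Callable[[], Config],
                        local_cache: Optional[Config],
                        msi_enabled: bool = True) -> Tuple[Config, bool]:
    """Return (configuration, started_in_msi_mode) for a starting Managed Server.

    fetch_from_admin stands in for contacting the Administration Server and is
    assumed to raise ConnectionError when the Administration Server is unreachable.
    """
    try:
        return fetch_from_admin(), False
    except ConnectionError:
        if msi_enabled and local_cache is not None:
            # Use the copy cached at the last shutdown: this is MSI mode.
            return dict(local_cache), True
        raise  # no usable fallback, so the server cannot start
```

Note that a server started in MSI mode runs with the configuration from its last shutdown, so changes made on the Administration Server in the meantime are not yet in effect.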

Automatic Migration of Failed Managed Servers

When using Linux or UNIX operating systems, you can use WebLogic Server’s server migration feature to automatically start a candidate (backup) server if a Network tier server’s machine fails or becomes partitioned from the network. The server migration feature uses Node Manager, in conjunction with the script, to automatically boot candidate servers using a floating IP address. Candidate servers are booted only if the primary server hosting a Network tier instance becomes unreachable. See Oracle WebLogic Server Using Clusters for more information about using the server migration feature.
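The decision rule described above (boot a candidate only when the primary is unreachable, and move the floating IP address with it) might be sketched in Python as follows. The machine names, the is_reachable callable, the documentation-range address 192.0.2.10, and the "first reachable candidate wins" ordering are all assumptions for illustration; WebLogic Server’s actual migration logic is described in Using Clusters.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class MigrationDecision:
    target_machine: str
    floating_ip: str


def plan_migration(primary: str,
                   candidates: Sequence[str],
                   floating_ip: str,
                   is_reachable: Callable[[str], bool]) -> Optional[MigrationDecision]:
    """Return where to boot the Network tier server, or None if no action is needed."""
    if is_reachable(primary):
        return None  # primary is healthy, so no candidate server is booted
    for machine in candidates:
        if machine != primary and is_reachable(machine):
            # The floating IP moves with the server so clients keep the same address.
            return MigrationDecision(machine, floating_ip)
    raise RuntimeError("no reachable candidate machine for migration")
```

Usage: `plan_migration("nt-host-1", ["nt-host-2", "nt-host-3"], "192.0.2.10", is_reachable)` returns a decision targeting nt-host-2 when nt-host-1 is down and nt-host-2 is up.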

Geographic Redundancy for Regional Site Failures

In addition to server-level redundancy and failover capabilities, Oracle Communications Services Gatekeeper enables you to configure peer sites to protect against catastrophic failures, such as power outages, that can affect an entire domain. This enables you to fail over from one geographical site to another, avoiding complete service outages. See Geographic Redundancy in the Architecture Overview for more information.


Overview of Configuration Artifacts

An Oracle Communications Services Gatekeeper deployment uses two basic categories of configuration information: domain-level configuration and database configuration. The domain-level configuration consists of the artifacts used by the underlying WebLogic Server platform to configure the behavior of managed servers, clusters, security, and other resources deployed to clusters and servers within the domain. The primary domain-level configuration artifact is the config.xml file, stored in the domain-home/config directory. The config.xml file generally references additional configuration files beneath the config directory to configure additional domain resources such as JDBC and JMS.
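Because config.xml and the files it references all live beneath domain-home/config, a domain-level backup can start by archiving that directory as a unit. The Python sketch below does this with the standard tarfile module; the function name, the archive naming scheme, and the backup directory are hypothetical. It archives only the config directory described above, and other files under the domain home may also be needed for a full restore.

```python
import tarfile
import time
from pathlib import Path


def backup_domain_config(domain_home: str, backup_dir: str) -> Path:
    """Archive domain_home/config (config.xml plus the files beneath it).

    Returns the path of the new timestamped .tar.gz archive.
    """
    config_dir = Path(domain_home) / "config"
    if not (config_dir / "config.xml").is_file():
        raise FileNotFoundError(f"no config.xml under {config_dir}")
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = target / f"domain-config-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store members relative to the domain home, e.g. "config/config.xml",
        # so subdirectories such as config/jdbc are captured as well.
        tar.add(config_dir, arcname="config")
    return archive
```

Keeping the config directory together in one archive preserves the relative paths that config.xml uses to reference its JDBC and JMS descriptor files.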

In addition to the basic domain-level configuration of the WebLogic Server platform, Oracle Communications Services Gatekeeper stores some configuration for its core services in the form of database tables. This includes the routing configuration for backward-compatible communication services and PRM integration data. The database tables are shared across all clustered Oracle Communications Services Gatekeeper server instances. The database must be backed up at regular intervals to protect against data loss or corruption. An Oracle RAC deployment is also required for production installations, to provide redundancy and failover for the database configuration.
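Since routing and PRM integration data exist only in the database, a missed database backup widens the window of configuration that could be lost. A monitoring job might check backup freshness along the lines of the Python sketch below. The function name and the rule of comparing the newest backup time against a maximum interval are assumptions of this sketch; the backups themselves would be taken with Oracle database tooling such as Oracle Recovery Manager (RMAN).

```python
from datetime import datetime, timedelta
from typing import Iterable


def backup_overdue(backup_times: Iterable[datetime],
                   max_interval: timedelta,
                   now: datetime) -> bool:
    """Return True if the newest database backup is older than max_interval.

    Having no recorded backup at all is treated as overdue.
    """
    latest = max(backup_times, default=None)
    if latest is None:
        return True
    return now - latest > max_interval
```

Treating "no backup recorded" as overdue makes the check fail safe: a newly installed system raises an alert until its first database backup completes.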

Both domain-level configuration backups and database backups may be required at different times in order to fully restore servers, or migrate server instances to new server hardware, in an Oracle Communications Services Gatekeeper installation.
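When migrating a server instance to new hardware, the domain-level part of the restore amounts to unpacking a saved configuration archive into the new domain home. The Python sketch below assumes a gzip-compressed tar archive whose members sit under a top-level config/ directory, and assumes the servers in the target domain are shut down while files are replaced; the function name is hypothetical. It rejects archive members whose paths would escape the domain home. Restoring the database tables is a separate step performed with database tooling.

```python
import tarfile
from pathlib import Path


def restore_domain_config(archive: str, domain_home: str) -> Path:
    """Extract a configuration archive into domain_home and return its config dir.

    Assumes the servers in the target domain are stopped during the restore.
    """
    target = Path(domain_home).resolve()
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            dest = (target / member.name).resolve()
            # Refuse members such as "../x" that would land outside the domain home.
            if dest != target and target not in dest.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(target)
    return target / "config"
```

Checking every member before extracting anything means a suspicious archive is rejected without partially overwriting the target domain.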


Common Backup and Restoration Tasks

Maintaining system integrity requires that you make use of existing high availability and failover features, perform regular backups of configuration artifacts, and understand how to restore server instances or migrate servers onto viable hardware. These common tasks are summarized in Table 1-1.

Table 1-1 Common Backup and Restoration Tasks
Enable WebLogic Server platform reliability and recovery features.
Enable Oracle Communications Services Gatekeeper reliability and recovery features.
Back up WebLogic Server domain configuration.
Back up Oracle Communications Services Gatekeeper database configuration.
Restore a failed Access tier or Network tier server instance.
