A variety of events can lead to the failure of a server instance. Often one failure condition leads to another. Loss of power, hardware malfunction, operating system crashes, network partitions, or unexpected application behavior may each contribute to the failure of a server instance.
WebLogic Network Gatekeeper uses a highly clustered architecture as the basis for minimizing the impact of failure events. However, even in a clustered environment it is important to prepare for a sound recovery process in the event that an individual server or server machine fails.
This chapter summarizes WebLogic Network Gatekeeper failure prevention and recovery features, and describes the configuration artifacts that are required in order to restore different portions of a WebLogic Network Gatekeeper domain. The remaining sections in this guide describe how to back up WebLogic Network Gatekeeper configuration artifacts, and how to use those artifacts to restore the system in the event of a server failure.
WebLogic Network Gatekeeper, and the underlying WebLogic Server platform, provide many features that protect against server failures. In a production system, all available features should be used in order to ensure uninterrupted service.
Network Gatekeeper’s underlying WebLogic Server platform detects increases in system load that can affect deployed application performance and stability. WebLogic Server also allows administrators to configure failure prevention actions that occur automatically at predefined load thresholds. Automatic overload protection helps you avoid failures that result from unanticipated levels of application traffic or resource utilization as indicated by:
See Avoiding and Managing Overload in the WebLogic Server 10 documentation for more information.
Using multiple Access tier and Network tier servers in dedicated clusters increases the reliability and availability of your applications. Access tier clusters maintain no stateful information about applications, so the failure of a server does not result in any data loss. Network Gatekeeper also performs automated failover for servers within the Network tier. Any production installation must use the tiered configuration to protect against individual server failures. See Redundancy, Load Balancing, and High Availability in the Architecture Overview for more information.
WebLogic Server self-health monitoring features improve the reliability and availability of server instances in a domain. Selected subsystems within each server instance monitor their health status based on criteria specific to the subsystem. (For example, the JMS subsystem monitors the condition of the JMS thread pool while the core server subsystem monitors default and user-defined execute queue statistics.) If an individual subsystem determines that it can no longer operate in a consistent and reliable manner, it registers its health state as FAILED with the host server.
Each WebLogic Server instance, in turn, checks the health state of its registered subsystems to determine its overall viability. If one or more of its critical subsystems have reached the FAILED state, the server instance marks its own health state FAILED to indicate that it cannot reliably host an application.
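The roll-up rule described above can be illustrated with a minimal sketch. It is a simplified model, not WebLogic Server code: the two-value `HealthState` enum, the `server_health` function, and the subsystem names are hypothetical, and the real platform tracks more health states than the two shown here.

```python
from enum import Enum


class HealthState(Enum):
    # Simplified, hypothetical model with only two states.
    OK = "OK"
    FAILED = "FAILED"


def server_health(subsystem_states, critical_subsystems):
    """Return FAILED if any critical subsystem reports FAILED, else OK.

    subsystem_states: dict mapping subsystem name -> HealthState
    critical_subsystems: names whose failure makes the whole server unviable
    """
    for name in critical_subsystems:
        if subsystem_states.get(name) == HealthState.FAILED:
            return HealthState.FAILED
    return HealthState.OK


# Example: the (hypothetical) "core" subsystem has registered itself as FAILED.
states = {"JMS": HealthState.OK, "core": HealthState.FAILED}
print(server_health(states, critical_subsystems={"JMS", "core"}).value)
```

In this model a failed subsystem that is not in the critical set does not change the server's overall state, which mirrors the distinction the text draws between registered subsystems and critical ones.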
When used in combination with Node Manager, server self-health monitoring enables you to automatically reboot servers that have failed. This improves the overall reliability of a domain, and requires no direct intervention from an administrator. For more information, see Using Node Manager to Control Servers in the WebLogic Server 10 documentation.
Managed Servers maintain a local copy of the domain configuration. When a Managed Server starts, it tries to contact the Administration Server to retrieve any changes to the domain configuration that were made since the Managed Server was last shut down. If a Managed Server cannot connect to the Administration Server during startup, it can use its locally cached configuration information, which is the configuration that was current at the time of the Managed Server’s most recent shutdown. A Managed Server that starts up without contacting its Administration Server to check for configuration updates is running in Managed Server Independence (MSI) mode. By default, MSI mode is enabled. See Replicating domain config files for Managed Server Independence in the WebLogic Server 10 documentation.
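The startup decision described above can be sketched as a simple fallback. This is an illustrative model only: `load_startup_config`, its parameters, and the `unreachable_admin` helper are hypothetical names, and the behavior when MSI mode is disabled (startup fails) is an assumption made for the sketch.

```python
def load_startup_config(fetch_from_admin, read_local_cache, msi_enabled=True):
    """Return (config, mode) for a starting Managed Server.

    fetch_from_admin: callable returning the current domain configuration,
        or raising ConnectionError if the Administration Server is unreachable.
    read_local_cache: callable returning the locally cached configuration.
    msi_enabled: whether falling back to the local copy is allowed.
    """
    try:
        # Normal startup: pick up any changes made since the last shutdown.
        return fetch_from_admin(), "normal"
    except ConnectionError:
        if not msi_enabled:
            # Assumption for this sketch: without MSI, startup does not proceed.
            raise
        # MSI mode: use the configuration cached at the most recent shutdown.
        return read_local_cache(), "MSI"


def unreachable_admin():
    """Simulate an Administration Server that cannot be contacted."""
    raise ConnectionError("Administration Server not reachable")


config, mode = load_startup_config(unreachable_admin, lambda: {"version": 7})
print(mode, config)
```

A server that starts this way runs with the cached configuration, so any changes made on the Administration Server since the last shutdown are not reflected in it.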
When using Linux or UNIX operating systems, you can use WebLogic Server’s server migration feature to automatically start a candidate (backup) server if a Network tier server’s machine fails or becomes partitioned from the network. The server migration feature uses Node Manager, in conjunction with the wlsifconfig.sh script, to automatically boot candidate servers using a floating IP address. Candidate servers are booted only if the primary server hosting a Network tier instance becomes unreachable. See Migration in the WebLogic Server 10 documentation for more information about using the server migration feature.
In addition to server-level redundancy and failover capabilities, WebLogic Network Gatekeeper enables you to configure peer sites to protect against catastrophic failures, such as power outages, that can affect an entire domain. This enables you to fail over from one geographical site to another, avoiding complete service outages. See Geographic Redundancy in the Architecture Overview for more information.
A WebLogic Network Gatekeeper deployment uses two basic categories of configuration information: domain-level configuration and database configuration. The domain-level configuration consists of the artifacts used by the underlying WebLogic Server platform to configure the behavior of managed servers, clusters, security, and other resources deployed to clusters and servers within the domain. The primary domain-level configuration artifact is the config.xml file, stored in the domain-home/config directory. The config.xml file generally references additional configuration files beneath the config directory to configure additional domain resources such as JDBC and JMS.
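Because config.xml and the files it references all live beneath the config directory, a backup needs to capture that directory as a whole. The following is a minimal sketch of one way to do so, not the procedure documented for the product: the function name `backup_domain_config`, the archive naming scheme, and the backup location are all assumptions chosen for illustration.

```python
import tarfile
import time
from pathlib import Path


def backup_domain_config(domain_home, backup_dir):
    """Archive <domain_home>/config into a timestamped .tar.gz and return its path."""
    config_dir = Path(domain_home) / "config"
    # Refuse to archive a directory that does not look like a domain config.
    if not (config_dir / "config.xml").is_file():
        raise FileNotFoundError(f"no config.xml under {config_dir}")

    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical naming scheme: one archive per run, named by local time.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = backup_dir / f"domain-config-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        # Store entries under "config/" so the archive can be unpacked
        # relative to a domain home directory.
        tar.add(config_dir, arcname="config")
    return archive
```

A call might look like `backup_domain_config("/path/to/domain-home", "/path/to/backups")`, where both paths are placeholders. Archiving the whole directory, rather than config.xml alone, keeps the referenced JDBC and JMS files consistent with the config.xml that points to them.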
In addition to the basic domain-level configuration of the WebLogic Server platform, WebLogic Network Gatekeeper stores some configuration for its core services in the form of database tables. This includes the routing configuration for backward-compatible communication services and PRM integration data. The database tables are shared across clustered WebLogic Network Gatekeeper server instances. The database must be backed up at regular intervals to protect against data loss or corruption. An Oracle RAC deployment is also required for production installations, to provide redundancy and failover for the database configuration.
Both domain-level configuration backups and database backups may be required at different times in order to fully restore servers, or migrate server instances to new server hardware, in a WebLogic Network Gatekeeper installation.
Maintaining system integrity requires that you make use of existing high availability and failover features, perform regular backups of configuration artifacts, and understand how to restore server instances or migrate servers onto viable hardware. These common tasks are summarized in Table 1-1.