Configuring and Managing WebLogic Server
Recovering Failed Servers
A variety of events can lead to the failure of a server instance. Often one failure condition leads to another. Loss of power, hardware malfunction, operating system crashes, network partitions, and unexpected application behavior can all contribute to the failure of a server instance.
Depending on availability requirements, you may implement a clustered architecture to minimize the impact of failure events. (For information about failover in a WebLogic Server cluster, see "Failover and Replication in a Cluster" in Using WebLogic Server Clusters.) However, even in a clustered environment, server instances may fail periodically, and it is important to be prepared for the recovery process.
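The following sections describe WebLogic Server features for recovering failed server instances (WebLogic Server Failure Recovery Features), guidelines for backing up the data required for restart (Backing Up Configuration and Security Data), and instructions for restarting failed server instances (Restarting Failed Server Instances and Additional Failure Topics).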
WebLogic Server Failure Recovery Features
This section describes WebLogic features that support recovery from failure.
Automatic Restart for Managed Servers
WebLogic Server provides self-health monitoring to improve the reliability and availability of server instances in a domain. Selected subsystems within each WebLogic Server instance monitor their health status based on criteria specific to the subsystem. (For example, the JMS subsystem monitors the condition of the JMS thread pool while the core server subsystem monitors default and user-defined execute queue statistics.) If an individual subsystem determines that it can no longer operate in a consistent and reliable manner, it registers its health state as "failed" with the host server.
Each WebLogic Server instance, in turn, checks the health state of its registered subsystems to determine its overall viability. If one or more of its critical subsystems have reached the FAILED state, the server instance marks its own health state FAILED to indicate that it cannot reliably host an application.
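The aggregation rule above can be modeled in a few lines. The Python sketch below is a simplified, hypothetical model and not WebLogic Server code. It reduces health to two states, OK and FAILED, and the names Health, Subsystem, and server_health are invented for illustration. A server is reported FAILED as soon as any critical subsystem has registered FAILED. A failed non-critical subsystem does not change the server's state.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    # Simplified: only the two states named in this section are modeled.
    OK = "OK"
    FAILED = "FAILED"


@dataclass
class Subsystem:
    name: str
    critical: bool
    health: Health = Health.OK


def server_health(subsystems):
    """A server is FAILED as soon as any critical subsystem reports FAILED."""
    if any(s.critical and s.health is Health.FAILED for s in subsystems):
        return Health.FAILED
    return Health.OK
```

In this model, a FAILED result from server_health is the condition under which Node Manager would restart the server.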
When used in combination with Node Manager, server self-health monitoring enables you to automatically restart server instances that have failed. This improves the overall reliability of a domain and requires no direct intervention from an administrator.
For information on this feature, see Node Manager Capabilities. For instructions to configure Node Manager and automatic restart behaviors, see Configuring Node Manager.
Managed Server Independence Mode
When a Managed Server starts, it tries to contact the Administration Server to retrieve its configuration information. If a Managed Server cannot connect to the Administration Server during startup, it can retrieve its configuration by reading configuration and security files directly. A Managed Server that starts in this way is running in Managed Server Independence (MSI) mode. By default, MSI mode is enabled. For information about disabling MSI mode, see "Disabling Managed Server Independence" in Administration Console Online Help.
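The startup decision described above reduces to a small rule. The Python sketch below is a hypothetical model of that rule, not the server's actual startup code. The function name config_source and its return strings are invented. The msi_enabled default of True mirrors the statement that MSI mode is enabled by default.

```python
def config_source(admin_reachable, msi_enabled=True, local_files_present=True):
    """Decide where a starting Managed Server gets its configuration (toy model)."""
    if admin_reachable:
        # Normal case: configuration comes from the Administration Server.
        return "administration-server"
    if msi_enabled and local_files_present:
        # MSI mode: read the configuration and security files directly.
        return "msi"
    # No Administration Server, and MSI is disabled or its files are missing.
    return "cannot-start"
```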
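In Managed Server Independence mode, a Managed Server looks in its root directory for the following files: msi-config.xml, a replica of the domain's configuration file; SerializedSystemIni.dat, which contains encrypted security data required to boot the server; and boot.properties, a boot identity file that supplies the username and password used to start the server.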
MSI Mode and the Managed Server's Root Directory
By default, a server instance assumes that its root directory is the directory from which it was started. For more information about a server's root directory, refer to A Server's Root Directory.
If you enable replication of configuration data, as described in MSI Mode and Configuration File Replication, and if you have started the Managed Server at least once while the Administration Server was running, msi-config.xml and SerializedSystemIni.dat will already be in the server's root directory. The boot.properties file is not replicated. If it is not already in the Managed Server's root directory, you must create one. For more information, see "Bypassing the Prompt for Username and Password" in Administration Console Online Help.
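Before you start a Managed Server in MSI mode, you might check its root directory for these files. The Python sketch below uses the file names given in this section. The function name check_msi_root is hypothetical and not part of WebLogic Server. Because boot.properties is never replicated, the sketch reports it separately from the two replicated files.

```python
from pathlib import Path

# File names taken from this section. boot.properties is not replicated,
# so it is checked separately: it has to be created by hand.
REPLICATED_FILES = ("msi-config.xml", "SerializedSystemIni.dat")
BOOT_IDENTITY_FILE = "boot.properties"


def check_msi_root(root_dir):
    """Return (missing replicated files, whether boot.properties is present)."""
    root = Path(root_dir)
    missing = [name for name in REPLICATED_FILES if not (root / name).is_file()]
    has_boot_identity = (root / BOOT_IDENTITY_FILE).is_file()
    return missing, has_boot_identity
```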
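If msi-config.xml and SerializedSystemIni.dat are not in the root directory, you can do either of the following:
- Copy config.xml and SerializedSystemIni.dat from the Administration Server's root directory (or from a backup) into the Managed Server's root directory, then rename the copied config.xml to msi-config.xml.
- Use the -Dweblogic.RootDirectory=path startup option to specify a root directory that already contains these files.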
MSI Mode and the Security Realm
A Managed Server must have access to a security realm to complete its startup process.
If you use the security realm that WebLogic Server installs, then the Administration Server maintains an LDAP server to store the domain's security data. All Managed Servers replicate this LDAP server. If the Administration Server fails, Managed Servers running in MSI mode use the replicated LDAP server for security services.
If you use a third-party security provider, then the Managed Server must be able to access the security data before it can complete its startup process.
MSI Mode and SSL
If you set up SSL for your servers, each server requires its own set of certificate files, key files, and other SSL-related files. Managed Servers do not retrieve SSL-related files from the Administration Server (though the domain's configuration file does store the pathnames to those files for each server). Starting in MSI Mode does not require you to copy or move the SSL-related files unless they are located on a machine that is inaccessible.
MSI Mode and Deployment
A Managed Server that starts in MSI mode deploys its applications from its staging directory: serverroot/stage/appName.
MSI Mode and Managed Server Configuration Changes
If you start a Managed Server in MSI mode, you cannot change its configuration until it restores communication with the Administration Server.
MSI Mode and Node Manager
You cannot use Node Manager to start a server instance in MSI mode, because Node Manager requires the presence of the Administration Server. If the Administration Server is unavailable, you must log on to the Managed Server's host machine to start the Managed Server.
MSI Mode and Configuration File Replication
Managed Server Independence mode includes an option that copies the required configuration files into the Managed Server's root directory every 5 minutes. This option does not replicate a boot identity file. (For more information about boot identity files, see "Bypassing the Prompt for Username and Password" in Administration Console Online Help.)
By default, a Managed Server does not replicate these files. Depending on your backup schemes and the frequency with which you update your domain's configuration, this option might not be worth the performance cost of copying potentially large files across a network.
To enable a Managed Server to replicate the domain's configuration files, see "Replicating a Domain's Configuration Files for Managed Server Independence" in Administration Console Online Help.
MSI Mode and Restored Communication with an Administration Server
When the Administration Server starts, it can detect the presence of running Managed Servers (if -Dweblogic.management.discover=true, which is the default setting for this property).
Upon startup, the Administration Server looks at a persisted copy of the file running-managed-servers.xml and notifies all the Managed Servers listed in the file of its presence.
Managed Servers that were started in Managed Server Independence mode while the Administration Server was unavailable do not appear in running-managed-servers.xml. To re-establish a connection between the Administration Server and such Managed Servers, use the weblogic.Admin DISCOVERMANAGEDSERVER command. For more information, see "DISCOVERMANAGEDSERVER" in WebLogic Server Command Reference.
When an Administration Server starts up and contacts a Managed Server running in MSI mode, the Managed Server deactivates MSI mode and registers itself to the Administration Server for future configuration change notifications.
Backing Up Configuration and Security Data
Recovery from the failure of a server instance requires access to the domain's configuration and security data. This section describes file backups that WebLogic Server performs automatically, and recommended backup procedures that an administrator should perform.
Backing up config.xml
By default, an Administration Server stores a domain's configuration data in a file called domain_name\config.xml, where domain_name is the root directory of the domain.
Back up config.xml to a secure location in case a failure of the Administration Server renders the original copy unavailable. If an Administration Server fails, you can copy the backup version to a different machine and restart an Administration Server on the new machine.
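A scheduled job that copies config.xml to another location is one way to follow this advice. The Python sketch below makes some assumptions. The domain and backup paths are placeholders. The timestamped naming scheme is our own choice and not a WebLogic convention. The function name backup_config is hypothetical.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_config(domain_root, backup_dir, now=None):
    """Copy domain_root/config.xml into backup_dir under a timestamped name.

    Keeping timestamped copies means an older, known-good configuration is
    still available if the most recent backup turns out to be bad.
    """
    source = Path(domain_root) / "config.xml"
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"config.xml.{stamp}"
    shutil.copy2(source, target)  # copy2 also preserves the file's timestamps
    return target


if __name__ == "__main__":
    # Demonstration against throwaway directories, not a real domain.
    with tempfile.TemporaryDirectory() as tmp:
        domain = Path(tmp) / "mydomain"
        domain.mkdir()
        (domain / "config.xml").write_text("<Domain/>")
        saved = backup_config(domain, Path(tmp) / "backups", datetime(2024, 1, 2, 3, 4, 5))
        print(saved.name)  # config.xml.20240102-030405
```

For the backup to survive a machine failure, backup_dir should be on a different machine or on shared storage.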
WebLogic Server Archives Previous Versions of config.xml
By default, the Administration Server archives up to 5 previous versions of config.xml in the domain-name/configArchive directory.
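When you save a change to a domain's configuration, the Administration Server saves the previous configuration in domain-name\configArchive\config.xml#n. Each time the Administration Server saves a file in the configArchive directory, it increments the value of the #n suffix, up to a configurable number of copies (5 by default). Thereafter, each time you change the domain configuration, the Administration Server saves the previous configuration in the archive file with the highest suffix, renames each existing archive file to the next-lower suffix, and deletes the archive file that had the lowest suffix.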
Example of Archived config.xml Naming and Rotation
In the MedRec domain, the current configuration file used by the MedRecServer is WL_HOME\samples\server\config\medrec\config.xml. If you add a server instance using the Administration Console, when you click the Create button, MedRecServer saves the old config.xml file as WL_HOME\samples\server\config\medrec\configArchive\config.xml#2.
The new file, WL_HOME\samples\server\config\medrec\config.xml, represents the MedRec domain with the new server instance. The previous file, WL_HOME\samples\server\config\medrec\configArchive\config.xml#2, contains the MedRec domain configuration as it was prior to creation of the new server instance.
The next time you change the configuration, MedRecServer saves the current config.xml file as config.xml#3. After four changes to the domain, the configArchive directory contains four files: config.xml#2, config.xml#3, config.xml#4, and config.xml#5. The next time you change the configuration, MedRecServer saves the old config.xml as config.xml#5. The previous config.xml#5 is renamed as config.xml#4, and so on. The old config.xml#2 is deleted.
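The rotation in this example can be simulated to see which archive holds which version. The Python sketch below models only the behavior described in the MedRec example. It uses an in-memory dict in place of files. Suffixes run from 2 to 5 as in the example, and the sketch does not model why numbering starts at 2. The function name archive_previous is hypothetical.

```python
def archive_previous(archive, previous, first=2, last=5):
    """Store `previous` in `archive` (a dict of suffix -> contents), rotating when full.

    Mirrors the MedRec example: suffixes start at `first`; once `last` is in use,
    each entry moves down one suffix, the lowest is discarded, and `previous`
    takes the `last` slot.
    """
    if not archive:
        archive[first] = previous
    elif max(archive) < last:
        archive[max(archive) + 1] = previous
    else:
        for n in range(first, last):
            archive[n] = archive[n + 1]  # config.xml#(n+1) becomes config.xml#n
        archive[last] = previous
    return archive
```

After four changes, the archive holds versions v0 through v3 under #2 through #5. A fifth change drops v0 and stores v4 as #5.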
Configuring the Number of Archived config.xml Versions
To configure how many previous versions of the domain configuration are archived:
WebLogic Server Archives config.xml during Server Startup
In addition to the files in domain-name\configArchive, the Administration Server creates two other files that back up the domain's configuration at key points during the startup process:
Example of Archives of config.xml During Startup
If your domain configuration is stored in config.xml, when you start the domain's Administration Server, the Administration Server:
The Administration Server uses the parsed and modified config.xml. When you update the domain's configuration, it copies the old config.xml to domain-name\configArchive\config.xml#2.
Backing Up Security Data
The WebLogic Security service stores its configuration data in the config.xml file, and also in an LDAP repository and other files.
Backing Up the WebLogic LDAP Repository
The default Authentication, Authorization, Role Mapper, and Credential Mapper providers that are installed with WebLogic Server store their data in an LDAP server. Each WebLogic Server instance contains an embedded LDAP server. The Administration Server contains the master LDAP server, which is replicated on all Managed Servers. If any of your security realms use these installed providers, you should maintain an up-to-date backup of the following directory tree:
domain_name\adminServer\ldap
where domain_name is the domain's root directory and adminServer is the directory in which the Administration Server stores runtime and security data.
Each WebLogic Server instance has an LDAP directory, but you only need to back up the LDAP data on the Administration Server. When updates to security data are made, the master LDAP server replicates the LDAP data to each Managed Server. WebLogic security providers cannot modify security data while the domain's Administration Server is unavailable, and the LDAP repositories on Managed Servers are replicas that cannot be modified.
The ldap/ldapfiles subdirectory contains the data files for the LDAP server. The files in this directory contain user, group, group membership, policies, and role information. Other subdirectories under the ldap directory contain LDAP server message logs and data about replicated LDAP servers.
Do not update the configuration of a security provider while a backup of LDAP data is in progress. If a change is made—for instance, if an administrator adds a user—while you are backing up the ldap directory tree, the backups in the ldapfiles subdirectory could become inconsistent. If this does occur, consistent, but potentially out-of-date, LDAP backups are available, as described in WebLogic Server Backs Up LDAP Files.
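A ZIP archive of the directory tree is one straightforward backup format. The Python sketch below assumes the domain_name\adminServer\ldap layout described above. The adminServer default is a placeholder for your Administration Server's directory. The function name backup_ldap_tree is hypothetical. The sketch does not prevent concurrent changes, so run it only when no one is updating security provider configuration.

```python
import tempfile
import zipfile
from pathlib import Path


def backup_ldap_tree(domain_root, zip_path, admin_server_dir="adminServer"):
    """Zip domain_root/<admin_server_dir>/ldap into zip_path; return the file count.

    Write zip_path outside the ldap tree, otherwise the archive would try to
    include itself.
    """
    ldap_root = Path(domain_root) / admin_server_dir / "ldap"
    if not ldap_root.is_dir():
        raise FileNotFoundError(f"no LDAP directory at {ldap_root}")
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(ldap_root.rglob("*")):
            if path.is_file():
                # Store entries as ldap/..., relative to the adminServer directory.
                archive.write(path, arcname=path.relative_to(ldap_root.parent).as_posix())
                count += 1
    return count


if __name__ == "__main__":
    # Demonstration against a throwaway directory tree, not a real domain.
    with tempfile.TemporaryDirectory() as tmp:
        files = Path(tmp) / "mydomain" / "adminServer" / "ldap" / "ldapfiles"
        files.mkdir(parents=True)
        (files / "sample.data").write_bytes(b"placeholder")
        print(backup_ldap_tree(Path(tmp) / "mydomain", Path(tmp) / "ldap-backup.zip"))  # 1
```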
WebLogic Server Backs Up LDAP Files
Once a day, a server suspends write operations and creates its own backup of the LDAP data. It archives this backup in a ZIP file below the ldap\backup directory and then resumes write operations. This backup is guaranteed to be consistent, but it might not contain the latest security data.
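A toy model shows why the daily backup is consistent but can be out of date. The Python sketch below is an illustration only, not WebLogic's embedded LDAP implementation. The class ToyLdapStore is hypothetical. A lock stands in for "suspend write operations". While the snapshot is taken, no write can interleave, so the snapshot reflects a single point in time. Any write made afterward is missing from that backup until the next one.

```python
import copy
import threading


class ToyLdapStore:
    """Toy model of 'suspend writes, snapshot, resume'; not WebLogic's code."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def write(self, key, value):
        with self._lock:
            self._entries[key] = value

    def backup(self):
        # While the lock is held, no write can run, so the snapshot reflects a
        # single point in time. Writes resume once the lock is released.
        with self._lock:
            return copy.deepcopy(self._entries)
```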
For information about configuring the LDAP backup, see "Configuring Backups for the Embedded LDAP Server" in Administration Console Online Help.
Backing Up SerializedSystemIni.dat and Security Certificates
Every server instance creates a file named SerializedSystemIni.dat in its root directory. This file contains encrypted security data that must be present to boot the server. You must back up this file.
If you configured a server to use SSL, you must also back up the security certificates and keys. The location of these files is user-configurable.
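The items this section recommends backing up can be collected into one list for a backup job. The Python sketch below covers config.xml, the master LDAP tree, each server's SerializedSystemIni.dat, and any SSL files you pass in. The function name backup_manifest is hypothetical. The adminServer default is a placeholder, and all paths are examples.

```python
from pathlib import Path


def backup_manifest(domain_root, server_roots, ssl_files=(), admin_server_dir="adminServer"):
    """List the files and directories this section says to back up."""
    domain = Path(domain_root)
    items = [
        domain / "config.xml",               # domain configuration
        domain / admin_server_dir / "ldap",  # master embedded LDAP tree
    ]
    # Every server instance keeps its own SerializedSystemIni.dat in its root directory.
    items += [Path(root) / "SerializedSystemIni.dat" for root in server_roots]
    # SSL certificates and keys live wherever you configured them.
    items += [Path(p) for p in ssl_files]
    return items
```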
Restarting Failed Server Instances
The nature of your applications and user demand determine the steps you take to restore application service. In particular, these factors influence the recovery process:
Restarting an Administration Server
The following sections describe how to start an Administration Server after a failure.
Restarting an Administration Server When Managed Servers are not Running
If no Managed Servers in the domain are running when you restart a failed Administration Server, no special steps are required. Start the Administration Server as you normally do. For details, see "Starting and Stopping Servers" in Administration Console Online Help.
Restarting an Administration Server When Managed Servers are Running
If the Administration Server shuts down while Managed Servers continue to run, you do not need to restart the Managed Servers that are already running in order to recover management of the domain. The procedure for recovering management of an active domain depends upon whether you can restart the Administration Server on the same machine it was running on when the domain was started.
Restarting an Administration Server on the Same Machine
If you restart the WebLogic Administration Server while Managed Servers continue to run, by default the Administration Server can discover the presence of the running Managed Servers.
Note: Make sure that the startup command or startup script does not include -Dweblogic.management.discover=false, which disables an Administration Server from discovering its running Managed Servers. For more information about -Dweblogic.management.discover, see "Server Communication" in weblogic.Server Command-Line Reference.
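You can scan a startup script for this option before you restart the Administration Server. The Python sketch below performs only a plain text search. It does not expand shell variables and does not skip commented-out lines. The function name disables_discovery is hypothetical. The property name is matched exactly, and the value false is matched without regard to case.

```python
import re

# Matches the JVM option that turns discovery off. The property name is matched
# exactly; the value "false" is matched case-insensitively.
_DISCOVER_OFF = re.compile(r"-Dweblogic\.management\.discover=(?i:false)\b")


def disables_discovery(startup_script_text):
    """True if the text passes -Dweblogic.management.discover=false."""
    return _DISCOVER_OFF.search(startup_script_text) is not None
```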
The root directory for the domain contains a file running-managed-servers.xml which is a list of the Managed Servers that the Administration Server knows about. When the Administration Server starts, it uses this list to check for the presence of running Managed Servers.
Restarting the Administration Server does not cause Managed Servers to update the configuration of static attributes. Static attributes are those that a server refers to only during its startup process, so a server must be restarted before changes to them take effect. Discovery of the Managed Servers only enables the Administration Server to monitor the Managed Servers or to make runtime changes to attributes that can be configured while a server is running (dynamic attributes).
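The distinction between static and dynamic attributes can be modeled as an active set of values plus a set of pending changes. The Python sketch below is a hypothetical toy model, not a WebLogic API. The class ServerAttributes and the attribute names used with it are invented. A dynamic change takes effect immediately. A static change stays pending until the server restarts.

```python
class ServerAttributes:
    """Toy model: dynamic changes apply at once, static ones wait for a restart."""

    def __init__(self, static, dynamic):
        self.static_names = set(static)
        self.active = {**static, **dynamic}  # values the running server uses
        self.pending = {}                    # static changes not yet in effect

    def change(self, name, value):
        if name in self.static_names:
            self.pending[name] = value  # read only during startup
        else:
            self.active[name] = value   # can change while the server runs

    def restart(self):
        # Startup reads the pending static values.
        self.active.update(self.pending)
        self.pending.clear()
```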
Restarting an Administration Server on Another Machine
If a machine crash prevents you from restarting the Administration Server on the same machine, you can recover management of the running Managed Servers as follows:
Make sure that the startup command or startup script does not include -Dweblogic.management.discover=false, which disables an Administration Server from discovering its running Managed Servers. For more information about -Dweblogic.management.discover, see "Server Communication" in weblogic.Server Command-Line Reference.
When the Administration Server starts, it communicates with the Managed Servers and informs them that the Administration Server is now running on a different IP address.
Restarting Managed Servers
The following sections describe how to start Managed Servers after failure. For recovery considerations related to transactions and JMS, see Additional Failure Topics.
Starting a Managed Server When the Administration Server is Accessible
If the Administration Server is reachable by the Managed Server that failed, you can:
Starting a Managed Server When the Administration Server Is Not Accessible
If a Managed Server cannot connect to the Administration Server during startup, it can retrieve its configuration by reading locally cached configuration data. A Managed Server that starts in this way is running in Managed Server Independence (MSI) mode. For a description of MSI mode, and the files that a Managed Server must access to start up in MSI mode, see Managed Server Independence Mode.
Note: If the Managed Server that failed was a clustered Managed Server that was the active server for a migratable service at the time of failure, perform the steps described in "Migrating When the Currently Active Host is Unavailable" in Using WebLogic Server Clusters. Do not start the Managed Server in MSI mode.
To start up a Managed Server in MSI mode:
If these files are not in the Managed Server's root directory:
Note: Alternatively, you can use the -Dweblogic.RootDirectory=path startup option to specify a root directory that already contains these files.
The Managed Server will run in MSI mode until it is contacted by its Administration Server. For information about restarting the Administration Server in this scenario, see Restarting an Administration Server When Managed Servers are Running.
Additional Failure Topics
For information related to recovering JMS data from a failed server instance, see "Configuring JMS Migratable Targets" in Programming WebLogic JMS.
For information about transaction recovery after failure, see "Moving a Server to Another Machine" and "Transaction Recovery After a Server Fails" in Administration Console Online Help.