Configuring and Managing WebLogic Server

Recovering Failed Servers

A variety of events can lead to the failure of a server instance. Often one failure condition leads to another. Loss of power, hardware malfunction, operating system crashes, network partitions, and unexpected application behavior can all contribute to the failure of a server instance.

Depending on availability requirements, you may implement a clustered architecture to minimize the impact of failure events. (For information about failover in a WebLogic Server cluster, see Failover and Replication in a Cluster in Using WebLogic Server Clusters.) However, even in a clustered environment, server instances may fail periodically, and it is important to be prepared for the recovery process.

The following sections provide information about, and procedures for, recovering failed server instances:

 


WebLogic Server Failure Recovery Features

This section describes WebLogic features that support recovery from failure.

Automatic Restart for Managed Servers

WebLogic Server self-health monitoring improves the reliability and availability of server instances in a domain. Selected subsystems within each WebLogic Server instance monitor their health status based on criteria specific to the subsystem. (For example, the JMS subsystem monitors the condition of the JMS thread pool while the core server subsystem monitors default and user-defined execute queue statistics.) If an individual subsystem determines that it can no longer operate in a consistent and reliable manner, it registers its health state as "failed" with the host server.

Each WebLogic Server instance, in turn, checks the health state of its registered subsystems to determine its overall viability. If one or more of its critical subsystems have reached the FAILED state, the server instance marks its own health state FAILED to indicate that it cannot reliably host an application.
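The roll-up rule described above can be sketched in a few lines of Python. This is an illustrative model only, not WebLogic Server code: the function name, the subsystem names, and the "OK" state label are assumptions made for the example. Only the FAILED state comes from the text.

```python
def overall_health(subsystem_states, critical):
    """Return "FAILED" if any critical subsystem reports FAILED, else "OK".

    subsystem_states: dict mapping a subsystem name to its reported state.
    critical: set of subsystem names the server treats as critical.
    All names are hypothetical; this models the rule, not the product.
    """
    for name in critical:
        if subsystem_states.get(name) == "FAILED":
            return "FAILED"
    return "OK"
```

In this model, a failed subsystem that is not in the critical set does not change the server's overall state.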

When used in combination with Node Manager, server self-health monitoring enables you to automatically reboot servers that have failed. This improves the overall reliability of a domain, and requires no direct intervention from an administrator. See Node Manager Capabilities.

To configure Node Manager and automatic restart behaviors, see Configuring Node Manager.

Managed Server Independence Mode

When a Managed Server starts, it tries to contact the Administration Server to retrieve its configuration information. If a Managed Server cannot connect to the Administration Server during startup, it can retrieve its configuration by reading configuration and security files directly. A Managed Server that starts in this way is running in Managed Server Independence (MSI) mode. By default, MSI mode is enabled. For information about disabling MSI mode, see Disabling Managed Server Independence in Administration Console Online Help.

In Managed Server Independence mode, a Managed Server looks in its root directory for the following files:

MSI Mode and the Managed Servers Root Directory

By default, a server instance assumes that its root directory is the directory from which it was started. For more information about a server's root directory, refer to A Server's Root Directory.

If you enable replication of configuration data, as described in Backing Up Security Data, and if you have started the Managed Server at least once while the Administration Server was running, msi-config.xml and SerializedSystemIni.dat will already be in the server's root directory. The boot.properties file is not replicated. If it is not already in the Managed Server's root directory, you must create it. For more information, see Boot Identity Files in Administration Console Online Help.
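Before relying on MSI mode, it can be useful to confirm that the files named above are actually present. The following Python sketch is one way to do that. The function name is hypothetical and the script is not part of WebLogic Server; the three file names are taken from the paragraph above.

```python
from pathlib import Path

# File names taken from the text above.
MSI_FILES = ("msi-config.xml", "SerializedSystemIni.dat", "boot.properties")


def missing_msi_files(root_dir):
    """Return the names in MSI_FILES that are not present in root_dir."""
    root = Path(root_dir)
    return [name for name in MSI_FILES if not (root / name).is_file()]
```

An empty list means all three files were found in the directory that was checked.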

If msi-config.xml and SerializedSystemIni.dat are not in the root directory, you can either:

MSI Mode and the Domain Log File

Each WebLogic Server instance writes log messages to its local log file and a domain-wide log file. The domain log file provides a central location from which to view messages from all servers in a domain.

Usually, a Managed Server forwards messages to the Administration Server, and the Administration Server writes the messages to the domain log file. However, when a Managed Server runs in MSI mode, it assumes the role of writing to the domain log file.

By default, the pathnames for local log files and domain log files are relative to the Managed Server's root directory. With these default settings, if a Managed Server is located in its own root directory (and it does not share its root directory with the Administration Server), when it runs in MSI mode the Managed Server will create its own domain log file in its root directory.

If a Managed Server shares its root directory with the Administration Server, or if you specified an absolute pathname to the domain log, the Managed Server in MSI mode will write to the domain log file that the Administration Server created.

Note: The Managed Server must have permission to write to the existing file. If you run the Administration Server and Managed Servers under different operating system accounts, you must modify the file permissions of the domain log file so that both user accounts have write permission.
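The path rules above determine whether a Managed Server in MSI mode writes to its own domain log or to the one the Administration Server created. The Python sketch below models that resolution under the assumption that a relative pathname is resolved against the server's root directory and an absolute pathname is used unchanged. The function names are hypothetical.

```python
import os


def effective_domain_log(server_root, domain_log_path):
    """Resolve a domain log pathname against a server's root directory.

    A relative pathname is joined to server_root; an absolute pathname
    is returned unchanged (os.path.join discards server_root in that case).
    """
    return os.path.normpath(os.path.join(server_root, domain_log_path))


def shares_domain_log(admin_root, managed_root, domain_log_path):
    """Return True if both servers resolve the pathname to the same file."""
    return (effective_domain_log(admin_root, domain_log_path)
            == effective_domain_log(managed_root, domain_log_path))
```

When shares_domain_log returns True, the permission requirement in the note above applies, because both servers write to the same file.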

MSI Mode and the Security Realm

A Managed Server must have access to a security realm to complete its startup process.

If you use the security realm that WebLogic Server installs, then the Administration Server maintains an LDAP server to store the domain's security data. All Managed Servers replicate this LDAP server. If the Administration Server fails, Managed Servers running in MSI mode use the replicated LDAP server for security services.

If you use a third-party security provider, then the Managed Server must be able to access the security data before it can complete its startup process.

MSI Mode and SSL

If you set up SSL for your servers, each server requires its own set of certificate files, key files, and other SSL-related files. Managed Servers do not retrieve SSL-related files from the Administration Server (though the domain's configuration file does store the pathnames to those files for each server). Starting in MSI mode does not require you to copy or move the SSL-related files unless they are located on a machine that is inaccessible.

MSI Mode and Deployment

A Managed Server that starts in MSI mode deploys its applications from its staging directory: serverroot/stage/appName.

MSI Mode and Managed Server Configuration Changes

If you start a Managed Server in MSI mode, you cannot change its configuration until it restores communication with the Administration Server.

MSI Mode and Node Manager

You cannot use Node Manager to start a server instance in MSI mode, only to restart it. For a routine startup, Node Manager requires access to the Administration Server. If the Administration Server is unavailable, you must log on to the Managed Server's host machine to start the Managed Server.

MSI Mode and Configuration File Replication

Managed Server Independence mode includes an option that copies the required configuration files into the Managed Server's root directory every 5 minutes. This option does not replicate a boot identity file. (For more information about boot identity files, see Boot Identity Files in Administration Console Online Help.)

By default, a Managed Server does not replicate these files. Depending on your backup schemes and the frequency with which you update your domain's configuration, this option might not be worth the performance cost of copying potentially large files across a network.

To enable a Managed Server to replicate the domain's configuration files, see Replicating a Domain's Configuration Files for Managed Server Independence in Administration Console Online Help.

MSI Mode and Restored Communication with an Administration Server

When the Administration Server starts, it can detect the presence of running Managed Servers (if -Dweblogic.management.discover=true, which is the default setting for this property).

Upon startup, the Administration Server looks at a persisted copy of the file running-managed-servers.xml and notifies all the Managed Servers listed in the file of its presence.

Managed Servers that were started in Managed Server Independence mode while the Administration Server was unavailable will not appear in running-managed-servers.xml. To re-establish a connection between the Administration Server and such Managed Servers, use the weblogic.Admin DISCOVERMANAGEDSERVER command. See DISCOVERMANAGEDSERVER in WebLogic Server Command Reference.

When an Administration Server starts up and contacts a Managed Server running in MSI mode, the Managed Server deactivates MSI mode and registers itself to the Administration Server for future configuration change notifications.

 


Backing Up Configuration and Security Data

Recovery from the failure of a server instance requires access to the domain's configuration and security data. This section describes file backups that WebLogic Server performs automatically, and recommended backup procedures that an administrator should perform.

Backing up config.xml

By default, an Administration Server stores a domain's configuration data in a file called domain_name\config.xml, where domain_name is the root directory of the domain.

Back up config.xml to a secure location in case a failure of the Administration Server renders the original copy unavailable. If an Administration Server fails, you can copy the backup version to a different machine and restart the Administration Server on the new machine.
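A backup of config.xml can be as simple as a timestamped copy. The Python sketch below shows one possible approach. The function name, the timestamp format, and the backup directory layout are assumptions for illustration, not a WebLogic Server facility.

```python
import shutil
import time
from pathlib import Path


def backup_config(domain_root, backup_dir):
    """Copy domain_root/config.xml into backup_dir under a timestamped name.

    Returns the path of the copy. The naming scheme is an assumption.
    """
    source = Path(domain_root) / "config.xml"
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"config.xml.{stamp}"
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target
```

In practice the backup directory should be on a different machine or disk from the Administration Server, so that the copy survives the failure it is meant to guard against.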

WebLogic Server Archives Previous Versions of config.xml

By default, the Administration Server archives up to 5 previous versions of config.xml in the domain-name\configArchive directory.

When you save a change to a domain's configuration, the Administration Server saves the previous configuration in domain-name\configArchive\config.xml#n. Each time the Administration Server saves a file in the configArchive directory, it increments the value of the #n suffix, up to a configurable number of copies—5 by default. Thereafter, each time you change the domain configuration:

Example of Archived config.xml Naming and Rotation

In the MedRec domain, the current configuration file used by the MedRecServer is WL_HOME\samples\domains\medrec\config.xml. If you add a server instance using the Administration Console, when you click the Create button, MedRecServer saves the old config.xml file as WL_HOME\samples\domains\medrec\configArchive\config.xml#2.

The new file, WL_HOME\samples\domains\medrec\config.xml, represents the MedRec domain with the new server instance. The previous file, WL_HOME\samples\domains\medrec\configArchive\config.xml#2, contains the MedRec domain configuration as it was prior to creation of the new server instance.

The next time you change the configuration, MedRecServer saves the current config.xml file as config.xml#3. After four changes to the domain, the configArchive directory contains four files: config.xml#2, config.xml#3, config.xml#4, config.xml#5. The next time you change the configuration, MedRecServer saves the old config.xml as config.xml#5. The previous config.xml#5 is renamed as config.xml#4, and so on. The old config.xml#2 is deleted.
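The naming and rotation in this example can be modeled in Python as follows. This is a simulation of the behavior described above, not the server's implementation: the archive is represented as a dictionary that maps the #n suffix to the archived contents, and the function name and parameters are hypothetical.

```python
def archive_previous_config(archive, previous, max_suffix=5, first_suffix=2):
    """Record `previous` in `archive`, rotating once max_suffix is reached.

    archive: dict mapping the #n suffix to archived contents (mutated in place).
    previous: the configuration being replaced.
    """
    next_suffix = max(archive, default=first_suffix - 1) + 1
    if next_suffix <= max_suffix:
        # Archive not yet full: use the next higher suffix.
        archive[next_suffix] = previous
        return
    # Archive full: shift every entry down one suffix, dropping the oldest.
    for n in range(first_suffix, max_suffix):
        archive[n] = archive[n + 1]
    archive[max_suffix] = previous
```

Starting from an empty archive, four calls fill suffixes 2 through 5. A fifth call discards the entry that was at suffix 2, moves the others down by one, and stores the newest previous configuration at suffix 5, which matches the MedRec example.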

Configuring the Number of Archived config.xml Versions

To configure how many previous versions of the domain configuration are archived:

  1. In the left pane of the Administration Console, click on the name of the domain.
  2. In the right pane, click the Configuration->General tab.
  3. In the Advanced Options bar, click Show.
  4. In the Archive Configuration Count box, enter the number of versions to save.
  5. Click Apply.

WebLogic Server Archives config.xml during Server Startup

In addition to the files in domain-name\configArchive, the Administration Server creates two other files that back up the domain's configuration at key points during the startup process:

Example of Archives of config.xml During Startup

If your domain configuration is stored in config.xml, when you start the domain's Administration Server, the Administration Server:

  1. Copies config.xml to config.xml.original.
  2. Parses config.xml. Depending on the domain configuration, some WebLogic subsystems add configuration information to config.xml. For example, the Security service adds MBeans and encrypted data for SSL communication.
  3. Copies the parsed and modified config.xml to config.xml.booted.

The Administration Server uses the parsed and modified config.xml. When you update the domain's configuration, it copies the old config.xml to domain-name\configArchive\config.xml#2.

Backing Up Security Data

The WebLogic Security service stores its configuration data in the config.xml file, and also in an LDAP repository and other files.

Backing Up the WebLogic LDAP Repository

The default Authentication, Authorization, Role Mapper, and Credential Mapper providers that are installed with WebLogic Server store their data in an LDAP server. Each WebLogic Server contains an embedded LDAP server. The Administration Server contains the master LDAP server which is replicated on all Managed Servers. If any of your security realms use these installed providers, you should maintain an up-to-date backup of the following directory tree:

domain_name\adminServer\ldap

where domain_name is the domain's root directory and adminServer is the directory in which the Administration Server stores runtime and security data.

Each WebLogic Server has an LDAP directory, but you only need to back up the LDAP data on the Administration Server: the master LDAP server replicates the LDAP data to each Managed Server when updates to security data are made. WebLogic security providers cannot modify security data while the domain's Administration Server is unavailable. The LDAP repositories on Managed Servers are replicas and cannot be modified.

The ldap/ldapfiles subdirectory contains the data files for the LDAP server. The files in this directory contain user, group, group membership, policies, and role information. Other subdirectories under the ldap directory contain LDAP server message logs and data about replicated LDAP servers.

Do not update the configuration of a security provider while a backup of LDAP data is in progress. If a change is made—for instance, if an administrator adds a user—while you are backing up the ldap directory tree, the backups in the ldapfiles subdirectory could become inconsistent. If this does occur, consistent, but potentially out-of-date, LDAP backups are available, as described in WebLogic Server Backs Up LDAP Files.
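The directory tree named above can be archived with a short script. The Python sketch below zips the ldap directory under the Administration Server's directory. The function name and the choice of ZIP as the archive format are assumptions for illustration. As the preceding paragraph warns, run something like this only while no security data is being changed.

```python
import shutil
from pathlib import Path


def backup_ldap_tree(domain_root, admin_server_dir, archive_base):
    """Zip domain_root/admin_server_dir/ldap and return the archive path.

    archive_base is the destination path without the .zip extension. It
    should lie outside the ldap directory so the archive does not include
    itself.
    """
    ldap_dir = Path(domain_root) / admin_server_dir / "ldap"
    if not ldap_dir.is_dir():
        raise FileNotFoundError(f"LDAP directory not found: {ldap_dir}")
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(ldap_dir))
```

Entries in the resulting archive are stored relative to the ldap directory, so the ldapfiles subdirectory appears at the top level of the ZIP file.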

WebLogic Server Backs Up LDAP Files

Once a day, a server suspends write operations and creates its own backup of the LDAP data. It archives this backup in a ZIP file below the ldap\backup directory and then resumes write operations. This backup is guaranteed to be consistent, but it might not contain the latest security data.

For information about configuring the LDAP backup, see Configuring Backups for the Embedded LDAP Server in Administration Console Online Help.

Backing Up SerializedSystemIni.dat and Security Certificates

All servers create a file named SerializedSystemIni.dat and locate it in the server's root directory. This file contains encrypted security data that must be present to boot the server. You must back up this file.

If you configured a server to use SSL, you must also back up the security certificates and keys. The location of these files is user-configurable.

 


Restarting Failed Server Instances

The nature of your applications and user demand determine the steps you take to restore application service. In particular, these factors influence the recovery process:

Restarting an Administration Server

The following sections describe how to start an Administration Server after a failure.

Restarting an Administration Server When Managed Servers Are Not Running

If no Managed Servers in the domain are running when you restart a failed Administration Server, no special steps are required. Start the Administration Server as you normally do. See Starting and Stopping Servers in Administration Console Online Help.

Restarting an Administration Server When Managed Servers Are Running

If the Administration Server shuts down while Managed Servers continue to run, you do not need to restart the Managed Servers that are already running in order to recover management of the domain. The procedure for recovering management of an active domain depends upon whether you can restart the Administration Server on the same machine it was running on when the domain was started.

Restarting an Administration Server on the Same Machine

If you restart the WebLogic Administration Server while Managed Servers continue to run, by default the Administration Server can discover the presence of the running Managed Servers.

Note: Make sure that the startup command or startup script does not include -Dweblogic.management.discover=false, which disables an Administration Server from discovering its running Managed Servers. For more information about -Dweblogic.management.discover, see Server Communication in weblogic.Server Command-Line Reference.
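A quick way to apply this note is to scan the startup command's arguments for the property. The Python sketch below does that for a command that has already been split into an argument list. The function name is hypothetical, and the check is deliberately simple: it does not parse shell scripts or handle the property being set more than once.

```python
DISCOVER_PROPERTY = "-Dweblogic.management.discover"


def discovery_disabled(command_args):
    """Return True if any argument sets weblogic.management.discover to false."""
    for arg in command_args:
        name, sep, value = arg.partition("=")
        if sep and name == DISCOVER_PROPERTY and value.strip().lower() == "false":
            return True
    return False
```

For example, discovery_disabled(["java", "-Dweblogic.management.discover=false", "weblogic.Server"]) returns True, while the same command without the property returns False.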

The root directory for the domain contains a file, running-managed-servers.xml, that lists the Managed Servers in the domain and records whether each one is running. When the Administration Server restarts, it checks this file to determine which Managed Servers were under its control before it stopped running.

When a Managed Server is gracefully or forcefully shut down, its status in running-managed-servers.xml is updated to "not-running". When an Administration Server restarts, it does not try to discover Managed Servers with the "not-running" status. A Managed Server that stops running because of a system crash, or that was stopped by killing the JVM or the command prompt (shell) in which it was running, will still have the status "running" in running-managed-servers.xml. The Administration Server will attempt to discover it, and will throw an exception when it determines that the Managed Server is no longer running.
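The discovery rules above can be modeled as follows. The format of running-managed-servers.xml is not shown in this document, so this Python sketch represents the recorded statuses as a plain dictionary. The function name and the reachability callback are hypothetical.

```python
def discover(recorded_status, is_reachable):
    """Split servers recorded as "running" into discovered and stale lists.

    recorded_status: dict mapping server name to "running" or "not-running".
    is_reachable: callable that reports whether a named server responds.
    Returns (discovered, stale); stale servers were recorded as running
    but did not respond, as happens after a crash or a killed JVM.
    """
    discovered, stale = [], []
    for name in sorted(recorded_status):
        if recorded_status[name] != "running":
            continue  # servers marked "not-running" are never contacted
        if is_reachable(name):
            discovered.append(name)
        else:
            stale.append(name)
    return discovered, stale
```

The stale list corresponds to the servers for which the Administration Server would report an exception.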

Restarting the Administration Server does not cause Managed Servers to update the configuration of static attributes. Static attributes are those that a server refers to only during its startup process. Server instances must be restarted to take account of changes to static configuration attributes. Discovery of the Managed Servers only enables the Administration Server to monitor the Managed Servers or make runtime changes in attributes that can be configured while a server is running (dynamic attributes).

Restarting an Administration Server on Another Machine

If a machine crash prevents you from restarting the Administration Server on the same machine, you can recover management of the running Managed Servers as follows:

  1. Install the WebLogic Server software on the new administration machine (if this has not already been done).
  2. Make your application files available to the new Administration Server by copying them from backups or by using a shared disk. Your application files should be available in the same relative location on the new file system as on the file system of the original Administration Server.
  3. Make your configuration and security data available to the new administration machine by copying them from backups or by using a shared disk. For more information, refer to Backing Up Configuration and Security Data.
  4. Restart the Administration Server on the new machine.
  5. Make sure that the startup command or startup script does not include -Dweblogic.management.discover=false, which disables an Administration Server from discovering its running Managed Servers. For more information about -Dweblogic.management.discover, see Server Communication in weblogic.Server Command-Line Reference.

When the Administration Server starts, it communicates with the Managed Servers and informs them that the Administration Server is now running on a different IP address.

Restarting Managed Servers

The following sections describe how to start Managed Servers after failure. For recovery considerations related to transactions and JMS, see Additional Failure Topics.

Starting a Managed Server When the Administration Server Is Accessible

If the Administration Server is reachable by the Managed Server that failed, you can:

Starting a Managed Server When the Administration Server Is Not Accessible

If a Managed Server cannot connect to the Administration Server during startup, it can retrieve its configuration by reading locally cached configuration data. A Managed Server that starts in this way is running in Managed Server Independence (MSI) mode. For a description of MSI mode, and the files that a Managed Server must access to start up in MSI mode, see Managed Server Independence Mode.

Note: If the Managed Server that failed was a clustered Managed Server that was the active server for a migratable service at the time of failure, perform the steps described in Migrating When the Currently Active Host is Unavailable in Using WebLogic Server Clusters. Do not start the Managed Server in MSI mode.

To start up a Managed Server in MSI mode:

  1. Ensure that the following files are available in the Managed Server's root directory:
  2. If these files are not in the Managed Server's root directory:

    1. Copy the config.xml and SerializedSystemIni.dat files from the Administration Server's root directory (or from a backup) to the Managed Server's root directory.
    2. Rename the configuration file to msi-config.xml. When you start the server, it will use the copied configuration files.
    Note: Alternatively, you can use the -Dweblogic.RootDirectory=path startup option to specify a root directory that already contains these files.

  3. Start the Managed Server at the command line or using a script.
  4. The Managed Server will run in MSI mode until it is contacted by its Administration Server. For information about restarting the Administration Server in this scenario, see Restarting an Administration Server When Managed Servers Are Running.
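The copy and rename in step 2 of this procedure can be scripted. The Python sketch below copies config.xml and SerializedSystemIni.dat from a source directory (the Administration Server's root directory or a backup) into the Managed Server's root directory, and stores the configuration file under the name msi-config.xml. The function name is hypothetical, and the sketch does not create boot.properties.

```python
import shutil
from pathlib import Path


def prepare_msi_root(source_root, managed_root):
    """Place msi-config.xml and SerializedSystemIni.dat in managed_root.

    config.xml is copied from source_root and renamed to msi-config.xml
    in a single step. Returns the paths of the two files that were written.
    """
    src, dst = Path(source_root), Path(managed_root)
    dst.mkdir(parents=True, exist_ok=True)
    config_copy = dst / "msi-config.xml"
    ini_copy = dst / "SerializedSystemIni.dat"
    shutil.copy2(src / "config.xml", config_copy)
    shutil.copy2(src / "SerializedSystemIni.dat", ini_copy)
    return [config_copy, ini_copy]
```

Existing files with the same names in the Managed Server's root directory are overwritten, so check the destination first if it may already hold a newer replicated copy.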

Additional Failure Topics

For information related to recovering JMS data from a failed server instance, see Configuring JMS Migratable Targets in Programming WebLogic JMS.

For information about transaction recovery after failure, see "Moving a Server to Another Machine" and "Transaction Recovery After a Server Fails" in Administration Console Online Help.

 
