BEA Tuxedo Release 7.1

   Administering a BEA Tuxedo Application at Run Time

Repairing Partitioned Networks

This topic provides instructions for troubleshooting a partition, identifying its cause, and taking action to recover from it. A network partition exists if one or more machines cannot access the MASTER machine. As the application administrator, you are responsible for detecting partitions and recovering from them.

A network partition may be caused by any of the following failures:

The procedure you follow to recover from a partitioned network depends on the cause of the partition.

Detecting a Partitioned Network

You can detect a network partition in one of the following ways:

How to Check the ULOG

When problems occur with the network, BEA Tuxedo system administrative servers start writing messages to the ULOG. If the ULOG is set up on a remote file system, messages from all machines are written to the same log. In this case, you can run the tail(1) command on that single file and watch the failure messages as they are displayed on the screen.

If, however, the remote file system depends on the network on which the problem occurred, the remote file system itself may no longer be available.

Example of a ULOG Error Message


151804.gumby!DBBL.28446: ... : ERROR: BBL partitioned, machine=SITE2
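If you want to scan a ULOG file for partition messages instead of reading tail(1) output by eye, a short script can pick out lines like the one above. The following Python sketch is modeled only on that single sample line: it assumes each entry begins with an hhmmss time stamp followed by host!process.pid, and that a partition is reported with the text "BBL partitioned, machine=NAME". Real ULOG entries may differ, so treat the pattern as an assumption. The names PartitionReport and find_partition_reports are hypothetical and are not part of BEA Tuxedo.

```python
import re
from typing import Iterable, List, NamedTuple


class PartitionReport(NamedTuple):
    time: str      # assumed hhmmss field, e.g. "151804"
    host: str      # assumed name of the machine that logged the line, e.g. "gumby"
    process: str   # reporting process, e.g. "DBBL"
    pid: int       # process ID, e.g. 28446
    machine: str   # machine reported as partitioned, e.g. "SITE2"


# Pattern derived from one sample line only; adjust it to your actual ULOG format.
_PARTITION_RE = re.compile(
    r"^(\d{6})\.([^!]+)!([^.]+)\.(\d+):.*\bBBL partitioned, machine=(\S+)"
)


def find_partition_reports(lines: Iterable[str]) -> List[PartitionReport]:
    """Return one PartitionReport for each line announcing a partitioned BBL."""
    reports: List[PartitionReport] = []
    for line in lines:
        match = _PARTITION_RE.search(line.rstrip("\n"))
        if match:
            time, host, process, pid, machine = match.groups()
            reports.append(PartitionReport(time, host, process, int(pid), machine))
    return reports
```

You might call it as find_partition_reports(open(ulog_path)), where ulog_path is a hypothetical path to the current ULOG file; each returned entry names a machine that may need one of the recovery procedures described later in this topic.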


How to Gather Information About the Network, Server, and Service

The following is an example of a tmadmin session in which information is being collected about a partitioned network, a server, and a service on that network. Three tmadmin commands are run:

Restoring a Network Connection

This topic provides instructions for recovering from transient and severe network failures.

How to Recover from Transient Network Failures

Because the BRIDGE automatically tries to recover from transient network failures and to reconnect, such failures usually go unnoticed. If, however, you need to recover manually from a transient network failure, complete the following procedure.

  1. On the MASTER machine, start a tmadmin(1) session.

  2. Run the reconnect command (rco), specifying the name of a nonpartitioned machine followed by the name of the partitioned machine.

    rco nonpartitioned_node1 partitioned_node2
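When the reconnect has to be scripted rather than typed at the tmadmin prompt, the following Python sketch builds the rco command from step 2 and, optionally, pipes it to tmadmin. It assumes that tmadmin is on the PATH of the MASTER machine, that the application environment (for example, TUXCONFIG) is already set, and that tmadmin accepts commands on standard input; verify these assumptions against your installation. The function names and the machine names in the usage are hypothetical.

```python
import subprocess
from typing import Sequence


def reconnect_command(nonpartitioned: str, partitioned: str) -> str:
    """Build the rco command: nonpartitioned machine first, partitioned second."""
    for name in (nonpartitioned, partitioned):
        # tmadmin arguments are whitespace-separated, so reject empty or spaced names.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid machine name: {name!r}")
    return f"rco {nonpartitioned} {partitioned}"


def run_tmadmin(commands: Sequence[str], dry_run: bool = True) -> str:
    """Return the tmadmin input script, or feed it to tmadmin when dry_run is False.

    Assumption: tmadmin reads commands from standard input when it is not
    attached to a terminal. Its output is returned unparsed.
    """
    script = "".join(cmd + "\n" for cmd in commands)
    if dry_run:
        return script
    result = subprocess.run(
        ["tmadmin"], input=script, capture_output=True, text=True, check=False
    )
    return result.stdout
```

With dry_run left at True, run_tmadmin([reconnect_command("SITE1", "SITE2")]) only returns the text it would send, which lets you review the command before running it against a live application.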

How to Recover from Severe Network Failures

To recover from a severe network failure, complete the following procedure.

  1. On the MASTER machine, start a tmadmin session.

  2. Run the pclean command, specifying the name of the partitioned machine.

    pcl partitioned_machine

  3. Migrate the application servers or, once the problem has been corrected, reboot the machine.
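If you script the severe-failure procedure, the following Python sketch builds the pcl command for step 2 and can optionally pipe it to tmadmin. It assumes tmadmin is on the PATH of the MASTER machine, that the application environment is already set, and that tmadmin reads commands from standard input. Step 3 (migrating servers or rebooting the machine) is deliberately left as a manual action. The function names are hypothetical, and because the sketch does not interpret tmadmin's output, check that output before assuming the cleanup succeeded.

```python
import subprocess


def pclean_script(partitioned: str) -> str:
    """Return the tmadmin input for step 2: pclean (pcl) on the partitioned machine."""
    # tmadmin arguments are whitespace-separated, so reject empty or spaced names.
    if not partitioned or any(ch.isspace() for ch in partitioned):
        raise ValueError(f"invalid machine name: {partitioned!r}")
    return f"pcl {partitioned}\n"


def clean_partitioned_machine(partitioned: str, dry_run: bool = True) -> str:
    """Return the script (dry run) or tmadmin's raw output after sending it."""
    script = pclean_script(partitioned)
    if dry_run:
        return script
    # Assumption: tmadmin accepts commands on stdin when not attached to a terminal.
    result = subprocess.run(
        ["tmadmin"], input=script, capture_output=True, text=True, check=False
    )
    return result.stdout
```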