Oracle7 Server Distributed Systems Manual, Vol. 1 Go to Product Documentation Library
Library
Go to books for this product
Product
Go to Contents for this book
Contents
Go to Index
Index



Go to previous file in sequence Go to next file in sequence

Distributed Updates


This chapter describes how Oracle7 maintains the integrity of distributed update transactions. Topics include:


Updates in a Distributed Environment

Distributed updates add several levels of complexity to a distributed system. The fact that multiple users are sharing and accessing data that exists at many sites rather that at a single site, adds the following considerations when these users attempt to update that data:

Replication

In a distributed system, a data object can be represented at many sites. Making sure that updates to any representative at any of the sites are propagated to all other sites is the responsibility of Oracle7's replication mechanisms.

The Oracle Server provides several methods for data replication.

A snapshot is a full copy of a table, or a subset of a table, that reflects a recent state of a master table (the table on the node that you designate as the master node). A snapshot is defined by a distributed query that references one or more master tables, views, or, with certain limitations, other snapshots.

Read-only snapshots can be used for queries only and are the simplest form of replication. They are typically used for systems in which many sites need to query data that are updated only by one site..

Updatable snapshots can reflect local updates and therefore improve response time by avoiding network traffic. However, there must be a mechanism that ensures that the local updates are not lost when the snapshot is refreshed from the master table.

Oracle's symmetric replication facility provides that mechanism. It allows multiple copies of data to be maintained at different sites in a distributed system.

See Oracle7 Server Distributed Systems, Volume II for information about the concepts behind replication and how to implement replication in your distributed system.

Concurrency Control

In a distributed system, there is the strong potential for more than one user to be concurrently executing transactions that update the same data and produce incorrect results. Oracle provides locking mechanisms to handle multiuser access to the same data.

See Chapter 10, "Data Concurrency", of the Oracle7 Server Concepts manual for more information.

In a distributed system, there is another level of complexity. A situation called global deadlock can occur. For example, transaction T1 has a lock on object A in San Francisco and requests a lock on object B in Dallas, and transaction T2 has a lock on object B in Dallas and requests a lock on object A in San Francisco. This series of events would cause both transactions to go into wait states.

In Oracle distributed transactions, local deadlocks are detected by analyzing a "waits for" graph, and global deadlocks are detected by a time-out. Once detected, non-distributed and distributed deadlocks are handled by the database and application in the same way.

Distributed Transaction Management

All participants (nodes) in a distributed transaction should be unanimous as to the action to take on that transaction. That is, they should either all commit or rollback.

Oracle uses a prepare/commit mechanism to ensure that all participants in the distributed transaction are "on the same track".

Briefly, a prepare/commit mechanism works this way (more detailed instructions for implementing prepare/commit are in following sections):


The Distributed Transaction Management Mechanism

Oracle7 automatically controls and monitors the commit or rollback of a distributed update transaction and maintains the integrity of the global database (the collection of databases participating in the transaction) using a transaction recovery management mechanism known as prepare/commit). This mechanism is completely transparent. Its use requires no programming on the part of the user or application developer.

The information in this section explains how this mechanism works, why it is used, and how you can take advantage of its features to better design and configure a distributed system.

Coordinating Distributed Updates

By definition, changes made by all SQL statements in a transaction are either committed or rolled back as a unit. The commit of a non-distributed transaction (one that contains SQL statements that modify data only at a local database) is simple -- all changes are either committed or rolled back as a unit in the non-distributed system.

However, the commit or rollback of a distributed transaction must be coordinated over a network so that the participating nodes either all commit or all roll back the transaction, even if a network failure or a system failure of any number of nodes occur during the process. The prepare/commit mechanism guarantees that the nodes participating in a distributed transaction either all commit or all roll back the transaction, thus maintaining the integrity of the global database.

When Is Transaction Management Needed?

The transaction management mechanism is used only when a distributed update changes the contents of two or more databases in the distributed system, or there is a remote procedure call (RPC) that references a remote object using its global object name. When a node is read-only, Oracle7 automatically notes this, and the node need not participate in the ensuing prepare/commit phases.

All implicit database changes performed by integrity constraints, remote procedure calls, and triggers are also protected by Oracle7's distributed transaction management mechanism.


The Prepare and Commit Phases

The processing of a distributed update has two distinct phases:

prepare phase

The global coordinator (initiating node) asks participants to prepare (to promise to commit or rollback the transaction, even if there is a failure).

commit phase

If all participants respond to the coordinator that they are prepared, the coordinator asks all nodes to commit the transaction. If any participants cannot prepare, the coordinator asks all nodes to roll back the transaction.

When a user commits a distributed transaction with a COMMIT statement, both phases are performed automatically. The following sections describe each phase in further detail.

Prepare Phase

The first phase in committing a distributed transaction is the prepare phase in which the commit of the transaction is not actually carried out. Instead, all nodes referenced in a distributed transaction (except one, known as the commit point site, described [*]) are told to prepare (to commit). By preparing, a node records enough information so that it can subsequently either commit or abort the transaction (in which case, a rollback will be performed), regardless of intervening failures.

When a node responds to its requestor that it has prepared, the prepared node has made a promise to be able to either commit or roll back the transaction later and not to make a unilateral decision on whether to commit or roll back the transaction.

Note: Queries that start after a node has prepared cannot access the associated locked data until all phases are complete (an insignificant amount of time unless a failure occurs).

When a node is told to prepare, it can respond with one of three responses:

prepared

Data on the node has been modified by a statement in the distributed transaction, and the node has successfully prepared.

read-only

No data on the node has been, or can be, modified (only queried), so no prepare is necessary.

abort

The node cannot successfully prepare.

Prepare Phase Actions by Nodes

To complete the prepare phase, each node performs the following actions:

These actions guarantee that the transaction can subsequently commit or roll back on that node. The prepared nodes then wait until a COMMIT or ROLLBACK is sent. Once the node(s) are prepared, the transaction is said to be in-doubt.

Read-only Response

When a node is asked to prepare and the SQL statements affecting the database do not change that node's data, the node responds to the node that referenced it with a read-only message. These nodes do not participate in the second phase (the commit phase). For more information about read-only distributed transactions, see "Read-Only Distributed Transactions" [*]

Unsuccessful Prepare

When a node cannot successfully prepare, it performs the following actions:

These actions then propagate to the other nodes involved in the distributed transaction to roll back the transaction and guarantee the integrity of the data in the global database.

Again, this enforces the primary rule of a distributed transaction. All nodes involved in the transaction either all commit or all roll back the transaction at the same logical time.

Commit Phase

The second phase in committing a distributed transaction is the commit phase. Before this phase occurs, all nodes referenced in the distributed transaction have guaranteed that they have the necessary resources to commit the transaction. That is, they are all prepared.

Therefore, the commit phase consists of the following steps:

When the commit phase is complete, the data on all nodes of the distributed system are consistent with one another.

A variety of failure cases, caused by network or system failures, are possible during both the prepare phase and the commit phase. For a description of failure situations and how Oracle7 resolves intervening failures during prepare/commit, see "Troubleshooting Distributed Update Problems" [*].


The Session Tree

As the statements in a distributed transaction are issued, Oracle7 defines a session tree of all nodes participating in the transaction. A session tree is a hierarchical model that describes the relationships between sessions and their roles. All nodes participating in the session tree of a distributed transaction assume one or more roles:

The role a node plays in a distributed transaction is determined by:

Figure 5 - 1 below illustrates a simple session tree.

Figure 5 - 1. An example of a Simple Session Tree

Clients

A node acts as a client when it references information from another node's database. The referenced node is a database server. In the above example, the node SALES.ACME.COM is a client of the nodes (database servers) that serve the WAREHOUSE and FINANCE databases.

Servers and Database Servers

A server is a node that is directly referenced in a distributed transaction or is requested to participate in a transaction because another node requires data from its database. A node supporting a database is also called a database server.

In Figure 5 - 1, an application at the node holding the SALES database initiates a distributed transaction which accesses data from the nodes holing the WAREHOUSE and FINANCE databases. Therefore, SALES.ACME.COM has the role of client node, and WAREHOUSE and FINANCE are both database servers.

In this example, SALES is a database server and a client because the application is also requesting an update of the SALES database.

Local Coordinators

A node that must reference data on other nodes to complete its part in the distributed transaction is called a local coordinator. In Figure 5 - 1, SALES.ACME.COM, although it happens to be the global coordinator, is also considered a local coordinator because it coordinates the nodes it directly references: WAREHOUSE.ACME.COM and FINANCE.ACME.COM.

A local coordinator is responsible for coordinating the transaction among the nodes it communicates directly with by:

The Global Coordinator

The node where the distributed transaction originates (to which the database application issuing the distributed transaction is directly connected) is called the global coordinator. This node becomes the parent or root of the session tree. The global coordinator performs the following operations during a distributed transaction:

For more information about the global coordinator's role, see "A Simple Example" [*].

The Commit Point Site

The job of the commit point site is to initiate a commit or roll back as instructed by the global coordinator. The system administrator always designates one node to be the commit point site in the session tree by assigning all nodes a commit point strength (see below). The node selected as commit point site should be that node that stores the most critical data (the data most widely used)

The commit point site is distinct from all other nodes involved in a distributed transaction with respect to the following two issues:

A distributed transaction is considered to be committed once all nodes are prepared and the transaction has been committed at the commit point site (even though some participating nodes may still be only in the prepared state and the transaction not yet actually committed).

The commit point site's redo log is updated as soon as the distributed transaction is committed at that node. Likewise, a distributed transaction is considered not committed if it has not been committed at the commit point site.

Commit Point Strength

Every node acting as a database server must be assigned a commit point strength. If a database server is referenced in a distributed transaction, the value of its commit point strength determines what role it plays in the prepare/commit phases. Specifically, the commit point strength determines whether a given node is the commit point site in the distributed transaction.

This value is specified using the initialization parameter COMMIT_POINT_STRENGTH (see page 5 - 13).

The commit point site is determined at the beginning of the prepare phase. The commit point site is selected only from the nodes participating in the transaction. Once it has been determined, the global coordinator sends prepare messages to all participating nodes.

Of the nodes directly referenced by the global coordinator, the node with the highest commit point strength is selected. Then, the initially-selected node determines if any of its servers (other nodes that it has to obtain information from for this transaction) has a higher commit point strength.

Either the node with the highest commit point strength directly referenced in the transaction, or one of its servers with a higher commit point strength becomes the commit point site. Figure 5 - 2 shows in a sample session tree the commit point strengths of each node (in parentheses) and shows the node chosen as the commit point site.

Figure 5 - 2. Commit Point Strengths and Determination of the Commit Point Site

The following conditions apply when determining the commit point site:

The commit point strength only determines the commit point site in a distributed transaction. Because the commit point site stores information about the status of the transaction, the commit point site should not be a node that is frequently unreliable or unavailable in case other nodes need information about the transaction's status.

As Figure 5 - 2 illustrates, the commit point site and the global coordinator can be different nodes of the session tree.

The commit point strengths of each nodes is communicated to the coordinator(s) when the initial connections are made. The coordinator(s) retain the commit point strengths of each node they are in direct communication with so that commit point sites can be efficiently selected during prepare/commits. Therefore, it is not necessary for the commit point strength to be exchanged between a coordinator and a node each time a commit occurs.

Specifying the Commit Point Strength of an Instance

Specify a commit point strength for each node that insures that the most critical server will be "non-blocking" if a failure occurs during a prepare/commit phase.

A node's commit point strength should relate to the estimated number of collisions that can result from data locked by in-doubt transactions. For example, mainframe-based database servers will probably have higher commit point strengths than minicomputer-based servers. In turn, minicomputer-based database servers will probably have higher commit point strengths than PC-based database servers. To determine each node's commit point strength, it will be necessary for all administrators of the distributed system to communicate and establish the appropriate values.

A node's commit point strength is set by the initialization parameter COMMIT_POINT_STRENGTH. The range of values is any integer from 0 to 255. For example, to set the commit point strength of a database to 200, include the following line in that database's parameter file:

COMMIT_POINT_STRENGTH=200

Additional Information: See your Oracle operating system-specific documentation for the default value.


A Case Study

This case study illustrates:

The Scenario

A company that has separate Oracle7 servers, SALES.ACME.COM and WAREHOUSE.ACME.COM. As sales records are inserted into the SALES database, associated records are being updated at the WAREHOUSE database.

The Process

The following steps are carried out during a distributed update transaction that enters a sales order:

Figure 5 - 3. Defining the Session Tree

Figure 5 - 4. Determining the Commit Point Site

Figure 5 - 5. Sending and Acknowledging the PREPARE Message

Figure 5 - 6. The Global Coordinator and Other Servers Commit the Transaction

All of the steps described above are accomplished automatically and in a fraction of a second.


Coordination of System Change Numbers

Each committed transaction has an associated system change number (SCN) to uniquely identify the changes made by the SQL statements within that transaction. In a distributed system, the SCNs of communicating nodes are coordinated when:

Among other benefits, the coordination of SCNs among the nodes of a distributed system allows global distributed read-consistency at both the statement and transaction level. If necessary, global distributed time-based recovery can also be completed.

During the prepare phase, Oracle7 determines the highest SCN at all nodes involved in the transaction. The transaction then commits with the high SCN at the commit point site. The commit SCN is then sent to all prepared nodes with the commit decision.


Read-Only Distributed Transactions

There are three cases in which all or part of a distributed transaction is read-only:


Limiting the Number of Distributed Transactions Per Node

The initialization parameter DISTRIBUTED_TRANSACTIONS controls the number of possible distributed transactions in which a given instance can concurrently participate, both as a client and a server. If this limit is reached and a subsequent user tries to issue a SQL statement referencing a remote database, the statement is rolled back and the following error message is returned:

ORA-2042: too many global transactions

For example, assume that DISTRIBUTED_TRANSACTIONS is set to 10 for a given instance. In this case, a maximum of ten sessions can concurrently be processing a distributed transaction. If an eleventh session attempts to issue a DML statement requiring distributed access, an error message is returned to the session, and the statement is rolled back.

The database administrator should consider increasing the value of the initialization parameter DISTRIBUTED_TRANSACTIONS when an instance regularly participates in numerous distributed transactions and the above error message is frequently returned as a result of the current limit. Increasing the limit allows more users to concurrently issue distributed transactions.

If the DISTRIBUTED_TRANSACTIONS initialization parameter is set to zero, no distributed SQL statements can be issued in any session. Also, the RECO background process is not started at startup of the local instance. In-doubt distributed transactions that may be present (from a previous network or system failure) cannot be automatically resolved by Oracle7.

Therefore, only set this initialization parameter to zero to prevent distributed transactions when a new instance is started, and when it is certain that no in-doubt distributed transactions remained after the last instance shut down.

Additional Information: See the Oracle7 Server Reference for more information.


Troubleshooting Distributed Update Problems

A network or system failure can cause the following types of problems:

The following sections describe these situations.

Failures that Interrupt Prepare/Commit

The user program that commits a distributed transaction is informed of a problem by one of the following error messages:

ORA-02050: transaction ID rolled back,
           some remote dbs may be in-doubt
ORA-02051: transaction ID committed,
           some remote dbs may be in-doubt
ORA-02054: transaction ID in-doubt

A robust application should save information about a transaction if it receives any of the above errors. This information can be used later if manual distributed transaction recovery is desired.

Note: The failure cases that prompt these error messages are beyond the scope of this book and are unnecessary to administer the system.

No action is required by the administrator of any node that has one or more in-doubt distributed transactions due to a network or system failure. The automatic recovery features of Oracle7 transparently complete any in-doubt transaction so that the same outcome occurs on all nodes of a session tree (that is, all commit or all roll back) once the network or system failure is resolved.

However, in extended outages, the administrator may wish to force the commit or rollback of a transaction to release any locked data. Applications must account for such possibilities.

Failures that Prevent Access of Data

When a user issues a SQL statement, Oracle7 attempts to lock the required resources to successfully execute the statement. However, if the requested data is currently being held by statements of other uncommitted transactions and continues to remained locked for an excessive amount of time, a time-out occurs. Consider the following two scenarios.

Transaction Time-Out

A DML SQL statement that requires locks on a remote database may be blocked from doing so if another transaction (distributed or non-distributed) currently own locks on the requested data. If these locks continue to block the requesting SQL statement, a time-out occurs, the statement is rolled back, and the following error message is returned to the user:

ORA-02049: time-out: distributed transaction waiting for lock

Because no data has been modified, no actions are necessary as a result of the time-out. Applications should proceed as if a deadlock has been encountered. The user who executed the statement can try to re-execute the statement later. If the lock persists, the user should contact an administrator to report the problem.

The timeout interval in the above situation can be controlled with the initialization parameter DISTRIBUTED_LOCK_TIMEOUT. This interval is in seconds. For example, to set the time-out interval for an instance to 30 seconds, include the following line in the associated parameter file:

DISTRIBUTED_LOCK_TIMEOUT=30

With the above time-out interval, the time-out errors discussed in the previous section occur if a transaction cannot proceed after 30 seconds of waiting for unavailable resources.

Additional Information: For more information about initialization parameters and editing parameter files, see the Oracle7 Server Reference.

Lock From In-Doubt Transaction

A query or DML statement that requires locks on a local database may be blocked from doing so indefinitely due to the locked resources of an in-doubt distributed transaction. In this case, the following error message is immediately returned to the user:

ORA-01591: lock held by in-doubt distributed transaction ID

In this case, the SQL statement is rolled back immediately. The user who executed the statement can try to re-execute the statement later. If the lock persists, the user should contact an administrator to report the problem, including the ID of the in-doubt distributed transaction.

The chances of the above situations occurring are very rare, considering the low probability of failures during the critical portions of the prepare/commit phases. Even if such a failure occurs and assuming quick recovery from a network or system failure, problems are automatically resolved without manual intervention. Thus problems usually resolve before they can be detected by users or database administrators.


Manually Overriding In-Doubt Transactions

A database administrator can manually force the COMMIT or ROLLBACK of a local in-doubt distributed transaction. However, a specific in-doubt transaction should be manually overridden only when the following situations exist:

Normally, a decision to locally force an in-doubt distributed transaction should be made in consultation with administrators at other locations. A wrong decision can lead to database inconsistencies which can be difficult to trace and that you must manually correct.

If the conditions above do not apply, always allow the automatic recovery features of Oracle7 to complete the transaction. However, if any of the above criteria are met, the administrator should consider a local override of the in-doubt transaction. If a decision is made to locally force the transaction to complete, the database administrator should analyze available information with the following goals in mind:

		COMMIT COMMENT 'Finance/Accts_pay/Trans_type 10B';

		ALTER SESSION ADVISE COMMIT;
		INSERT INTO emp@hq ... ;    /*advice to commit at HQ */
		ALTER SESSION ADVISE ROLLBACK;
		DELETE FROM emp@sales ... ; /*advice to roll back at SALES*/
		ALTER SESSION ADVISE NOTHING;

Manual Override Example

The following example shows a failure during the commit of a distributed transaction and how to go about gaining information before manually forcing the commit or rollback of the local portion of an in-doubt distributed transaction. Figure 5 - 7 illustrates the example.

Figure 5 - 7. An Example of an in-Doubt Distributed Transaction

In this failure case, the prepare phase completed. However, during the commit phase, the commit point site's commit message (the message telling the global coordinator that the transaction was committed at the commit point site) never made it back to the global coordinator, even though the commit point site committed the transaction.

You are the WAREHOUSE database administrator. The inventory data locked because of the in-doubt transaction is critical to other transactions. However, the data cannot be accessed because the locks must be held until the in-doubt transaction either commits or rolls back. Furthermore, you understand that the communication link between sales and headquarters cannot be resolved immediately. Therefore, you decide to manually force the local portion of the in-doubt transaction using the following steps:

The following sections explain each step in detail for this example.

Step 1: Record User Feedback

The users of the local database system that conflict with the locks of the in-doubt transaction get the following error message:

ORA-01591: lock held by in-doubt distributed transaction 1.21.17

Here, 1.21.17 is the local transaction ID of the in-doubt distributed transaction in this example. The local database administrator should request and record this ID number from the users that report problems to identify in-doubt transactions that should be forced.

Step 2: Query DBA_2PC_PENDING

Query the local DBA_2PC_PENDING (see also page 5 - 32) to gain information about the in-doubt transaction:

SELECT * FROM sys.dba_2pc_pending
   WHERE local_tran_id = '1.21.17';

For example, when the previous query is issued at WAREHOUSE, the following information is returned.

Column Name            Value
---------------------- --------------------------------------
LOCAL_TRAN_ID          1.21.17
GLOBAL_TRAN_ID         SALES.ACME.COM.55d1c563.1.93.29
STATE                  prepared
MIXED                  no
ADVICE
TRAN_COMMENT           Sales/New Order/Trans_type 10B
FAIL_TIME              31-MAY-91
FORCE_TIME
RETRY_TIME             31-MAY-91
OS_USER                SWILLIAMS
OS_TERMINAL            TWA139:
HOST                   system1
DB_USER                SWILLIAMS
COMMIT#

Figure 5 - 8. Results of Querying DBA_2PC_PENDING

The global transaction ID is the common transaction ID that is the same on every node for a distributed transaction. It is of the form:

global_database_name.hhhhhhhh.local_transaction_id

Here, global_database_name is the database name of the global coordinator (where the transaction originates), hhhhhhhh is an internal database ID at the global coordinator (8 hexadecimal digits), and local_tran_id is the corresponding local transaction ID assigned on the global coordinator. Therefore, the last portion of the global transaction ID and the local transaction ID match at the global coordinator. In the example, you can tell that WAREHOUSE is not the global coordinator because these numbers do not match.

The transaction on this node is in a prepared state. Therefore, WAREHOUSE awaits its coordinator to send either a commit or a rollback message.

The transaction's comment or advice may include information about this transaction. If so, use this comment to your advantage. In this example, the origin (the sales order entry application) and transaction type is in the transaction's comment. This information may reveal something that would help you decide whether to commit or rollback the local portion of the transaction.

If useful comments do not accompany an in-doubt transaction, you must complete some extra administrative work to trace the session tree and find a node that has resolved the transaction.

Step 3: Query DBA_2PC_NEIGHBORS

The purpose of this step is to climb the session tree so that you find coordinators, eventually reaching the global coordinator. Along the way, you might find a coordinator that has resolved the transaction. If not, you can eventually work your way to the commit point site, which will always have resolved the in-doubt transaction.

The DBA_2PC_NEIGHBORS view provides information about connections associated with an in-doubt transaction. Information for each connection is different, based on whether the connection is inbound or outbound:

Additionally, the INTERFACE column tells whether the local node or a subordinate node is the commit point site.

To trace the session tree, you can query the local DBA_2PC_NEIGHBORS view. In this case, you query this view on the WAREHOUSE database.

SELECT * FROM sys.dba_2pc_neighbors
   WHERE local_tran_id = '1.21.17'
   ORDER BY sess#, in_out;
Column Name            Value
---------------------- --------------------------------------
LOCAL_TRAN_ID          1.21.17
IN_OUT                 in
DATABASE               SALES.ACME.COM
DBUSER_OWNER           SWILLIAMS
INTERFACE              N
DBID                   000003F4
SESS#                  1
BRANCH                 0100

The columns of particular interest in this view are the IN_OUT, DATABASE, DBUSER_OWNER, and INTERFACE columns. In this example, the IN_OUT column reveals that the WAREHOUSE database is a server for the SALES database, as specified in the DATABASE column. The connection to WAREHOUSE was established through a database link from the SWILLIAMS account, as shown by the DB_OWNER column, and WAREHOUSE, nor any of its descendants, was the commit point site, as shown by the INTERFACE column.

At this point, you can contact the administrator at the located nodes and ask them to repeat Steps 2 and 3, using the global transaction ID.

Note: If you can directly connect to these nodes with another network, you can repeat Steps 2 and 3 yourself.

For example, the following results are returned when Steps 2 and 3 are performed at SALES and HQ, respectively.

Manually Checking the Status of Pending Transactions at SALES.ACME.COM

SELECT * FROM sys.dba_2pc_pending

   WHERE global_tran_id = 'SALES.ACME.COM.55d1c563.1.93.29';
Column Name            Value
---------------------- --------------------------------------
LOCAL_TRAN_ID          1.93.29
GLOBAL_TRAN_ID         SALES.ACME.COM.55d1c563.1.93.29
STATE                  prepared
MIXED                  no
ADVICE
TRAN_COMMENT           Sales/New Order/Trans_type 10B
FAIL_TIME              31-MAY-91
FORCE_TIME
RETRY_TIME             31-MAY-91
OS_USER                SWILLIAMS
OS_TERMINAL            TWA139:
HOST                   system1
DB_USER                SWILLIAMS
COMMIT#
SELECT * FROM dba_2pc_neighbors
   WHERE global_tran_id = 'SALES.ACME.COM.55d1c563.1.93.29'
   ORDER BY sess#, in_out;

At SALES, there are three rows for this transaction (one for the connection to WAREHOUSE, one for the connection to HQ, and one for the connection established by the user). Information corresponding to the rows for the SALES and HQ connections is listed below:

Column Name            Value
---------------------- --------------------------------------
LOCAL_TRAN_ID          1.93.29
IN_OUT                 OUT
DATABASE               WAREHOUSE.ACME.COM
DBUSER_OWNER           SWILLIAMS
INTERFACE              N
DBID                   55d1c563
SESS#                  1
BRANCH                 1
LOCAL_TRAN_ID          1.93.29
IN_OUT                 OUT
DATABASE               HQ.ACME.COM
DBUSER_OWNER           ALLEN
INTERFACE              C
DBID                   00000390
SESS#                  1
BRANCH                 1

The information from the previous query reveals several facts:

Manually Checking the Status of Pending Transactions at HQ.ACME.COM:

SELECT * FROM dba_2pc_pending

   WHERE global_tran_id = 'SALES.ACME.COM.55d1c563.1.93.29';
Column Name            Value
---------------------- --------------------------------------
LOCAL_TRAN_ID          1.45.13
GLOBAL_TRAN_ID         SALES.ACME.COM.55d1c563.1.93.29
STATE                  COMMIT
MIXED                  NO
ACTION
TRAN_COMMENT           Sales/New Order/Trans_type 10B
FAIL_TIME              31-MAY-91
FORCE_TIME
RETRY_TIME             31-MAY-91
OS_USER                SWILLIAMS
OS_TERMINAL            TWA139:
HOST                   SYSTEM1
DB_USER                SWILLIAMS
COMMIT#                129314

At this point, you have found a node that resolved the transaction. It has been committed. Therefore, you can force the in-doubt transaction to commit at your local database (see the following section for information on manually committing or rolling back in-doubt transactions). It is a good idea to contact any other administrators you know that could also benefit from your investigation.

Step 4: Check for Mixed Outcome

After you manually force a transaction to commit or roll back, the corresponding row in the pending transaction table remains. The STATE of the transaction is changed to forced commit or forced abort, depending on how you forced the transaction.

Furthermore, once connections between the instances resume, RECO checks the global outcome of the transaction. The MIXED column is changed to yes and the row for the transaction is not deleted if you forced the transaction the wrong way.

If you ever see a transaction forced the wrong way, you should be aware that some global data inconsistency may exist. Eventually, you can purge unnecessary rows from the pending transaction table.

The Pending Transaction Table (DBA_2PC_PENDING)

Every Oracle7 database has a pending transaction table which is a special table that stores information about distributed transactions as they proceed through the prepare/commit phases. You can query a database's pending transaction table by referencing the DBA_2PC_PENDING data dictionary view.

Each transaction with an entry in the pending transaction table is classified in one of the following categories (as indicated in DBA_2PC_PENDING.STATE):

collecting

This category normally applies only to the global coordinator or local coordinators. The node is currently collecting information from other database servers before it can decide whether it can prepare.

prepared

The node has prepared and may or may not have acknowledged this to its local coordinator with a prepared message. However, no commit message has been received. The node remains prepared, holding any local resource locks necessary for the transaction to commit.

committed

The node (any type) has committed the transaction, but other nodes involved in the transaction may not have done the same. That is, the transaction is still pending at one or more nodes.

forced commit

A pending transaction can be forced to commit at the discretion of a database administrator. This entry occurs if a transaction is manually committed at a local node by a database administrator.

forced abort (rollback)

A pending transaction can be forced to roll back at the discretion of a database administrator. This entry occurs if this transaction is manually rolled back at a local node by a database administrator.

Also of particular interest in the pending transaction table is the mixed outcome flag (as indicated in DBA_2PC_PENDING.MIXED). The database administrator can make the wrong choice if a pending transaction is forced to commit or roll back (for example, the local administrator rolls back the transaction, but the other nodes commit it). Incorrect decisions are detected automatically, and the damage flag for the corresponding pending transaction's record is set (MIXED=yes).

The RECO (Recoverer) background process uses the information in the pending transaction table to finalize the status of in-doubt transactions. The information in the pending transaction table can also be used by a database administrator, who decides to manually override the automatic recovery procedures for pending distributed transactions.

All transactions automatically resolved by RECO are automatically removed from the pending transaction table. Additionally, all information about in-doubt transactions correctly resolved by an administrator (as checked when RECO reestablishes communication) are automatically removed from the pending transaction table. However, all rows resolved by an administrator that result in a mixed outcome across nodes remain in the pending transaction table of all involved nodes until they are manually deleted.


Manually Committing In-Doubt Transactions

The local database administrator has two ways to manually force an in-doubt transaction to commit. The DBA can use the Server Manager Transaction Object List option Force Commit or the SQL command COMMIT with the FORCE option and a text string, indicating either the local or global transaction ID of the in-doubt transaction to commit. Figure 5 - 9 shows the Server Manager Transaction Object List.

Figure 5 - 9. Transaction Object List

Forcing a Commit or Rollback in Server Manager

To commit an in-doubt transaction, select the transaction from the Transaction Object List and choose Force Commit from the Transaction menu.

To roll back an in-doubt transaction, select the transaction from the Transaction Object List and choose Force Rollback from the Transaction menu.

Attention: You cannot roll back an in-doubt transaction to a savepoint.

Manually Committing or Rolling Back In-Doubt Transactions

The following SQL statement is the command equivalent of the action taken in Figure 5 - 9 to commit an in-doubt transaction.

COMMIT FORCE 'transaction_name';

To manually rollback an in-doubt transaction, use the SQL command ROLLBACK with the FORCE option and a text string, indicating either the local or global transaction ID of the in-doubt transaction to rollback. For example, to rollback the in-doubt transaction with the local transaction ID of 2.9.4, use the following statement:

ROLLBACK FORCE '2.9.4';

Attention: You cannot roll back an in-doubt transaction to a savepoint.

Privileges Required to Manually Commit or Rollback In-Doubt Transactions

To manually force the commit or rollback of an in-doubt transaction issued by yourself, you must have been granted the FORCE TRANSACTION system privilege. To force the commit or rollback of another user's distributed transaction, you must have the FORCE ANY TRANSACTION system privilege. Both privileges can be obtained either explicitly or via a role.

Note: Forcing the commit or rollback of an in-doubt distributed transaction does not affect the status of the operator's current transaction.

Forcing Rollback/Commit on the Local Pending Transaction Table

In all examples, the transaction is committed or rolled back on the local node, and the local pending transaction table records a value of forced commit or forced abort for the STATE column of this transaction's row.

Specifying the SCN

Optionally, you can specify the SCN for the transaction when forcing a transaction to commit. This feature allows you to commit an in-doubt transaction with the SCN assigned when it was committed at other nodes. Thus you maintain the synchronized commit time of the distributed transaction even if there is a failure. Specify an SCN only when you can determine the SCN of the same transaction already committed at another node.

For example, assume you want to manually commit a transaction with the global transaction ID global_id. First, query the DBA_2PC_PENDING view of a remote database also involved with the transaction in question. Note the SCN used for the commit of the transaction at that node. Specify the SCN (a decimal number) when committing the transaction at the local node. For example, if the SCN were 829381993, you would use the command:

COMMIT FORCE 'global_id', 829381993;


Changing Connection Hold Time

If a distributed transaction fails, the connection from the local site to the remote site may not close immediately. Instead, it remains open in case communication can be restored quickly, without having to re-establish the connection. You can set the length of time that the connection remains open with the database parameter DISTRIBUTED_RECOVERY_CONNECTION_HOLD_TIME.

A high value minimizes the cost of reconnecting after failures, but causes the local database to consume more resources. In contrast, a lower value minimizes the cost of resources kept locked during a failure, but increases the cost of reconnecting after failures. The default value of the parameter is 200 seconds. See the Oracle7 Server Reference for more information.


Setting a Limit on Distributed Transactions

The database parameter DISTRIBUTED_TRANSACTIONS sets a maximum on the number of distributed transactions in which a database can participate. You should increase the value of this parameter if your database is part of many distributed transactions. The default value is operating system-specific.

In contrast, if your site is experiencing an abnormally high number of network failures, you can temporarily decrease the value of this parameter. Doing so limits the number of in-doubt transactions in which your site takes part, and thereby limits the amount of locked data at your site, and the number of in-doubt transactions you might have to resolve.

For more information on this parameter, see the Oracle7 Server Reference.


Testing Distributed Transaction Recovery Features

If you like, you can force the failure of a distributed transaction to observe RECO, automatically resolving the local portion of the transaction. Alternatively, you might be interested in forcing a distributed transaction to fail so that you can practice manually resolving in-doubt distributed transactions and observing the results.

The following sections describes the features available and the steps necessary to perform such operations.

Forcing a Distributed Transaction to Fail

Comments can be included in the COMMENT parameter of the COMMIT statement. To intentionally induce a failure during the prepare/commit phases of a distributed transaction, include the following comment in the COMMENT parameter:

COMMIT COMMENT 'ORA-2PC-CRASH-TEST-n';

where n is one of the following integers:

n Effect
1 Crash commit point site after collect
2 Crash non-commit point site after collect
3 Crash before prepare (non-commit point site)
4 Crash after prepare (non-commit point site)
5 Crash commit point site before commit
6 Crash commit point site after commit
7 Crash non-commit point site before commit
8 Crash non-commit point site after commit
9 Crash commit point site before forget
10 Crash non-commit point site before forget
Table 5 - 1. Failure Values for the Parameter COMMENT

For example, the following statement returns the following messages if the local commit point strength is greater than the remote commit point strength and both nodes are updated:

COMMIT COMMENT 'ORA-2PC-CRASH-TEST-7';
ORA-02054: transaction #.##.## in-doubt
ORA-02059: ORA-CRASH-TEST-n in commit comment

At this point, the in-doubt distributed transaction appears in the DBA_2PC_PENDING view. If enabled, RECO automatically resolves the transaction rather quickly.

Privileges Required to Induce Prepare/Commit Phase Failures

You can induce prepare/commit phase failures via the previous comments only if the local and remote sessions have the FORCE ANY TRANSACTION system privilege. Otherwise, an error is returned if you attempt to issue a COMMIT statement with a crash comment.

The Recoverer (RECO) Background Process

The RECO background process of an Oracle7 instance automatically resolves failures involving distributed transactions. At exponentially growing time intervals, the RECO background process of a node attempts to recover the local portion of an in-doubt distributed transaction.

RECO can use an existing connection or establish a new connection to other nodes involved in the failed transaction. When a connection is established, RECO automatically resolves all in-doubt transactions. Rows corresponding to any resolved in-doubt transactions are automatically removed from each database's pending transaction table.

Disabling and Enabling RECO

The recoverer background process, RECO, can be enabled and disabled using the ALTER SYSTEM command with the ENABLE/DISABLE DISTRIBUTED RECOVERY options, respectively. For example, you might want to temporarily disable RECO to force the failure of a prepare/commit and manually resolve the in-doubt transaction. The following statement disables RECO:

ALTER SYSTEM DISABLE DISTRIBUTED RECOVERY;

Alternatively, the following statement enables RECO so that in-doubt transactions are automatically resolved:

ALTER SYSTEM ENABLE DISTRIBUTED RECOVERY;

Note: Single-process instances (for example, a PC running MS-DOS) have no separate background processes, and therefore no RECO process. Therefore, when a single-process instance that participates in a distributed system is started, distributed recovery must be manually enabled using the statement above.

Additional Information: See your Oracle operating system-specific documentation for more information about distributed transaction recovery for single-process instances.




Go to previous file in sequence Go to next file in sequence
Prev Next
Oracle
Copyright © 1996 Oracle Corporation.
All Rights Reserved.
Go to Product Documentation Library
Library
Go to books for this product
Product
Go to Contents for this book
Contents
Go to Index
Index