|Oracle8i Parallel Server Concepts
Release 2 (8.1.6)
Part Number A76968-01
This chapter describes the architectural components that Oracle provides for Oracle Parallel Server processing. These components are in addition to those used for single-instance Oracle and are thus unique to Oracle Parallel Server. Some of these components are supplied with the Oracle software; others are vendor-specific.
Topics in this chapter include:
Based on the architectural models described in Chapter 2, the following sections explain the software required to implement Oracle Parallel Server.
Each hardware vendor implements parallel processing using operating system dependent layers. These layers serve as communication links between the operating system and the Oracle Parallel Server software described in this chapter.
A high-level view of these components appears in Figure 3-1.
The Cluster Manager software oversees internode messaging that travels over the interconnect to coordinate internode operations. The Distributed Lock Manager oversees the operation of Parallel Cache Management functions. The following sections describe these components in more detail:
The Cluster Manager provides a global view of the cluster and all nodes in it. It also controls cluster membership. Typically, the Cluster Manager is a vendor-supplied component. However, Oracle supplies the Cluster Manager for Windows NT environments.
Oracle Parallel Server also cooperates with the Cluster Manager to achieve high availability. The Cluster Manager automatically starts and stops when the instance starts and stops.
A Cluster Manager disconnect can occur for any of three reasons: the client disconnects voluntarily, the client's process terminates, or the client's node shuts down or fails. The cluster as a whole remains operational even if one or more nodes fail. If the Cluster Manager determines that a node is inactive or not functioning properly, the Cluster Manager terminates all processes on that node or instance.
If there is a failure, recovery is transparent to user applications. The Cluster Manager automatically reconfigures the system to isolate the failed node and then notifies the Distributed Lock Manager of the status. Oracle Parallel Server then recovers the database to a valid state.
The Cluster Manager has a subset of functionality known as the "Node Monitor". The Node Monitor polls the status of various resources in a cluster, including nodes, interconnect hardware and software, shared disks, and Oracle instances. The means by which the Cluster Manager and its Node Monitor perform these operations is based on the implementation of the operating system dependent layer.
The Cluster Manager informs clients and the Oracle server when the status of resources within a cluster changes. For example, the Oracle server must know when another database instance registers with the Cluster Manager or when an instance disconnects from it.
As mentioned, the Cluster Manager monitors the status of various cluster resources, including nodes, networks, and instances. The Node Monitor also serves the Cluster Manager by:
The Distributed Lock Manager is an integrated component of Parallel Server that coordinates simultaneous access to the shared database and to shared resources within the database. It does this to maintain consistency and data integrity. This section describes the following features of the Distributed Lock Manager:
The coordination of access to resources performed by the Distributed Lock Manager is transparent to applications. Applications continue to use the same locking mechanisms as in a single-instance environment.
The Distributed Lock Manager maintains a lock database to record information about resources and locks held on these resources. This lock database resides in memory and is distributed throughout the cluster to all nodes. In this distributed architecture, each node participates in global lock management and manages a portion of the global lock database. This distributed lock management scheme provides fault tolerance and enhanced runtime performance.
The Distributed Lock Manager is fault tolerant in that it provides continual service and maintains the integrity of the lock database even if multiple nodes fail. The shared database is accessible as long as at least one instance is active on that database after recovery completes.
Fault tolerance also enables instances within an Oracle Parallel Server to be started and stopped at any time, in any order. However, instance reconfiguration may cause a brief delay.
The Distributed Lock Manager maintains information about locks on all nodes that need access to a particular resource. The Distributed Lock Manager usually nominates one node to manage all information about a resource and its locks.
Oracle Parallel Server uses a static hashing lock mastering scheme. This mastering process hashes the resource name to one of the Parallel Server instances that acts as the master for the resource. This results in an even, arbitrary distribution of resources among all available nodes. Every resource is associated with a master node.
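The effect of this static hashing scheme can be sketched in a few lines of Python. This is an illustrative model only: the node names, the choice of CRC32 as the hash function, and the resource-name format are assumptions, not Oracle internals.

```python
# Sketch of static-hash lock mastering across a hypothetical
# four-instance cluster. Names and hash choice are illustrative.
import zlib

NODES = ["node1", "node2", "node3", "node4"]

def master_node(resource_name: str) -> str:
    """Hash a resource name to the instance that masters it."""
    # CRC32 is stable across runs, unlike Python's built-in hash().
    bucket = zlib.crc32(resource_name.encode()) % len(NODES)
    return NODES[bucket]

# Every resource maps to exactly one master node, and the mapping
# spreads resources evenly but arbitrarily across available nodes.
for block in ("file 4 block 127", "file 9 block 2048"):
    print(block, "->", master_node(block))
```

Because the mapping depends only on the resource name and the node count, every instance can compute a resource's master independently, without any coordination message.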
The Distributed Lock Manager optimizes the method of lock mastering used in each situation. The method of lock mastering affects system performance during normal runtime activity as well as during instance startup. Performance is optimized when a resource is mastered locally.
The Distributed Lock Manager performs deadlock detection on all deadlock-sensitive locks and resources. It does not control access to tables or objects in the database itself. Oracle Parallel Server uses the Distributed Lock Manager to coordinate concurrent access across multiple instances to resources such as data blocks and rollback segments.
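The text does not specify the detection algorithm the Distributed Lock Manager uses. The classic technique is to search a wait-for graph for cycles; the sketch below shows that idea in Python, with hypothetical transaction names, and should not be read as Oracle's implementation.

```python
# Wait-for-graph cycle detection, the classic basis for deadlock
# detection. Transaction names and graph shape are hypothetical.

def has_deadlock(waits_for: dict) -> bool:
    """Return True if the wait-for graph contains a cycle."""
    visiting, done = set(), set()

    def visit(txn):
        if txn in done:
            return False
        if txn in visiting:
            return True          # back edge found: a cycle, hence deadlock
        visiting.add(txn)
        for blocker in waits_for.get(txn, ()):
            if visit(blocker):
                return True
        visiting.discard(txn)
        done.add(txn)
        return False

    return any(visit(t) for t in waits_for)

# T1 waits on T2 while T2 waits on T1: a deadlock.
assert has_deadlock({"T1": {"T2"}, "T2": {"T1"}})
assert not has_deadlock({"T1": {"T2"}, "T2": set()})
```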
The Distributed Lock Manager provides persistent resources. Resources maintain their state even if all processes or groups holding locks on them have terminated abnormally.
Assume that a node in a cluster needs to modify block number n in the database. At the same time, another node needs to update the same block to complete a transaction.
Without the Distributed Lock Manager, both nodes would simultaneously update the same block. With the Distributed Lock Manager, only one node can update the block; the other node must wait. The Distributed Lock Manager ensures that only one instance has the right to update a block at any one time. This provides data integrity by ensuring that all changes made are saved in a consistent manner.
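The exclusive-access behavior described above can be modeled in a few lines. This is a deliberately minimal sketch: the class name and queue-based grant order are assumptions, and the real Distributed Lock Manager grants a range of lock modes rather than a simple one-holder lock.

```python
# Toy model of exclusive access to a data block: one holder at a
# time, later requesters queue. Illustrative only, not Oracle's DLM.
from collections import deque

class BlockLock:
    def __init__(self):
        self.holder = None
        self.waiters = deque()

    def acquire(self, instance):
        if self.holder is None:
            self.holder = instance
            return True           # granted immediately
        self.waiters.append(instance)
        return False              # must wait for the holder to finish

    def release(self):
        # Pass the lock to the next waiting instance, if any.
        self.holder = self.waiters.popleft() if self.waiters else None
        return self.holder

lock = BlockLock()
assert lock.acquire("node1")      # node1 gets the right to update block n
assert not lock.acquire("node2")  # node2 must wait
assert lock.release() == "node2"  # node2 is granted after node1 finishes
```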
The Distributed Lock Manager operates independently of the Cluster Manager, but relies on it for timely and correct information about the status of other nodes. If the Distributed Lock Manager cannot get the information it needs from a particular instance in the cluster, it shuts down that instance. This ensures the integrity of Oracle Parallel Server databases, because each instance must be aware of all other instances to coordinate disk access.
Oracle Parallel Server derives most of its functional benefits from its ability to run on multiple interconnected machines. Oracle Parallel Server relies heavily on the underlying Inter-Process Communication (IPC) component to facilitate this.
The IPC defines the protocols and interfaces required for the Oracle Parallel Server environment to transfer messages between instances. Messages are the fundamental units of communication in this interface. The core IPC functionality is built around an asynchronous, queued messaging model. IPC is designed to send and receive discrete messages as fast as the hardware allows. With an optimized communication layer, various services can be implemented above it. This is how the Distributed Lock Manager performs its communication duties.
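A toy version of this asynchronous, queued messaging model is sketched below. The queue, the receiver thread, and the message shape are illustrative assumptions; they model only the property the text describes, namely that senders enqueue discrete messages and continue without blocking on delivery.

```python
# Sketch of an asynchronous, queued messaging model like the one the
# IPC layer is built around. Message contents are hypothetical.
import queue
import threading

inbox = queue.Queue()            # a per-instance receive queue
received = []

def receiver():
    """Drain the queue; senders never wait for this thread."""
    while True:
        msg = inbox.get()        # blocks until a message arrives
        if msg is None:          # sentinel value: shut down
            break
        received.append(msg)

t = threading.Thread(target=receiver)
t.start()

# The sender enqueues discrete messages and continues immediately.
inbox.put({"type": "lock-request", "resource": "block 127"})
inbox.put({"type": "lock-grant", "resource": "block 127"})
inbox.put(None)
t.join()
```

Decoupling send from receive in this way is what lets higher-level services, such as the Distributed Lock Manager's messaging, be layered on top of a fast communication primitive.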
In addition to the operating system dependent layers, Oracle Parallel Server also requires that all nodes have simultaneous access to the disks. This gives multiple instances concurrent access to the same database.