Oracle Parallel Server Getting Started
Release 8.0.4 for Windows NT
This chapter provides a conceptual and component overview of Oracle Parallel Server. This information helps you plan and prepare for your Oracle Parallel Server installation.
Specific topics discussed are:
Oracle Parallel Server is an architecture that allows multiple instances to access a shared database. Oracle Parallel Server offers the following (terms will be described later in this chapter):
An Oracle Parallel Server can handle node or disk failure in a clustered environment with minimal or no downtime. The Oracle Parallel Server architecture provides the following features:
Coordination of each node accessing the shared database provides the following:
The following components make up an Oracle Parallel Server:
Oracle8 Enterprise Edition
Provides the applications and files to manage a database. All other Oracle Parallel Server components are layered on top of Oracle8 Enterprise Edition.
Oracle Parallel Server Option
Provides the necessary Oracle Parallel Server scripts and files to create and support an Oracle Parallel Server. This option also installs the Group Membership Service.
Oracle Parallel Server Manager
Additional Information: See Chapter 6, "Installing and Configuring Oracle Parallel Server Manager", and "Managing Instances Using OPSM" in Chapter 7, "Administering Multiple Instances".
Group Membership Service (PGMS)
Additional Information: See the "Group Membership Service" subsection.
Operating System Dependent (OSD) layer
Additional Information: See the "Operating System Dependent Layer" subsection.
The Group Membership Service (PGMS), called OraclePGMSService, runs on each node and monitors which groups (or domains) are up and which instances are members of them. OraclePGMSService interacts with the Cluster Manager, the vendor software that manages access to shared disks and monitors the status of various cluster resources, including nodes, networks, and the PGMS itself. The Cluster Manager provides a node monitor service (as described in "Cluster Manager" in this chapter), which each PGMS uses to determine the status of PGMS instances on other nodes. Each instance attaches to OraclePGMSService during instance startup and detaches from it during shutdown. Figure 1-1 depicts how PGMS coordinates between the Cluster Manager and an instance.
A vendor-supplied Operating System Dependent (OSD) layer that has passed certification must be installed after the Oracle Parallel Server Option is installed. The OSD layer consists of several software components developed by vendors and maps the key operating system and cluster-ware services required for proper operation of Oracle Parallel Server.
The OSD layer consists of:
Cluster Manager (CM)
Discovers the state of the cluster.
Inter-Process Communication (IPC)
Provides reliable transfer of messages between instances on different nodes.
Input/Output (IO)
Provides I/O to access shared disks.
Startup (START)
Provides one-time configuration of startup functionality.
Performance and Management (P&M)
Supports external performance and management tools.
These components provide key services required for proper operation of the Oracle Parallel Server Option and are used by various clients, such as PGMS and Integrated Distributed Lock Manager (IDLM). Each OSD module interacts with the Oracle Parallel Server runtime environment as a single Dynamic Link Library (DLL). These components are more fully described later in this chapter.
Figure 1-2 illustrates the OSD components in a cluster with two nodes:
The Cluster Manager (CM) component discovers and tracks the state of the cluster and its node membership.
It is critical that all Oracle Parallel Server instances receive the same membership information when events occur. Membership change notifications cause relevant Oracle Parallel Server recovery operations to be initiated. If any node is determined to be dead or otherwise not a properly functioning part of the system, CM terminates all processes on that node. Thus, any process or thread running Oracle code can safely assume that its node is an active member of the system.
If there is a failure, recovery is transparent to user applications. CM automatically reconfigures the system to isolate the failed node and notifies PGMS of the change in status. PGMS then notifies the IDLM, which recovers any locks held by the failed node. Oracle Parallel Server can then recover the database to a valid state.
The IDLM relies on the Cluster Manager for timely and correct information. If the IDLM cannot get the information it needs, it will shut down the instance.
Oracle Parallel Server derives most of its functional benefits from its ability to run on multiple interconnected machines. Oracle Parallel Server relies heavily on the underlying Inter-Process Communication (IPC) component to facilitate this.
IPC defines the protocols and interfaces required for the Oracle Parallel Server environment to transfer reliable messages between instances. Messages are the fundamental logical units of communication in this interface. The core IPC functionality is built around an asynchronous, queued messaging model. IPC is designed to send/receive discrete messages as fast as the hardware allows. With an optimized communication layer, various services can be implemented above it. This is how the IDLM carries out all of its communication.
The Input/Output (IO) component provides the shared-disk I/O capabilities that a cluster implementation must support to enable proper operation of the Oracle Parallel Server environment.
The Oracle Parallel Server environment depends heavily on the ability of the underlying OS/cluster implementation to support simultaneous disk sharing across all nodes that run coordinated Oracle Parallel Server instances. Unlike switchover-based technologies, all Oracle Parallel Server instances are active and can operate on any database entity in the shared physical database simultaneously. This capability gives Oracle Parallel Server a large portion of its parallel scalability. It is the role of the IDLM to coordinate simultaneous access to the shared database in a way that maintains consistency and data integrity.
At a high level, the Oracle Parallel Server shared I/O model can be described as a distributed disk cache implemented across all nodes that define the Oracle Parallel Server cluster. The core of Oracle Parallel Server can be viewed as a major client of this cache. Disk blocks from the shared devices are read into a particular node instance's cache only after mediation by the IDLM. Other node instances may read the same blocks into their caches and operate on them simultaneously; updates to those blocks are carefully coordinated. In general, all shared disk-based I/O operations are mediated by the IDLM. The set of distributed IDLMs, one on each node, can be thought of as managing the distributed aspects of the cache.
Disk update operations must be carefully coordinated so that all nodes see the same data in a consistent way. Any Oracle Parallel Server instance intending to update a cached data block must enter into a dialog with the IDLM to ensure that it has the exclusive right to update the block. The instance is then free to update the block until that right is revoked by the IDLM. When the exclusive update right is revoked, the instance holding the updated block must write it to disk so that other nodes can see the changes. In this I/O consistency model, disk blocks can migrate to each instance's block cache, and all updates are flushed to disk whenever an instance other than the current owner requires access to the block. It is this property that directly determines Oracle Parallel Server's reliance on shared disk implementations.
The Startup (START) component initiates the Oracle Parallel Server components in a specific order during instance startup. It is up to the vendor to determine this startup sequence.
The Performance and Management (P&M) component defines the way Oracle Parallel Server clients are configured and the relevant information an OSD implementation must provide to allow the Oracle management tools to operate properly. The P&M component also defines the way OSD modules provide performance analysis information suitable for OPSM.
Oracle Parallel Server instances coordinate with the following components:
Node
A Windows NT server where an instance resides.
Cluster
A set of physically interconnected nodes and a shared disk storage subsystem.
Domain
The set of all instances coordinating together. A domain is limited to a set of Oracle Parallel Server instances that run only on the nodes defined within a cluster. The domain is defined by the DB_NAME parameter in the INIT_COM.ORA file.
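For example, the common parameter file might contain a fragment like the following (a minimal sketch; the DB_NAME value is illustrative):

    # INIT_COM.ORA -- parameters shared by all instances in the domain
    DB_NAME = OPS     # every instance in the domain must use this name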
Oracle Parallel Server allows multiple instances to coordinate data manipulation operations on a common database. An Oracle Parallel Server instance is defined as a process and a set of threads and memory structures required for proper database operation. The pool of threads that make up an instance coordinate their work on the database through shared memory by means of the instance Shared Global Area (SGA). Users can connect to any instance to access the information that resides within the shared database.
The database files are located on disk drives that are shared between the multiple nodes. If one node fails, client applications written to do so can reroute users to another node. One of the surviving nodes automatically performs recovery by rolling back any incomplete transactions that the failed node was attempting. This ensures the logical consistency of the database.
An instance does not include database files. This means you can start up an instance without mounting the database files.
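For example, from Server Manager you can start an instance without mounting the database (a minimal sketch using standard Oracle8 Server Manager commands):

    SVRMGR> CONNECT INTERNAL
    SVRMGR> STARTUP NOMOUNT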
Each instance has a unique:
Different instances on different nodes can have the same system ID (SID). However, Oracle Corporation recommends that each node in the domain have a unique SID to identify its instance. For example, the first node, called the primary node, uses the SID OPS1; the second node uses the SID OPS2 to identify its instance; and so on.
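For example, each node might have an instance-specific parameter file that sets its own thread and instance number and pulls in the common file (a hedged sketch; the file names, path, and values are illustrative):

    # INITOPS1.ORA -- instance-specific parameters for the node using SID OPS1
    THREAD = 1
    INSTANCE_NUMBER = 1
    IFILE = D:\ORANT\DATABASE\INIT_COM.ORA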
All nodes have the same components. The primary node is simply the first node in the cluster.
All instances share:
An instance contains:
For more information on Oracle8 database processes and memory structures, see Oracle8 Concepts.
A database is logically divided into tablespaces that contain all data stored in the database. Tablespaces, in turn, are made up of one or more data files.
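For example, the following statement creates a tablespace consisting of a single data file (the tablespace name, path, and size are illustrative only):

    CREATE TABLESPACE users_data
        DATAFILE 'D:\ORANT\DATABASE\USERS01.DBF' SIZE 50M;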
With Oracle Parallel Server, all participating instances access the same database files.
Figure 1-3 shows the relationship between two Oracle instances and the shared disks on which the database files are stored:
Oracle Parallel Server has the following features:
Integrated Distributed Lock Manager (IDLM)
Maintains a list of system resources and provides locking mechanisms to control allocation and modification of Oracle resources. Every process interested in a database resource protected by the IDLM must open a lock on that resource.
Additional Information: See the "Integrated Distributed Lock Manager" subsection.
Parallel Cache Management (PCM)
Provides instance locks (with minimal use of the IDLM) that cover one or more data blocks of any class: data blocks, index blocks, undo blocks, segment headers, and so on. Oracle Parallel Server uses these instance locks to coordinate access to shared resources. The IDLM maintains the status of the instance locks.
Additional Information: See the "Parallel Cache Management" subsection.
Oracle Parallel Query (OPQ)
Additional Information: See the "Oracle Parallel Query" subsection.
The Integrated Distributed Lock Manager (IDLM) maintains a list of system resources and provides locking mechanisms to control allocation and modification of Oracle resources. Resources are data structures; the IDLM does not control access to tables or to anything in the database itself. Every process interested in a database resource protected by the IDLM must open a lock on that resource.
Oracle Parallel Server uses the IDLM facility to coordinate concurrent access to resources, such as data blocks and rollback segments, across multiple instances. The Integrated Distributed Lock Manager facility has replaced the external Distributed Lock Manager which was used in earlier releases of Oracle Server.
The IDLM uses the LMON and LMDn processes. LMON manages instance and process deaths and the associated recovery for the IDLM. In particular, LMON handles the part of recovery associated with global locks. The LMDn process handles remote lock requests (those that originate from another instance).
The IDLM is a resource manager and, thus, does not control access to the database.
Consider an example. A node in a cluster needs to modify block number n in a database file. At the same time, another node needs to update the same block n to complete a transaction.
Without the IDLM, both nodes could update the same block at the same time. With the IDLM, only one node is allowed to update the block; the other node must wait. The IDLM ensures that only one instance has the right to update a block at any one time, which protects data integrity by ensuring that all changes are saved in a consistent manner.
The IDLM uses PGMS to determine which instances are active. When an instance is started, the LMON and LMDn processes are started and the IDLM registers with PGMS. The IDLM deregisters from PGMS when the database is shut down.
Parallel Cache Management (PCM) provides instance locks (with minimal use of the IDLM) that cover one or more data blocks of any class: data blocks, index blocks, undo blocks, segment headers, and so on. Oracle Parallel Server uses these instance locks to coordinate access to shared resources. The IDLM maintains the status of the instance locks.
PCM locks ensure cache coherency by forcing instances to acquire a lock before modifying or reading any database block. PCM locks allow only one instance at a time to modify a block. If a block is modified by an instance, the block must first be written to disk before another instance can acquire the PCM lock, read the block, and modify it.
If node 1 needs access to data that is currently in node 2's buffer cache, node 1 can submit a request to the IDLM. Node 2 then writes the needed blocks to disk, and only then does the IDLM notify node 1 to read the updated, consistent data from disk.
You use the initialization parameter GC_FILES_TO_LOCKS to specify the number of PCM locks that cover the data blocks in a data file or set of data files. The smallest granularity is one PCM lock per data block; this is the default. PCM locks usually account for the greatest proportion of instance locks in a parallel server.
PCM locks are implemented in four ways: as fixed hashed locks, releasable hashed locks, releasable fine-grain locks, and fixed fine-grain locks.
The first instance which starts up creates an IDLM resource and an IDLM lock (in null mode) on the IDLM resource for each hashed PCM lock. The first instance initializes each lock. The instance then proceeds to convert IDLM locks to other modes as required. When a second instance requires a particular IDLM lock, it waits until the lock is available and then converts the lock to the mode required. The total number of locks that can be allocated is limited by system resources. This usually means that multiple blocks have to be covered by the same lock. In other words, there is a low lock granularity. This might result in false pinging. The startup of the instance also requires more time, since all the lock resources have to be allocated at startup time.
Typically, hashed locks are never released; each stays in the mode in which it was last requested until another instance requires the lock. You can, however, specify releasable hashed locks by using the R option with the GC_FILES_TO_LOCKS parameter. Releasable hashed PCM locks are taken from the pool of GC_RELEASABLE_LOCKS.
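For example, the following parameter file fragment is a sketch of the syntax (the file numbers, lock counts, and blocking factor are illustrative): file 1 is covered by 500 fixed hashed locks; files 2 and 3 share 1000 locks, with each lock covering 8 contiguous blocks; and file 4 is covered by 500 releasable locks, using the R option:

    GC_FILES_TO_LOCKS = "1=500:2-3=1000!8:4=500R"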
With fine-grain locking, locks are dynamically allocated at block-access time. The resources for a lock are allocated only while the lock is needed and are released when the lock is released. This makes it possible to achieve very high lock granularity. If resource minimization is the goal, fine-grain locks can also cover multiple blocks, but they are still allocated dynamically.
Since locks are allocated only as required, the instance can start up much faster than with hashed locks. An IDLM resource is created and an IDLM lock is obtained only when a user actually requests a block. Once a fine-grain lock has been created, it can be converted to various modes as required by various instances.
Typically, fine-grain locks are releasable: an instance can give up all references to the resource name during normal operation. You can, however, allocate fixed locks in a fine-grained manner with the GC_FILES_TO_LOCKS parameter. Creating a one-to-one ratio of locks to blocks creates DBA (data block address) locking.
It is possible to have both fine-grain locking and hashed locking enabled at the same time.
Below is a comparison of the two types of PCM locks.
Hash PCM locks are allocated statically when the first instance starts up; they typically remain in the mode in which they were last requested, usually cover multiple blocks per lock (lower granularity), and lengthen instance startup because all lock resources are allocated at startup time.
Fine-grain PCM locks are allocated dynamically at block-access time; they are typically releasable, can achieve granularity as fine as one lock per block, and allow faster instance startup because lock resources are allocated only when a block is actually requested.
Use the following guidelines to choose a PCM lock type. Hashed locks are appropriate when data is mostly read-only, or is partitioned so that each set of blocks is accessed by a well-defined set of instances, because pre-allocated locks incur no allocation overhead at access time. Fine-grain locks are appropriate when blocks are read and updated randomly by multiple instances, when false pinging on hashed locks is a concern, or when minimizing statically allocated lock resources is a goal.
With Oracle Parallel Query (OPQ), Oracle can divide the work of processing certain types of SQL statements among multiple query server processes.
When parallel execution is not being used, a single server thread performs all necessary processing for the sequential execution of a SQL statement. For example, to perform a full table scan (such as SELECT * FROM EMP), one thread performs the entire operation.
OPQ performs these operations in parallel using multiple parallel server processes. One process, known as the parallel coordinator, dispatches the execution of a statement to several parallel server processes, coordinates the results from all of the server processes, and sends the results back to the user.
The parallel coordinator breaks down execution functions into parallel pieces and then integrates the partial results produced by the parallel server processes. The number of parallel server processes assigned to a single operation is the degree of parallelism for that operation. Multiple operations within the same SQL statement all have the same degree of parallelism.
Oracle Parallel Server provides the framework for the Parallel Query Option to work between nodes. OPQ behaves the same way in Oracle with or without the Parallel Server Option. The only difference is that Oracle Parallel Server enables OPQ to ship queries between nodes so that multiple nodes can execute work on behalf of a single query. Here, the server breaks the query up into smaller operations that run against a common database residing on shared disks. Because this parallelism is performed by the server, it can occur at a low level of server operation, rather than at an external SQL level.
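For example, the degree of parallelism, and the number of instances across which the work may be split, can be declared on a table (a hedged sketch; the table name and values are illustrative):

    ALTER TABLE emp PARALLEL (DEGREE 4 INSTANCES 2);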
In some applications, an individual query often consumes a great deal of CPU resource and disk I/O (unlike most online insert or update transactions). To take advantage of multi-processing systems, the data server must parallelize individual queries into units of work which can be processed simultaneously.
If the query were not processed in parallel, disks would be read serially, one I/O at a time, and a single CPU would have to scan all rows in the table. With the query parallelized, disks are read in parallel, with multiple I/Os in progress simultaneously.
Several CPUs can each scan a part of the table in parallel, and aggregate the results. Parallel query benefits not only from multiple CPUs but also from greater I/O bandwidth availability.
OPQ can run with or without the Oracle Parallel Server. Without the Oracle Parallel Server option, OPQ cannot perform multi-node parallelism. Oracle Parallel Server optimizes Oracle8 Enterprise Edition running on clustered hardware, using a parallel cache architecture to avoid shared memory bottlenecks in OLTP and decision support applications.
OPQ within Oracle Parallel Server performs parallelism within a node and among nodes via the parallel query slave processes on each node.
A sample SQL statement is shown below:
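    -- Illustrative example only: the EMP table and the hint values
    -- (degree 4 across 2 instances) are assumptions, not from the original.
    SELECT /*+ FULL(emp) PARALLEL(emp, 4, 2) */ COUNT(*)
    FROM emp;

The PARALLEL hint above requests that the full scan of EMP be performed with a degree of parallelism of 4, split across 2 instances.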
After you have run a query, you can query V$PQ_SYSSTAT to view the number of slave processes used and other parallel query statistics for the system.
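For example, the following query (a minimal sketch; the LIKE filter simply narrows the output to the server-process counters) reports how many parallel query slaves are busy, idle, or have been started:

    SELECT statistic, value
    FROM v$pq_sysstat
    WHERE statistic LIKE 'Servers%';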