Oracle9i Real Application Clusters Concepts
Release 2 (9.2)
Part Number A96597-01
This chapter describes the concepts and some of the best practices for implementing high availability in Real Application Clusters. The topics in this chapter are:
Computing environments configured to provide nearly full-time availability are known as high availability systems. Such systems typically have redundant hardware and software that makes the system available despite failures. Well-designed high availability systems do not have single points-of-failure. Any hardware or software component that can fail has a redundant component of the same type.
When failures occur, the failover process moves processing performed by the failed component to the backup component. This process remasters system-wide resources, recovers partial or failed transactions, and restores the system to normal as quickly as possible. The more transparent failover is to users, the higher the availability of the system.
Oracle offers several products and features that provide high availability. These include Real Application Clusters, Oracle Real Application Clusters Guard I, Oracle Real Application Clusters Guard II, Oracle Replication, and Oracle9i Data Guard. You can use these products in various combinations to meet your specific high availability needs. Real Application Clusters systems are inherently high availability environments that can provide continuous service for both planned and unplanned outages. Real Application Clusters Guard II provides continuous service despite unplanned failures and for online maintenance operations.
See Also: Oracle9i Real Application Clusters Guard I - Concepts and Administration and Oracle9i Real Application Clusters Guard II Concepts, Installation, and Administration for more information about these features
Real Application Clusters builds higher levels of availability on top of the standard Oracle features. All single-instance high availability features, such as Fast-Start Recovery and online reorganizations, also apply to Real Application Clusters. Fast-Start Recovery can greatly reduce the mean time to recovery (MTTR) with minimal effects on online application performance. Online reorganizations reduce planned downtime because you can perform many operations online while users update the underlying objects.
In addition to these features, Real Application Clusters exploits the redundancy provided by clustering to deliver availability with n-1 node failures in an n-node cluster. In other words, all users have access to all data as long as there is one available node in the cluster. To configure Real Application Clusters for high availability, consider the hardware and software components of your cluster as described in the following section.
This section describes high availability and cluster components in the following sections:
See Also: Chapter 2 for more information about these components
Real Application Clusters environments are fully redundant because all nodes access the entire database. The failure of one node does not affect another node's ability to process transactions. As long as the cluster has one surviving node, all database clients can process all transactions, although the clients may be subject to increased response times due to capacity constraints on the surviving node.
Interconnect redundancy is often overlooked in clusters. This is because the mean time to failure (MTTF) of a cluster interconnect is generally several years, so interconnect redundancy might not seem to be a high priority. Also, depending on the system and its sophistication, a redundant cluster interconnect could be cost prohibitive.
However, a redundant cluster interconnect is an important aspect of a fully redundant cluster. Without this, a system is not truly free of single points-of-failure. Cluster interconnects can fail for a variety of reasons and you cannot prevent all of them.
In Real Application Clusters, Oracle executables are installed either on a cluster file system or on the local disks of each node, and at least one instance runs on each node of the cluster. If your platform supports a cluster file system (CFS) and you use it, then only one copy of the Oracle Real Application Clusters software is installed. All instances have equal access to all data and can process any transaction. In this way, Real Application Clusters ensures full database software redundancy.
Real Application Clusters is primarily a single-site, high availability solution. This means the nodes in the cluster generally exist within the same building, if not the same room, so a single site-wide disaster could affect every node. Depending on how mission critical your system is, and on how exposed your system's location is to such disasters, disaster planning can be an important high availability component.
Oracle offers other solutions such as Oracle9i Data Guard to facilitate more comprehensive disaster recovery planning. You can use these solutions with Real Application Clusters where one cluster hosts the primary database and another remote system or cluster hosts the disaster recovery database. However, Real Application Clusters is not required on either site for disaster recovery.
See Also:
Oracle9i Data Guard Concepts and Administration for more information about Data Guard
Once you have carefully considered your system level issues, validate that your Real Application Clusters environment optimally protects against failures. Use the following list of failure points to plan and troubleshoot your failure protection system:
Real Application Clusters environments protect against cluster component failures and software failures. However, media failures and human error could still cause system downtime. Real Application Clusters, as with single-instance Oracle databases, operates on one set of files. For this reason, you should adopt best practices to avoid the adverse effects of media failures.
RAID-based redundancy practices avoid file loss but might not prevent rare cases of file corruption. Also, if you mistakenly drop a database object in a Real Application Clusters environment, then you can recover that object the same way you would in a single instance database. These are the primary limitations in an otherwise very robust and highly available Real Application Clusters system.
Once you deploy your system, the key issue is the transparency of failover and its duration as described in the following section.
This section describes the principles of failover and the features Real Application Clusters offers to implement failover in high availability systems. Topics in this section include:
Failover requires that highly available systems have accurate instance monitoring or heartbeat mechanisms. In addition to having this functionality for normal operations, the system must be able to quickly and accurately synchronize resources during failover.
The process of synchronizing, or remastering, requires the graceful shutdown of the failing system as well as an accurate assumption of control of the resources that were mastered on that system. In Real Application Clusters, your system records resource information on remote nodes as well as locally. This makes the information needed for failover and recovery available to the recovering instances.
The duration of failover includes the time a system requires to remaster system-wide resources and the time to recover from failures. The duration of the failover process can be a relatively short interval on certified platforms.
It is important to hide system failures from database client connections. Such connections can include application users in client/server environments or middle-tier database clients in multitiered application environments. Properly configured failover mechanisms transparently reroute client sessions to an available node in the cluster. This capability in the Oracle database is referred to as Transparent Application Failover.
Transparent Application Failover (TAF) enables an application user to automatically reconnect to a database if the connection fails. Active transactions roll back, but the new database connection, which is achieved using a different node, is identical to the original. This is true regardless of how the connection fails.
There are several elements associated with active database connections. These include:
Transparent Application Failover automatically restores some of these elements. For example, during normal client/server database operations, a client maintains a connection to the database so the client and server can communicate. Without Transparent Application Failover, if the server fails, then so does the connection: the next time the client tries to use the connection, the client receives an error, and the user must log in to the database again.
With Transparent Application Failover, however, Oracle automatically obtains a new connection to the database. This enables users to continue working as if the original connection had never failed. Therefore, with Transparent Application Failover, a client notices no connection loss as long as one instance remains active to serve the application.
See Also: Oracle9i Net Services Administrator's Guide for background and configuration information about Transparent Application Failover
While the ability to fail over client sessions is an important benefit of Transparent Application Failover, there are other useful scenarios where Transparent Application Failover improves system availability. These topics are discussed in the following subsections:
It is sometimes necessary to take nodes out of service for maintenance or repair; for example, you might want to apply patch releases without interrupting service to application clients. Transactional shutdowns facilitate shutting down selected nodes rather than an entire database. Two transactional shutdown options are available:
Use the TRANSACTIONAL clause of the SHUTDOWN statement to remove a node from service so that the shutdown event is deferred until all existing transactions are completed. In this way, client sessions can be migrated to another node of the cluster at transaction boundaries.
Use the TRANSACTIONAL LOCAL clause of the SHUTDOWN statement to perform a transactional shutdown on a specified local instance. You can use this statement to prevent new transactions from starting locally, and to perform an immediate shutdown after all local transactions have completed. With this option, you can gracefully move all sessions from one instance to another by shutting down selected instances transactionally.
After performing a transactional shutdown, Oracle routes newly submitted transactions to an alternate node. An immediate shutdown is performed on the node when all existing transactions complete.
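The transactional shutdown behavior described above can be sketched as a small simulation. This is a minimal Python model, not Oracle code: the Instance and Cluster classes and the shutdown_transactional_local method are hypothetical names chosen for illustration, and rerouting simply picks the first instance that still accepts work.

```python
class Instance:
    """Hypothetical in-memory model of one instance; it does not talk to Oracle."""

    def __init__(self, name):
        self.name = name
        self.active = set()    # IDs of transactions in progress on this instance
        self.draining = False  # set once a transactional shutdown is requested
        self.up = True

    def accepts_new_work(self):
        return self.up and not self.draining

    def commit(self, txn_id):
        self.active.discard(txn_id)
        # When draining and the last local transaction ends, shut down at once.
        if self.draining and not self.active:
            self.up = False


class Cluster:
    def __init__(self, names):
        self.instances = {name: Instance(name) for name in names}

    def begin(self, txn_id, preferred):
        """Start a transaction, rerouting it if the preferred instance is draining."""
        inst = self.instances[preferred]
        if not inst.accepts_new_work():
            # Raises StopIteration if no instance can accept work.
            inst = next(i for i in self.instances.values() if i.accepts_new_work())
        inst.active.add(txn_id)
        return inst.name

    def shutdown_transactional_local(self, name):
        """Model of SHUTDOWN TRANSACTIONAL LOCAL: refuse new work, wait for commits."""
        inst = self.instances[name]
        inst.draining = True
        if not inst.active:
            inst.up = False
```

For example, after shutdown_transactional_local("node1"), a call to begin("t2", "node1") is routed to node2, while node1 stays up until its last open transaction commits.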
You may need to perform administrative tasks that require isolation from concurrent user transactions or queries. To do this, you can use the quiesce database feature. It saves you, for example, from having to shut down the database and reopen it in restricted mode to perform such tasks.
To quiesce the database, use the ALTER SYSTEM statement with the QUIESCE RESTRICTED clause.
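You cannot open the database on one instance if the database is being quiesced on another node. In other words, if you issued the ALTER SYSTEM QUIESCE RESTRICTED statement on one instance and it has not finished processing, then you cannot open the database on another instance.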
See Also: Oracle9i Real Application Clusters Administration and the Oracle9i Database Administrator's Guide for more detailed information about the quiesce database feature, and the Oracle9i SQL Reference for more information about the ALTER SYSTEM statement
A database is available when it processes transactions in a timely manner. When the load exceeds a node's capacity, client transaction response times are adversely affected and the database availability is compromised. It then becomes important to manually migrate client sessions to a less heavily loaded node to maintain response times and application availability.
In Real Application Clusters, the Transparent Network Substrate (TNS) listener files provide automated load balancing across nodes in both shared server and dedicated server configurations. Because the parameters that control cross-instance registration are also dynamic, the Real Application Clusters load balancing feature automatically adjusts for cluster configuration changes. For example, if you add a node to your cluster database, then Oracle updates all the listener files in the cluster with the new node's listener information.
Failover processing for query clients is different from the failover processing for Data Manipulation Language (DML) clients. The important issue during failover operations in either case is that the failure is masked from existing client connections as much as possible. The following subsections describe both types of failover processing.
At failover, in-progress queries are reissued and processed from the beginning. If the original query had already been running for a long time, the reissued query can take correspondingly longer to return results. With Transparent Application Failover (TAF), the failure is masked for query clients, and increased response time is the only effect on the client. If the client query can be satisfied with data in the buffer cache of the surviving node to which the client reconnected, then the increase in response time is minimal. Using TAF's PRECONNECT method eliminates the need to reconnect to a surviving instance and thus further minimizes response time. However, PRECONNECT allocates resources on the backup instance while awaiting the failover event.
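The query failover behavior described above, including the trade-off between reconnecting at failover time and using PRECONNECT, can be sketched as follows. This is a hypothetical Python model rather than TAF itself: the FakeInstance and FailoverClient classes are invented for illustration, the connection counter stands in for the resources a connection consumes, and the query text is ignored.

```python
class ConnectionLost(Exception):
    """Hypothetical stand-in for the network error raised by a failed instance."""


class FakeInstance:
    def __init__(self, name, rows):
        self.name = name
        self.rows = rows
        self.alive = True

    def run(self, sql):
        if not self.alive:
            raise ConnectionLost(self.name)
        return list(self.rows)  # the query always runs from the beginning


class FailoverClient:
    def __init__(self, primary, backup, preconnect=False):
        self.current = primary
        self.backup = backup
        self.connects = 1          # the initial connection to the primary
        self.backup_ready = False
        if preconnect:
            # PRECONNECT-style: open the backup connection up front; it holds
            # resources on the backup until a failover actually happens.
            self.connects += 1
            self.backup_ready = True

    def query(self, sql):
        try:
            return self.current.run(sql)
        except ConnectionLost:
            if not self.backup_ready:
                self.connects += 1  # BASIC-style: connect only after the failure
            self.current = self.backup
            self.backup_ready = False
            # Reissue the in-progress query from the beginning on the survivor.
            return self.current.run(sql)
```

With preconnect=True the second connection is paid for up front, so nothing new is opened at failover; without it, the client connects to the backup only after the failure is detected.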
After failover, server-side recovery must complete before access to the datafiles is allowed. The client transaction experiences a system pause until server-side recovery completes, if server-side recovery has not already completed.
You can also use a callback function through an OCI call to notify clients of the failover so that the clients do not misinterpret the delay as a failure. This prevents the clients from manually attempting to reestablish connections.
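A failover notification callback of the kind described above might look like the following sketch. The FailoverEvent values are loosely modeled on the begin, end, and abort notifications an OCI failover callback can receive, but fail_over, notify_user, and the reconnect argument are hypothetical names, not the OCI API.

```python
from enum import Enum


class FailoverEvent(Enum):
    BEGIN = "begin"  # connection lost; failover is starting
    END = "end"      # failover finished; the session can be used again
    ABORT = "abort"  # failover could not be completed


def fail_over(reconnect, callback):
    """Run reconnect(), notifying callback before and after.

    reconnect is a hypothetical function returning True on success.
    """
    callback(FailoverEvent.BEGIN)
    ok = reconnect()
    callback(FailoverEvent.END if ok else FailoverEvent.ABORT)
    return ok


status_log = []


def notify_user(event):
    # Record the event so the application can show "reconnecting..."
    # instead of treating the pause as a failure and reconnecting itself.
    status_log.append(event.value)
```

The application only observes the events; the reconnection itself stays inside fail_over, which mirrors the division of labor the text describes.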
Data Manipulation Language (DML) database clients perform INSERT, UPDATE, and DELETE operations. To fail over these operations, the application must handle certain errors and reconnect when those errors occur.
Without this application code, INSERT, UPDATE, and DELETE operations on the failed instance return an unhandled Oracle error code. When the application resubmits the operations, Oracle routes the client connections to a surviving instance. The client transaction then stops only momentarily until server-side recovery completes.
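The resubmission logic that DML clients need can be sketched as follows. This is a hypothetical Python model: InstanceFailed stands in for the Oracle error a client receives from a failed instance, and Node.execute_all applies a whole transaction at once, which assumes that recovery has rolled back any partial work on the failed instance.

```python
class InstanceFailed(Exception):
    """Hypothetical stand-in for the error returned by a failed instance."""


class Node:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.committed = []  # statements applied by committed transactions

    def execute_all(self, statements):
        if not self.up:
            raise InstanceFailed(self.name)
        self.committed.extend(statements)  # the whole transaction commits together
        return self.name


def submit_transaction(nodes, statements):
    """Submit a transaction, resubmitting all of it to the next node on failure.

    Recovery rolls back a transaction that was active on a failed instance,
    so the application replays every statement, not only the last one.
    """
    for node in nodes:
        try:
            return node.execute_all(statements)
        except InstanceFailed:
            continue  # try the next surviving instance
    raise RuntimeError("no surviving instance")
```

Replaying the complete statement list, rather than resuming mid-transaction, is the key design choice: the failed instance's uncommitted changes no longer exist after recovery.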
Queries that cross the network after shutdown processing completes will fail over. However, Oracle returns an error for queries that are in progress during a shutdown. Therefore, TAF only operates when the operating system returns a network error and the instance is completely down.
Applications that use TAF for transactional shutdown must be written to process the error ORA-01033 "ORACLE initialization or shutdown in progress". Once shutdown processing begins, the instance returns ORA-01033 to such applications, which need to retry the failed operation periodically, even when Oracle reports multiple ORA-01033 errors. When shutdown processing completes, TAF recognizes the failure of the network connection to the instance and restores the connection to an available instance.
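A periodic retry loop for ORA-01033, as described above, might look like this sketch. OracleError is a hypothetical exception type carrying the error code; the one-second interval and the limit of 30 attempts are assumed values, and the sleep parameter exists only so the loop can be tested without waiting.

```python
import time


class OracleError(Exception):
    """Hypothetical exception carrying an Oracle error code such as "ORA-01033"."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def retry_during_shutdown(operation, interval=1.0, max_attempts=30, sleep=time.sleep):
    """Call operation(), retrying at a fixed interval while ORA-01033 is returned.

    Other errors, or ORA-01033 on the last attempt, are re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except OracleError as err:
            if err.code != "ORA-01033" or attempt == max_attempts:
                raise
            sleep(interval)
```

Only ORA-01033 is treated as retryable; any other error surfaces immediately so that genuine application errors are not masked by the retry loop.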
Connection load balancing improves connection performance by balancing the number of active connections among multiple dispatchers. In single-instance Oracle environments, the listener selects the least loaded dispatcher to manage incoming client requests. In Real Application Clusters environments, connection load balancing also has the capability of balancing the number of active connections among multiple instances.
Due to dynamic service registration, a listener is always aware of all of the instances and dispatchers regardless of their locations. Depending on the load information, a listener determines to which instance and to which dispatcher to send incoming client requests if you are using the shared server configuration.
In shared server configurations, listeners select dispatchers using the following criteria in the order shown:
In dedicated server configurations, listeners select instances in the following order:
If a database service has multiple instances on multiple nodes, then the listener chooses the least loaded instance on the least loaded node. If you have configured the shared server, then the least loaded dispatcher of the selected instance is chosen.
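The selection order described above (least loaded node, then least loaded instance on that node, then, for shared server, the least loaded dispatcher of that instance) can be sketched as a sequence of minimum searches. The numeric load values and the dataclasses are hypothetical; a real listener works from the load information that instances register with it.

```python
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    name: str
    connections: int


@dataclass
class InstanceLoad:
    name: str
    load: float
    dispatchers: list = field(default_factory=list)


@dataclass
class NodeLoad:
    name: str
    load: float
    instances: list = field(default_factory=list)


def choose_target(nodes, shared_server):
    """Return (node, instance, dispatcher or None) for a new connection."""
    node = min(nodes, key=lambda n: n.load)            # least loaded node
    inst = min(node.instances, key=lambda i: i.load)   # least loaded instance on it
    if not shared_server:
        return (node.name, inst.name, None)            # dedicated server: done
    disp = min(inst.dispatchers, key=lambda d: d.connections)
    return (node.name, inst.name, disp.name)
```

Ties are broken by list order, because min returns the first minimal element.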
See Also: Oracle9i Net Services Administrator's Guide for more information about load balancing
When a connection fails, you might experience the following:
ALTER SESSION statements are lost
If TYPE=SELECT is specified in the FAILOVER_MODE section of the service name description, Oracle also attempts to fail over the query
If the first command after failover is not a SQL SELECT or OCIStmtFetch statement, then an error message results. Failover only takes effect if the application is programmed with OCI release 8.0 or greater.
Server-side failover processing in Real Application Clusters is different from host-based failover solutions that are available on many server platforms. The following subsections describe both types of failover processing.
Real Application Clusters provides rapid server-side failover. This is accomplished by the concurrent, active-active architecture in Real Application Clusters. In other words, multiple Oracle instances are concurrently active on multiple nodes and these instances synchronize access to the same database.
All nodes also have concurrent ownership and access to all disks. When one node fails, all other nodes in the cluster maintain access to all the disks; there is no disk ownership to transfer, and database application binaries are already loaded into memory.
Depending on the size of the database, the duration of failover can vary. The larger the database, or the greater the size of its datafiles, the greater the failover benefit of using Real Application Clusters.
Many operating system vendors and other cluster software vendors offer high availability application failover products. These failover solutions monitor application services on a given primary cluster node. They then fail over services to a secondary cluster node as needed. Host-based failover solutions generally have one active instance performing useful work for a given database application. The secondary node monitors the application service on the primary node and initiates failover when the primary node service is unavailable.
Failover in host-based systems usually includes the following steps.
The following subsections describe server failover recovery processing in Real Application Clusters:
Real Application Clusters relies on the Cluster Manager software for failure detection because the Cluster Manager maintains the heartbeat functions. The time it takes for the Cluster Manager to detect that a node is no longer in operation is a function of a configurable heartbeat timeout parameter.
The use of this parameter varies depending on your platform, and defaults can vary significantly depending on the clusterware you use, such as Sun Cluster or the Hewlett-Packard Service Guard OPS Edition. The lower the timeout, the more false failure detections you can expect: if the timeout interval is set too low, the cluster might incorrectly determine that a node is failing because of a transient failure. When a failure is detected, cluster reorganization occurs.
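The trade-off between a low heartbeat timeout and false failure detections can be illustrated with a small sketch. The detect_failures function and the timestamps are hypothetical; real Cluster Manager implementations and their timeout parameters differ by platform, as noted above.

```python
def detect_failures(heartbeats, timeout, now):
    """Return, sorted, the nodes whose last heartbeat is older than timeout.

    heartbeats maps node name -> time (in seconds) of its last heartbeat.
    """
    return sorted(node for node, last in heartbeats.items() if now - last > timeout)


# node2 had a transient 3-second gap; node3 really stopped at time 70.
last_seen = {"node1": 99.5, "node2": 97.0, "node3": 70.0}
print(detect_failures(last_seen, timeout=2, now=100.0))   # ['node2', 'node3']
print(detect_failures(last_seen, timeout=10, now=100.0))  # ['node3']
```

With a 2-second timeout, node2's transient gap is reported as a failure; with a 10-second timeout it is not. The cost of the larger timeout is slower detection: node3 is only reported once more than 10 seconds have passed since its last heartbeat.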
When a node fails, Oracle must alter the node's cluster membership status. This is known as a cluster reorganization and it usually happens quickly. The duration of cluster reorganization is proportional to the number of surviving nodes in the cluster.
The Global Cache Service (GCS) and Global Enqueue Service (GES) provide the Cluster Manager interfaces to the software and expose the cluster membership map to the Oracle instances when nodes are added to or deleted from the cluster. The LMON process on each cluster node communicates with the Cluster Manager on the respective node and exposes that information to the respective instances. LMON also provides two more useful functions by:
If a node fails to perform these two functions, then other nodes consider that node to no longer be a member of the cluster. Such a failure causes a change in a node's membership status within the cluster. Then LMON initiates recovery actions that include remastering of the Global Cache Service (GCS) and Global Enqueue Service (GES) resources and instance recovery.
At this stage, the Real Application Clusters environment is in a state of system pause, and client transactions that do not have the needed resources to complete will suspend until Oracle completes recovery processing. Other in-progress transactions, however, continue processing.
The process of instance membership recovery (IMR) guarantees that all members of a cluster are functional by:
All instances read the control file vote result record (CFVRR). If a member is not in the membership map, then IMR assumes a node has expired. Appropriate diagnostic information is provided. As IMR is currently configured, all members wait indefinitely for notification of node expiration. There is no forced removal of instances. Part of the fault tolerance of Real Application Clusters is a provision for the possibility that the IMR arbiter itself could fail.
When an instance fails, Oracle must remaster the GCS resources from the failed instance onto the surviving cluster nodes and perform instance recovery as discussed in the following sections:
The time required for remastering resources is proportional to the number of GCS resources in the failed instance. This number in turn depends upon the size of the buffer caches.
During this phase, all resources previously mastered at the failed instance are redistributed across the remaining instances. These resources are reconstructed at their new master instance. All other resources previously mastered at surviving instances are not affected. With resources distributed evenly across n surviving instances, any resource request has a 1/n chance of being satisfied locally and an (n-1)/n chance of involving remote operations.
In the case of a cluster database having only one surviving instance, all resource operations are satisfied locally. Once the remastering of a failed instance's GCS resource completes, Oracle recovers the in-progress transactions of the failed instance. This is known as instance recovery.
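The remastering step and the 1/n estimate can be sketched with a hash-based assignment of resources to master instances. The hash scheme is an assumption made for illustration, not the GCS algorithm; the point is that only the failed instance's resources move, and with n surviving instances roughly 1/n of the resources are mastered at any one of them.

```python
import hashlib


def master_of(resource, instances):
    """Hypothetical hash-based choice of master instance for a resource."""
    digest = hashlib.sha256(resource.encode()).digest()
    return instances[int.from_bytes(digest[:8], "big") % len(instances)]


def remaster(masters, failed, survivors):
    """Move only the failed instance's resources; all others keep their master."""
    return {
        res: master_of(res, survivors) if inst == failed else inst
        for res, inst in masters.items()
    }


def local_fraction(masters, instance):
    """Share of resources mastered by instance, i.e. requests it satisfies locally."""
    return sum(1 for m in masters.values() if m == instance) / len(masters)


instances = ["inst1", "inst2", "inst3", "inst4"]
resources = [f"block{k}" for k in range(3000)]
before = {r: master_of(r, instances) for r in resources}
after = remaster(before, "inst4", ["inst1", "inst2", "inst3"])
print(round(local_fraction(after, "inst1"), 2))  # close to 1/3
```

With a single survivor, local_fraction returns 1.0, which matches the one-surviving-instance case in the text.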
Instance recovery includes cache recovery and transaction recovery. Instance recovery requires that an active Real Application Clusters instance detects failure and performs recovery processing for the failed instance. The first Real Application Clusters instance that detects the failure, using its LMON process, controls the recovery of the failed instance by taking over its redo log files and performing instance recovery. This is why the redo log files must be on either a cluster file system file or on a shared raw device.
Instance recovery is complete when Oracle has replayed the online redo log files of the failed instance. Because Oracle can perform transaction recovery in a deferred fashion, any suspended client transactions can begin processing when cache recovery is complete.
See Also: Oracle9i Recovery Manager User's Guide and Reference for a description of Block Media Recovery (BMR)
For cache recovery, Oracle replays the online redo logs of the failed instance. You can also make Oracle perform cache recovery using parallel execution, so that parallel processes, or threads, replay the redo logs of the failed Oracle instance. It can also be important to keep redo log replay to a predictable duration. The Fast-Start Recovery feature in Oracle9i enables you to control this.
Oracle also provides nonblocking rollback capabilities. This means that full database access can begin as soon as Oracle has replayed the online log files. After cache recovery completes, Oracle begins transaction recovery.
See Also: Oracle9i Database Performance Guide and Reference for more information on how to use Fast-Start Recovery
Transaction recovery comprises rolling back all uncommitted transactions of the failed instance, that is, transactions that were still in progress and had not committed when the instance failed.
The Oracle9i Fast-Start Rollback feature performs this as deferred processing that runs in the background. Oracle uses a multiversion read consistency technology to provide on-demand rollback of only those rows blocked by expired transactions. This enables new transactions to progress with minimal delay. New transactions do not have to wait for long-running expired transactions to be rolled back. Therefore, large transactions generally do not affect database recovery time.
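On-demand rollback, as described above, can be sketched as follows. The RowStore class is a hypothetical model: a new transaction that touches a row still held by an expired transaction undoes just that row and continues, while background_rollback stands in for the deferred Fast-Start Rollback of everything else.

```python
class RowStore:
    """Hypothetical model of rows left changed by expired (dead) transactions."""

    def __init__(self):
        self.values = {}  # row id -> current, possibly uncommitted, value
        self.undo = {}    # row id -> (dead transaction id, value before its change)

    def dead_txn_wrote(self, row, txn, old, new):
        self.values[row] = new
        self.undo[row] = (txn, old)

    def rollback_row(self, row):
        _, old = self.undo.pop(row)
        self.values[row] = old

    def update(self, row, new):
        # On-demand rollback: undo only the row this new transaction needs,
        # instead of waiting for the whole expired transaction to roll back.
        if row in self.undo:
            self.rollback_row(row)
        self.values[row] = new

    def background_rollback(self):
        # Deferred rollback of the remaining rows.
        for row in list(self.undo):
            self.rollback_row(row)
```

The new transaction pays only for the rows it actually touches, which is why a large expired transaction need not delay new work.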
Just as with cache recovery, Oracle9i Fast-Start Rollback rolls back expired transactions in parallel. However, single-instance Oracle databases roll back expired transactions using the CPU of one node.
Real Application Clusters provides cluster-aware Fast-Start Rollback capabilities that use the CPUs of all the nodes in a cluster to perform parallel rollback operations. Each cluster node spawns a recovery coordinator and recovery processes to assist with parallel rollback operations. The Fast-Start Rollback feature is thus cluster aware because the database is aware of and uses all cluster resources for parallel rollback operations.
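The benefit of spreading rollback across nodes can be estimated with a small scheduling sketch. The costs are arbitrary units, each expired transaction is treated as indivisible, and the largest-first assignment to the least busy node is an assumed policy, not a description of how Oracle's recovery coordinators divide work.

```python
import heapq


def parallel_rollback_time(undo_costs, nodes):
    """Elapsed time to roll back expired transactions using `nodes` workers.

    undo_costs holds the rollback work of each expired transaction. Each one
    goes, largest first, to the node that currently has the least work.
    """
    busy_until = [0] * nodes  # min-heap of each node's accumulated work
    for cost in sorted(undo_costs, reverse=True):
        least_busy = heapq.heappop(busy_until)
        heapq.heappush(busy_until, least_busy + cost)
    return max(busy_until)


work = [8, 4, 4, 2, 2]
print(parallel_rollback_time(work, 1))  # 20: one node's CPU does everything
print(parallel_rollback_time(work, 2))  # 10
print(parallel_rollback_time(work, 4))  # 8: bounded by the largest transaction
```

The last result shows the limit of this model: once there are enough nodes, the largest single transaction bounds the elapsed time.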
While the default behavior is to defer transaction recovery, you could choose to configure your system so that transaction recovery completes before allowing client transactions to progress. In this scenario, the ability of Real Application Clusters to parallelize transaction recovery across multiple nodes is a more visible user benefit.
This section discusses the following Real Application Clusters high availability configurations:
The Real Application Clusters n-node configuration is the default environment. All nodes of the cluster participate in client transaction processing and client sessions can be load balanced at connect time. Response time is optimized for available cluster resources, such as CPU and memory, by distributing the load across cluster nodes to create a highly available environment.
In the event of node failures, an instance on another node performs the necessary recovery actions. The database clients on the failed instance can be load balanced across the surviving (n-1) instances of the cluster. Spreading these clients over all survivors keeps the increase in load on each surviving instance small, which helps keep response times within acceptable bounds and preserves availability. In this configuration, the database application workload can be distributed across all nodes, which makes optimal use of cluster machine resources.
You can easily configure a basic high availability system for Real Application Clusters in two-node environments. The primary instance on one node accepts user connections while the secondary instance on the other node accepts connections when the primary node fails, or when specifically selected through the INSTANCE_ROLE parameter. You can configure this manually by controlling the routing of transactions to specific instances. However, Real Application Clusters provides the Primary/Secondary Instance Configuration feature to accomplish this automatically.
Configure the Primary/Secondary Instance feature by setting the initsid.ora parameter … to 1. In a two-node environment, the instance that first mounts the database assumes the primary instance role. The other instance assumes the role of secondary instance. If the primary instance fails, then the secondary instance assumes the primary role. When the failed instance returns to active status, it assumes the secondary instance role.
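The role rules described above can be modeled as a small state machine. RoleManager is a hypothetical Python class; it covers only the two-node rules stated in this section and ignores Cluster Manager notification and recovery.

```python
class RoleManager:
    """Hypothetical model of the two-node primary/secondary role rules."""

    def __init__(self):
        self.primary = None
        self.secondary = None

    def mount(self, instance):
        """The first instance to mount becomes primary; a later one, secondary.

        A failed instance that returns also comes back through mount(), so it
        takes the secondary role.
        """
        if self.primary is None:
            self.primary = instance
        else:
            self.secondary = instance

    def fail(self, instance):
        if instance == self.primary:
            # The secondary instance takes over the primary role.
            self.primary, self.secondary = self.secondary, None
        elif instance == self.secondary:
            self.secondary = None
```

Because a returning instance never displaces the current primary, roles do not flip back after recovery, which avoids a second interruption.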
The secondary instance becomes the primary instance only after the Cluster Manager informs it about the failure of the primary instance. This occurs before GCS and GES reconfiguration and cache and transaction recovery processes begin. The redirection to the surviving instance happens transparently; application programming is not required. You only need to make minor configuration changes to the client connect strings.
In Primary/Secondary Instance configurations, both instances run concurrently, as in any n-node Real Application Clusters environment. However, database application users only connect to the designated primary instance. The primary node masters all of the GCS and GES resources. This minimizes communication between the nodes and provides performance levels that are nearly comparable to traditional single instance databases.
The secondary instance can be used by specially configured clients, known as administrative clients, for batch query reporting operations or database administration tasks. This enables some level of utilization of the second node. It also helps off-load work from the primary instance and justify the investment in redundant nodes.
The Primary/Secondary Instance configuration works in both dedicated server and shared server environments. However, it functions differently in each as described in the following sections:
In current high availability configurations, dedicated server environments do not use cross-instance listener registration. Connection requests made to a specific instance's listener can only be connected to that instance's service. This behavior is similar to the default n-node configuration in dedicated server environments.
Figure 10-1 shows a cluster configuration before a node failure.
When the primary instance fails, as shown in Figure 10-2, the following steps occur:
Real Application Clusters provides reconnection performance benefits when running in shared server mode. This is accomplished by the cross-registration of all the dispatchers and listeners in the cluster.
In the Primary/Secondary configurations, the primary instance's dispatcher registers as the primary instance with both listeners, as shown in Figure 10-3:
See Also: Oracle9i Real Application Clusters Setup and Configuration for information about configuring client connect strings
Specially configured clients can use the secondary instance for batch operations. For example, batch reporting tasks or index creation operations can be performed on the secondary instance.
See Also: Oracle9i Real Application Clusters Administration for instructions about how to connect to secondary instances
Figure 10-4 shows how a failed primary instance is replaced by a new primary instance.
Maintaining information about frequently executed SQL and PL/SQL statements in the library cache improves the performance of the Oracle database server. In Real Application Clusters primary and secondary instance configurations, the library cache associated with the primary instance contains up-to-date information. If failover occurs, then the benefit of that information is lost unless the library cache on the secondary instance is populated beforehand.
Use the DBMS_LIBCACHE package to transfer information in the library cache of the primary instance to the library cache of the secondary instance. This process is called warming the library cache. It improves performance immediately after failover because the new primary library cache does not need to be populated with parsed SQL statements and compiled PL/SQL units.
See Also: Oracle9i Real Application Clusters Guard I - Concepts and Administration for more information about installing and configuring the library cache warming feature, and the Oracle9i Supplied PL/SQL Packages and Types Reference for more information about using DBMS_LIBCACHE
There are several reasons for using the Primary/Secondary Instance feature for this scenario instead of a default two-node configuration. The Primary/Secondary Instance feature provides:
Operating Real Application Clusters in an n-node configuration optimally utilizes cluster resources. However, as discussed previously, this is not always possible or advisable. On the other hand, the financial investment required to have an idle node for failover might be prohibitive. These situations might instead be best suited for a shared high availability node configuration.
This type of configuration typically has several nodes, each running a separate application module or service, where all application services share one Real Application Clusters database. In addition, you can configure a separate designated node as a failover node. Although an instance runs on that node, no users connect to it during normal operations. In the event that one of the nodes fails, the workload can be redirected to the failover node.
While this configuration is useful for applications that need to run on separate nodes, it works best if a middle-tier application or transaction processing monitor directs the appropriate application users to the appropriate nodes. Unlike the Primary/Secondary Instance Configuration, there is no database setup that automates the workload transition to the failover node. Instead, the application or middle-tier software directs users from the failed application node to the failover node. The application also must control failing back the users once the failed node is operational. Failing back frees the failover node for processing user work from subsequent node failures.
In this configuration, application performance is maintained in the event of a failover. In the n-node configuration, application performance could degrade by 1/n due to the same workload being redistributed over a smaller set of cluster nodes.
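To make the 1/n figure concrete: when one of n equally loaded nodes fails, the cluster loses 1/n of its capacity and each survivor's load grows by a factor of n/(n-1). The sketch below compares the two configurations with hypothetical workload units (four application nodes carrying 30 units each); max_load_after_failure is an illustrative helper, not an Oracle facility.

```python
def max_load_after_failure(loads, failed, failover_node=None):
    """Return the busiest surviving node's workload after `failed` goes down.

    loads maps node name -> workload units. With failover_node set (shared
    high availability node), the failed node's work moves to that node.
    Otherwise it is spread evenly over all survivors (n-node configuration).
    """
    moved = loads[failed]
    survivors = {n: w for n, w in loads.items() if n != failed}
    if failover_node is not None:
        survivors[failover_node] += moved
    else:
        share = moved / len(survivors)
        survivors = {n: w + share for n, w in survivors.items()}
    return max(survivors.values())


n_node = {"node1": 30, "node2": 30, "node3": 30, "node4": 30}
print(max_load_after_failure(n_node, "node1"))  # 40.0

shared = {"node1": 30, "node2": 30, "node3": 30, "node4": 30, "spare": 0}
print(max_load_after_failure(shared, "node1", failover_node="spare"))  # 30
```

In the n-node case every survivor rises from 30 to 40 units, a third more work. The shared failover node keeps every node at 30, at the price of a node that sits idle until a failure occurs.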
Real Application Clusters Guard II provides high availability as well as improved manageability. It is a full instance environment that enables you to control all the instances on which services run, as well as their failover properties. On failure, Real Application Clusters Guard II transfers application service loads to other available nodes without service interruptions.
See Also: Oracle9i Real Application Clusters Guard II Concepts, Installation, and Administration on the Real Application Clusters Guard II software CD for more information about Real Application Clusters Guard II
Real Application Clusters provides a fully redundant fault resilient environment. All cluster nodes have an active instance that has equal access to all data and resources. If a node fails, then users can access the data using a surviving instance on another node. In-progress transactions on the failed node are recovered by the first node that detects the failure. In this way, there is minimal interruption to end-user application availability with Real Application Clusters.