Oracle® Database High Availability Best Practices
11g Release 1 (11.1)
An Oracle RAC extended cluster is an architecture that provides extremely fast recovery from a site failure and allows all nodes, at all sites, to actively process transactions as part of a single database cluster. An extended cluster provides greater availability than a local Oracle RAC cluster, but because the sites are typically in the same metropolitan area, this architecture may not fulfill all disaster recovery requirements for your organization.
The best practices discussed in this section apply to Oracle Database 11g with Oracle RAC on extended clusters, and build on the best practices described in Section 2.4, "Configuring Oracle Database 11g with Oracle RAC".
Use the following best practices when configuring an Oracle RAC database for an extended cluster environment:
The white paper about extended clusters on the Oracle Real Application Clusters Web site at
Oracle Database High Availability Overview for a high-level overview, benefits, and a configuration example
A typical Oracle RAC architecture is designed primarily as a scalability and availability solution that resides in a single data center. To build and deploy an Oracle RAC extended cluster, the nodes in the cluster are separated by greater distances. When configuring an Oracle RAC database for an extended cluster environment, you must:
Configure one set of nodes at Site A and another set of nodes at Site B.
Spread the cluster workload evenly across both sites to avoid introducing additional contention and latency into the design. For example, avoid client/server application workloads that run across sites, such that the client component is in site A and the server component is in site B.
Extended clusters provide the highest level of availability for server and site failures when data centers are in close enough proximity to reduce latency and complexity. The preferred distance between sites in an extended cluster is within a metropolitan area. High internode and interstorage latency can have a major effect on performance and throughput. Performance testing is mandatory to assess the impact of latency. In general, distances of 50 km or less are recommended.
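To get a feel for why distance matters before running the mandatory performance tests, the propagation delay alone can be estimated. The following is a rough sketch, not a sizing tool: it assumes light travels through optical fiber at about 200,000 km/s (roughly 5 microseconds per kilometer in one direction), and it ignores switch, DWDM, and protocol overhead. The function name and constant are illustrative.

```python
# Assumption: signal speed in optical fiber is about 200,000 km/s,
# which is roughly 5 microseconds per kilometer in one direction.
FIBER_DELAY_US_PER_KM = 5.0


def round_trip_delay_ms(distance_km: float, round_trips: int = 1) -> float:
    """Estimate the added propagation delay, in milliseconds.

    distance_km is the cable distance between the sites.
    round_trips is how many request/acknowledge exchanges an operation needs.
    Only propagation delay is modeled; equipment latency is ignored.
    """
    if distance_km < 0 or round_trips < 0:
        raise ValueError("distance_km and round_trips must be non-negative")
    one_way_us = distance_km * FIBER_DELAY_US_PER_KM
    return 2 * one_way_us * round_trips / 1000.0


if __name__ == "__main__":
    for km in (10, 50, 100):
        print(f"{km:>4} km: ~{round_trip_delay_ms(km):.2f} ms per round trip")
```

Under these assumptions, 50 km adds about 0.5 ms to every round trip, which is added to each remote mirrored write and each cross-site interconnect message. Measured latency will be higher, so treat the result as a lower bound.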
Testing has shown the distance (greatest cable stretch) between Oracle RAC cluster nodes generally affects the configuration, as follows:
Distances less than 10 km can be deployed using normal network cables.
Distances of 10 km or more require dense wavelength-division multiplexing (DWDM) links.
Distances from 10 to 50 km require storage area network (SAN) buffer credits to minimize the performance impact due to the distance. Otherwise, the performance degradation due to the distance can be significant.
For distances greater than 50 km, there are not yet enough proof points to indicate the effect of deployments. More testing is needed to identify what types of workloads could be supported and what effect the chosen distance would have on performance.
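The distance thresholds above can be summarized as a small lookup. This is a minimal sketch that only restates the guidance in this section; the function name and the wording of the returned notes are illustrative, and the result does not replace performance testing.

```python
from typing import List


def link_guidance(distance_km: float) -> List[str]:
    """Return the configuration notes from this section for a given distance."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")

    notes: List[str] = []
    if distance_km < 10:
        notes.append("normal network cables can be used")
    else:
        notes.append("DWDM links are required")

    if 10 <= distance_km <= 50:
        notes.append("configure SAN buffer credits to limit the performance impact")
    elif distance_km > 50:
        notes.append("beyond the tested range; more testing is needed")

    return notes


if __name__ == "__main__":
    for km in (5, 25, 80):
        print(km, "km:", "; ".join(link_guidance(km)))
```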
Oracle recommends host-based mirroring using ASM to internally mirror across the two storage arrays. Implementing mirroring with ASM provides an active/active storage environment in which system write I/Os are propagated to both sets of disks, making the disks appear as a single set of disks that is independent of location. Do not use array-based mirroring because only one storage site is active, which makes the architecture vulnerable to this single point of failure and longer recovery times.
The ASM volume manager provides flexible host-based mirroring redundancy options. You can choose to use external redundancy to defer the mirroring protection function to the hardware RAID storage subsystem. The ASM normal and high-redundancy options allow two-way and three-way mirroring, respectively.
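As a rough illustration of what the redundancy options mean for storage sizing, the sketch below maps each option to the number of copies ASM itself keeps and derives the raw capacity needed for a given amount of usable data. The names are illustrative, and the calculation is deliberately simplified: it ignores the free space ASM needs to restore redundancy after a failure, and with external redundancy any additional copies made by the hardware RAID subsystem are not counted.

```python
# Number of copies of each extent that ASM itself maintains.
# With external redundancy, ASM keeps one copy and relies on the
# storage subsystem for protection.
ASM_MIRROR_COPIES = {
    "external": 1,
    "normal": 2,  # two-way mirroring
    "high": 3,    # three-way mirroring
}


def raw_capacity_needed_gb(usable_gb: float, redundancy: str) -> float:
    """Return the raw disk capacity needed to hold usable_gb of data."""
    if usable_gb < 0:
        raise ValueError("usable_gb must be non-negative")
    try:
        copies = ASM_MIRROR_COPIES[redundancy.lower()]
    except KeyError:
        raise ValueError(f"unknown redundancy option: {redundancy!r}") from None
    return usable_gb * copies


if __name__ == "__main__":
    for level in ("external", "normal", "high"):
        print(level, raw_capacity_needed_gb(500, level), "GB raw for 500 GB usable")
```

In an extended cluster that uses normal redundancy with one failure group per site, the two copies correspond to one full copy of the data at each site.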
Beginning with Oracle Database Release 11g, ASM includes a preferred read capability that ensures that a read I/O accesses the local storage instead of unnecessarily reading from a remote failure group. When you configure ASM failure groups in extended clusters, you can specify that a particular node reads from a failure group extent that is closest to the node, even if it is a secondary extent. This capability is most useful in extended clusters, where each node has faster access to its local storage than to the remote storage: reading locally improves performance and reduces network load.
The ASM_PREFERRED_READ_FAILURE_GROUPS initialization parameter value is a comma-delimited list of strings that specifies the failure groups that should be preferentially read by the given instance. This parameter is instance specific, and it is generally used only for clustered ASM instances. Its value can be different on different nodes. For example:
See Also: Oracle Database Storage Administrator's Guide for information about configuring preferred read failure groups with the
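The following sketch shows how a per-instance value might be assembled. It assumes each list element takes the form `diskgroup_name.failure_group_name` and that the value is set per ASM instance with `ALTER SYSTEM ... SID = '...'`; verify both against the Oracle Database Storage Administrator's Guide for your release. The disk group names (`DATA`, `FRA`), failure group names (`SITEA`, `SITEB`), and instance names (`+ASM1`, `+ASM2`) are hypothetical, and the helper functions are illustrative, not part of any Oracle tool.

```python
from typing import Iterable, Tuple


def preferred_read_value(pairs: Iterable[Tuple[str, str]]) -> str:
    """Build the comma-delimited parameter value.

    Each pair is (disk group name, failure group name).
    Assumed element format: diskgroup_name.failure_group_name
    """
    elements = []
    for diskgroup, failgroup in pairs:
        for name in (diskgroup, failgroup):
            if not name or "," in name or "." in name or "'" in name:
                raise ValueError(f"invalid name: {name!r}")
        elements.append(f"{diskgroup}.{failgroup}")
    return ",".join(elements)


def alter_system_statement(asm_sid: str, pairs: Iterable[Tuple[str, str]]) -> str:
    """Render the statement that sets the value for one ASM instance."""
    if not asm_sid or "'" in asm_sid:
        raise ValueError(f"invalid SID: {asm_sid!r}")
    value = preferred_read_value(pairs)
    return (
        "ALTER SYSTEM SET ASM_PREFERRED_READ_FAILURE_GROUPS = "
        f"'{value}' SID = '{asm_sid}';"
    )


if __name__ == "__main__":
    # Hypothetical layout: the node at Site A prefers the SITEA failure
    # groups, and the node at Site B prefers the SITEB failure groups.
    print(alter_system_statement("+ASM1", [("DATA", "SITEA"), ("FRA", "SITEA")]))
    print(alter_system_statement("+ASM2", [("DATA", "SITEB"), ("FRA", "SITEB")]))
```

Setting a different value on each instance is what lets every node read from the failure group at its own site.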
Add a third voting disk at a third site, so that the quorum (voting) disk (see Footnote 7) is hosted at a location different from the main sites (data centers).
Most extended clusters have only two storage systems (one at each site). During normal processing, each node writes and reads a disk heartbeat at regular intervals, but if the heartbeat cannot complete, all affected nodes are evicted from the cluster using a forced reboot. Thus, the site that houses the majority of the voting disks is a potential single point of failure for the entire cluster. For availability reasons, you should add a third site that can act as the arbitrator if either site fails or a communication failure occurs between the sites.
In some cases, you can also use standard NFS to host the third voting disk on an inexpensive, low-end NFS-mounted device. For more information, see the Oracle Technology Network (OTN) white paper at
If you have an extended cluster and do not configure a third site, you must make one site the primary site and make the other site a secondary site. Then, if the primary site fails, you must manually restart the secondary site.
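The reasoning behind the third site can be illustrated with a simplified model. The sketch assumes that surviving nodes stay in the cluster only if they can still access a strict majority of the voting disks, and it models only the loss of a whole site (not a communication failure between sites). The site names and the function are illustrative.

```python
from typing import Dict


def cluster_survives(voting_disks_by_site: Dict[str, int], failed_site: str) -> bool:
    """Return True if the remaining sites still reach a majority of voting disks.

    voting_disks_by_site maps a site name to the number of voting disks
    hosted there. Assumption: nodes need access to more than half of all
    voting disks to avoid eviction.
    """
    total = sum(voting_disks_by_site.values())
    if total == 0:
        raise ValueError("at least one voting disk is required")
    reachable = total - voting_disks_by_site.get(failed_site, 0)
    return reachable * 2 > total


if __name__ == "__main__":
    # Two storage sites only: one site must hold the majority.
    two_sites = {"site_a": 2, "site_b": 1}
    # One voting disk per data center plus one at a third (arbitrator) site.
    three_sites = {"site_a": 1, "site_b": 1, "site_c": 1}

    for name, layout in (("two sites", two_sites), ("three sites", three_sites)):
        for failed in ("site_a", "site_b"):
            print(name, "- lose", failed, "->", cluster_survives(layout, failed))
```

In the two-site layout, losing the site that holds two of the three voting disks leaves the other site without a majority, which is why a manual restart is needed there. With one voting disk at each of three sites, the loss of either data center still leaves two of three disks reachable.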
Consider the following additional factors when implementing an extended cluster architecture:
Network, storage, and management costs increase.
Write performance incurs the overhead of network latency. Test the workload performance to assess the impact of this overhead.
Because this is a single database without Oracle Data Guard, there is no protection from data corruption or data failures.
The Oracle release, the operating system, and the clusterware used for an extended cluster all factor into the viability of extended clusters.
When choosing to mirror data between sites:
Host-based mirroring requires a clustered logical volume manager to allow active/active mirrors and thus a primary/primary site configuration. Oracle recommends using ASM as the clustered logical volume manager.
Array-based mirroring allows active/passive mirrors and thus a primary/secondary configuration.
Storage costs for this solution are very high, requiring a minimum of two full copies of the storage (one at each site).
Extended clusters need additional destructive testing, covering
For full disaster recovery, complement the extended cluster with a remote Data Guard standby database, because this architecture:
Footnote Legend
Footnote 7: Use standard NFS to support a third voting disk on an extended cluster. You can configure the quorum disk on an inexpensive, low-end, standard NFS-mounted device somewhere on the network. Oracle recommends putting the NFS voting disk on a dedicated server, which belongs to a production environment. See the white paper about using standard Network File System (NFS) to support a third voting disk on an extended cluster configuration that is available on the Oracle Real Application Clusters Web site at