11 Overview of Oracle RAC and Clusterware Best Practices

Oracle Clusterware and Oracle Real Application Clusters (Oracle RAC) form Oracle's strategic high availability and resource management framework for databases in a cluster environment, and are an integral part of the Oracle MAA Silver reference architecture.

Adding Oracle RAC to a Bronze MAA reference architecture elevates it to a Silver MAA reference architecture. The Silver MAA reference architecture is designed for databases that can’t afford to wait for a cold restart or a restore from backup, should there be an unrecoverable database instance or server failure.

The Silver reference architecture can provide zero downtime for node or instance failures, and zero downtime for most database and system software updates, neither of which is achievable with the Bronze architecture. To learn more about the Silver MAA reference architecture, see High Availability Reference Architectures.

Oracle Clusterware and Oracle RAC provide the following benefits:

  • High availability framework and cluster management solution
    • Manages resources, such as Virtual Internet Protocol (VIP) addresses, databases, listeners, and services
    • Provides an HA framework for Oracle database resources and non-Oracle database resources, such as third-party agents
  • Active-active clustering for scalability and availability

    • High Availability: If a server or database instance fails, connections to surviving instances are not affected; connections to the failed instance quickly fail over to surviving instances that are already running and open on other servers in the Oracle RAC cluster.
    • Scalability and Performance: Oracle RAC is ideal for high-volume applications or consolidated environments where scalability and the ability to dynamically add or re-prioritize capacity across more than a single server are required. An individual database may have instances running on one or more nodes of a cluster. Similarly, a database service may be available on one or more database instances. Additional nodes, database instances, and database services can be provisioned online. The ability to easily distribute workload across the cluster makes Oracle RAC the ideal complement for Oracle Multitenant when consolidating many databases.

The following table highlights various Oracle Clusterware and Oracle Real Application Clusters configuration best practices.

Table 11-1 Oracle RAC HA Use Cases and Best Practices

Use Case Best Practices
Certified and validated Clusterware software stack

Use Oracle Clusterware and avoid third-party Clusterware.

See Oracle Database Clusterware Administration and Deployment Guide

Clusterware is built in to all Oracle Exadata Systems.

Certified and validated storage architecture

Use Oracle Automatic Storage Management (Oracle ASM) and Oracle ASM Cluster File System (Oracle ACFS) instead of third-party volume managers and cluster file systems for the following MAA benefits:

  • Eliminate hot spots by distributing work across all disks
  • Scale and adjust storage capacity by adding and dropping disks and storage online
  • Reduce complexity by providing a simplified and uniform method (ASMCMD, ASMCA, ExaCLI, or OEDACLI) to manage database storage
  • Inherent data corruption detection and repair when using ASM disk groups
  • Simple management, patching, and maintenance with an integrated Oracle Grid Infrastructure (Clusterware + ASM) without any additional drivers

When using ASM with external redundancy, ensure that the underlying storage and network are highly available with no single point of failure.

When using ASM native redundancy, high redundancy disk groups are recommended to provide maximum protection for unplanned outages and during storage software updates. By default, Exadata deployments use high redundancy for all disk groups (both for data and recovery destinations).
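The capacity cost of each redundancy choice can be estimated with simple arithmetic. The sketch below assumes the usual mirroring factors (external: one copy, normal: two copies, high: three copies) and deliberately ignores the free space ASM needs to restore redundancy after a disk failure, so the result is an upper bound rather than a sizing figure. The function name and the 120 TB input are hypothetical.

```python
# Rough upper bound on usable space in an ASM disk group.
# Assumption: each redundancy level keeps a fixed number of copies of every extent.
MIRROR_COPIES = {"external": 1, "normal": 2, "high": 3}


def usable_capacity_tb(raw_tb: float, redundancy: str) -> float:
    """Return raw capacity divided by the number of mirror copies."""
    try:
        copies = MIRROR_COPIES[redundancy.lower()]
    except KeyError:
        raise ValueError(f"unknown redundancy level: {redundancy!r}") from None
    if raw_tb < 0:
        raise ValueError("raw capacity cannot be negative")
    return raw_tb / copies


if __name__ == "__main__":
    # Hypothetical 120 TB of raw disk, shown for each redundancy level.
    for level in MIRROR_COPIES:
        usable = usable_capacity_tb(120.0, level)
        print(f"{level:>8}: at most {usable:.1f} TB usable of 120.0 TB raw")
```

High redundancy trades two thirds of the raw capacity for the ability to tolerate a second failure while one copy is already offline, which is the situation that arises during rolling storage software updates.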

Oracle Automatic Storage Management Cluster File System (Oracle ACFS) is a multi-platform, scalable file system and storage management technology that extends Oracle Automatic Storage Management (Oracle ASM) functionality to support all customer files, including non-database files.

These best practices are built in to all Oracle Exadata Systems.

See Introducing Oracle Automatic Storage Management

Certified and validated network architecture

Ensure that the entire database and storage network topology has multiple network paths with no single point of failure.

When connecting to the database service, use built-in Virtual Internet Protocol (VIP) addresses, Single Client Access Name (SCAN), and multiple local SCAN listeners configured over a bonded client network.
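As an illustration of connecting through the SCAN rather than through individual node addresses, the following sketch assembles an Oracle Net connect descriptor. The host and service names are hypothetical, and the timeout and retry values are illustrative assumptions to be tuned for your environment, not values taken from this chapter.

```python
def scan_connect_descriptor(scan_host: str, service_name: str, port: int = 1521) -> str:
    """Build a connect descriptor that targets a SCAN and a named service."""
    if not scan_host or not service_name:
        raise ValueError("scan_host and service_name are required")
    return (
        "(DESCRIPTION="
        # Illustrative retry and timeout settings (assumed values).
        "(CONNECT_TIMEOUT=90)(RETRY_COUNT=50)(RETRY_DELAY=3)"
        "(TRANSPORT_CONNECT_TIMEOUT=3)"
        "(ADDRESS_LIST=(LOAD_BALANCE=ON)"
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={scan_host})(PORT={port})))"
        # Connect to an application service, never to an instance name.
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


if __name__ == "__main__":
    # Hypothetical SCAN and service names.
    print(scan_connect_descriptor("sales-scan.example.com", "sales_oltp"))
```

Because the descriptor names only the SCAN and the service, nodes can be added to or removed from the cluster without changing client configuration.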

Use a separate high bandwidth, bonded network for backup or Data Guard traffic.

For the private network used as the cluster interconnect, Oracle recommends that non-Exadata customers use Oracle HAIP for network redundancy instead of using bonded networks. Bonding configurations have various attributes that behave differently with different network cards and switch settings. This recommendation does not apply to the private cluster interconnect in Exadata environments, because the bond setup has been properly configured and validated. Further, Exadata uses the CLUSTER_INTERCONNECTS parameter over the highly available bonded network. Generic systems should not use the CLUSTER_INTERCONNECTS parameter with bonding; use Oracle HAIP instead.
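The interconnect guidance above reduces to a small decision rule, sketched here as a review helper. The class, field names, and messages are hypothetical; the code only restates the recommendation and does not inspect a real system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InterconnectConfig:
    """Hypothetical summary of how a cluster's private interconnect is set up."""
    is_exadata: bool
    uses_bonding: bool
    sets_cluster_interconnects: bool
    uses_haip: bool


def interconnect_findings(cfg: InterconnectConfig) -> list[str]:
    """Return review findings; an empty list means nothing to flag."""
    if cfg.is_exadata:
        # Exadata's bonded interconnect is preconfigured and validated.
        return []
    findings = []
    if cfg.uses_bonding:
        findings.append("bonded interconnect on a generic system: prefer Oracle HAIP")
    if cfg.sets_cluster_interconnects:
        findings.append("CLUSTER_INTERCONNECTS is set on a generic system: prefer Oracle HAIP")
    if not cfg.uses_haip:
        findings.append("Oracle HAIP is not in use for interconnect redundancy")
    return findings


if __name__ == "__main__":
    generic = InterconnectConfig(False, True, True, False)
    for finding in interconnect_findings(generic):
        print(finding)
```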

Cluster configuration checks

Use Cluster Verification Utility (CVU) at monthly intervals to validate a range of cluster and Oracle RAC components such as shared storage devices, networking configurations, system requirements, and Oracle Clusterware. See Cluster Verification Utility Reference.

To perform a holistic, proactive health check and to evaluate if Oracle RAC or Exadata best practices are being followed, use exachk for Exadata RAC systems, or use orachk for non-Exadata RAC systems, at monthly intervals and before and after any software update.

See ORAchk - Health Checks for the Oracle Stack (Doc ID 1268927.2) and Oracle Exadata Database Machine exachk or HealthCheck (Doc ID 1070954.1).

Note that both exachk and orachk include CVU checks. The exachk utility covers software and configuration best practices and critical alerts for Storage, Network, Clusterware, ASM, and Database.

Incorporate configuration recommendations from CVU, exachk, or orachk.
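The cadence described above (monthly, plus before and after any software update) can be captured in a small scheduling helper. This is a sketch only: the function names are hypothetical, and treating "monthly" as 30 days is an assumption. It chooses the tool name and decides whether a run is due; it does not invoke either utility.

```python
from datetime import date, timedelta
from typing import Optional


def health_check_tool(is_exadata: bool) -> str:
    """exachk for Exadata RAC systems, orachk for non-Exadata RAC systems."""
    return "exachk" if is_exadata else "orachk"


def check_is_due(
    last_run: Optional[date],
    today: date,
    interval_days: int = 30,  # assumption: "monthly" approximated as 30 days
    update_pending: bool = False,
) -> bool:
    """Return True if a health check should run now."""
    # Always run around software updates, and when there is no previous run.
    if update_pending or last_run is None:
        return True
    return today - last_run >= timedelta(days=interval_days)


if __name__ == "__main__":
    tool = health_check_tool(is_exadata=False)
    due = check_is_due(date(2024, 1, 1), date(2024, 2, 5))
    print(f"{tool}: {'run now' if due else 'not due yet'}")
```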

Reduce downtime for database node or instance failures

The default settings are sufficient for most use cases. If node failure detection and instance recovery need to be expedited, evaluate lower values for FAST_START_MTTR_TARGET.

Reducing FAST_START_MTTR_TARGET can increase database writer activity significantly, so additional I/O bandwidth is required.
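When evaluating a lower value, it can help to generate the statement from a validated input so an out-of-range target is caught before it reaches the database. The sketch below assumes the documented range of 0 to 3600 seconds for FAST_START_MTTR_TARGET; the function name and the example value of 60 seconds are hypothetical, and the statement is only built, not executed.

```python
def mttr_target_statement(seconds: int) -> str:
    """Build an ALTER SYSTEM statement that sets the target on all instances."""
    if isinstance(seconds, bool) or not isinstance(seconds, int):
        raise TypeError("seconds must be an integer")
    if not 0 <= seconds <= 3600:
        raise ValueError("FAST_START_MTTR_TARGET must be between 0 and 3600 seconds")
    # SID='*' applies the setting to every Oracle RAC instance.
    return f"ALTER SYSTEM SET FAST_START_MTTR_TARGET={seconds} SCOPE=BOTH SID='*'"


if __name__ == "__main__":
    # Hypothetical target; test any change against available I/O bandwidth first.
    print(mttr_target_statement(60))
```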

For Exadata systems, Instant Failure Detection capabilities use remote direct memory access (RDMA) to confirm server failures in less than 2 seconds, compared to the typical 30-second detection time found in most Oracle RAC clusters.

Eliminate downtime for software updates

Use Oracle RAC rolling updates for Clusterware or database software updates (for example, Release Updates) to avoid downtime.

Use out-of-place software updates when possible, so rollback and fallback use cases are simplified.

Use software gold images to eliminate the complexity of running the database OPatch utility.

For a fleet of databases on a single Oracle RAC cluster or multiple clusters, use Oracle Fleet Patching and Provisioning.
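The idea behind a rolling update is that only one node is out of service at a time while the others keep serving. The following sketch generates such a sequence for a list of hypothetical node names. It is a generic illustration of the ordering, not the procedure that Oracle's patching tools implement, and the step wording is an assumption.

```python
from __future__ import annotations


def rolling_update_plan(nodes: list[str]) -> list[str]:
    """Return an ordered list of steps that updates one node at a time."""
    if len(nodes) < 2:
        raise ValueError("a rolling update needs at least two nodes so one keeps serving")
    if len(set(nodes)) != len(nodes):
        raise ValueError("node names must be unique")
    steps = []
    for node in nodes:
        # Every other node stays up while this one is updated.
        survivors = [n for n in nodes if n != node]
        steps.append(f"relocate or drain services from {node} to {', '.join(survivors)}")
        steps.append(f"stop the stack on {node} and switch it to the updated software home")
        steps.append(f"restart {node} and confirm its instances and services are back")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(rolling_update_plan(["node1", "node2", "node3"]), start=1):
        print(f"{number}. {step}")
```

Switching to an already prepared software home, as in the second step, is what makes the update out-of-place: the previous home remains available for fallback.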

Make applications and processes highly available on the cluster

When an application, process, or server fails in a cluster, you want the disruption to be as short as possible and transparent to users. For example, when an application fails on a server, that application can be restarted on another server in the cluster, minimizing or negating any disruption in the use of that application. Similarly, if a server in a cluster fails, then all of the applications and processes running on that server must fail over to another server to continue providing service to the users. Using the built-in generic_application resource type or customizable scripts and application agent programs, together with resource attributes that you assign to applications and processes, Oracle Clusterware can manage all of these entities to ensure high availability.

Use Oracle Clusterware to manage third-party resources and agents that reside on the cluster.

See Making Applications Highly Available Using Oracle Clusterware
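As a sketch of what registering an application with the generic_application resource type might look like, the helper below assembles a crsctl add resource command as an argument list. The resource name and program paths are hypothetical, only the START_PROGRAM and STOP_PROGRAM attributes are shown, and the command is built but not run; check the referenced guide for the full attribute list before using it.

```python
from __future__ import annotations


def generic_application_command(name: str, start_program: str, stop_program: str) -> list[str]:
    """Build a 'crsctl add resource' argument list for a generic_application resource."""
    for value in (name, start_program, stop_program):
        if not value:
            raise ValueError("name, start_program, and stop_program are required")
        if "'" in value:
            # Attribute values are wrapped in single quotes below.
            raise ValueError("single quotes are not supported in this sketch")
    attrs = f"START_PROGRAM='{start_program}',STOP_PROGRAM='{stop_program}'"
    return [
        "crsctl", "add", "resource", name,
        "-type", "generic_application",
        "-attr", attrs,
    ]


if __name__ == "__main__":
    # Hypothetical application and control script.
    cmd = generic_application_command(
        "my_app", "/opt/my_app/bin/ctl start", "/opt/my_app/bin/ctl stop"
    )
    print(cmd)
```

Returning a list rather than a single string lets the command be passed to subprocess.run without a shell, which avoids quoting problems in the attribute string.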

Reduce application downtime for planned and unplanned outages

Leverage Clusterware-managed services and application best practices to achieve zero application downtime.

Use SRVCTL to manage services for your PDB. Never use the default service for application connectivity. Always have at least one preferred Oracle RAC instance and at least one additional available Oracle RAC instance for high availability.
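The preferred and available rule can be enforced before a service is ever created. The sketch below builds an srvctl add service argument list and rejects definitions that lack either a preferred or an available instance. All database, PDB, service, and instance names are hypothetical; the -preferred and -available options are assumed to apply to an administrator-managed database, and the command is built but not run.

```python
from __future__ import annotations


def add_service_command(
    db_unique_name: str,
    service: str,
    pdb: str,
    preferred: list[str],
    available: list[str],
) -> list[str]:
    """Build an 'srvctl add service' argument list with HA placement checks."""
    if not preferred:
        raise ValueError("at least one preferred instance is required")
    if not available:
        raise ValueError("at least one additional available instance is required")
    overlap = set(preferred) & set(available)
    if overlap:
        raise ValueError(f"instances listed as both preferred and available: {sorted(overlap)}")
    return [
        "srvctl", "add", "service",
        "-db", db_unique_name,
        "-service", service,
        "-pdb", pdb,
        "-preferred", ",".join(preferred),
        "-available", ",".join(available),
    ]


if __name__ == "__main__":
    # Hypothetical names throughout.
    print(" ".join(add_service_command("salesdb", "sales_oltp", "salespdb", ["salesdb1"], ["salesdb2"])))
```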

Applications should subscribe to HA Fast Application Notifications (FAN) and be configured to respond and fail over if required.

See Enabling Continuous Service for Applications and Continuous Availability - Application Checklist for Continuous Service for MAA Solutions

Capacity planning

Capacity planning and sizing should be done before deployment, and periodically afterward, to ensure that there are sufficient system resources to meet application performance requirements.

Capacity planning needs to accommodate growth or consolidation of databases, additional application workloads, additional processes, or anything that strains existing system resources.

It is also crucial to evaluate whether performance requirements are still met during an unplanned outage or planned maintenance event.
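A first-order way to check the outage scenario is to redistribute the current load across the nodes that would remain. The sketch below assumes identical nodes and a workload that spreads evenly over the survivors, which real workloads rarely do exactly; the function name, the sample utilization figures, and the 80 percent ceiling are hypothetical.

```python
from __future__ import annotations


def utilization_after_failures(per_node_utilization: list[float], failed_nodes: int = 1) -> float:
    """Estimate average utilization on surviving nodes, as a fraction of one node."""
    if failed_nodes < 0:
        raise ValueError("failed_nodes cannot be negative")
    survivors = len(per_node_utilization) - failed_nodes
    if survivors < 1:
        raise ValueError("no surviving nodes would remain")
    # Total work stays the same; it is spread over fewer nodes.
    return sum(per_node_utilization) / survivors


if __name__ == "__main__":
    current = [0.55, 0.60, 0.50]  # hypothetical CPU utilization per node
    ceiling = 0.80                # hypothetical acceptable maximum
    estimate = utilization_after_failures(current, failed_nodes=1)
    verdict = "within" if estimate <= ceiling else "above"
    print(f"estimated utilization with one node down: {estimate:.0%} ({verdict} the ceiling)")
```

The same calculation applies to planned maintenance: a rolling update takes one node out of service at a time, so the cluster should be sized for that reduced capacity.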