3 Designing an Oracle Fail Safe Solution

Oracle Fail Safe provides a number of configuration options to satisfy your architecture or failover requirements.

This chapter discusses the following topics:

  • Customizing Your Configuration

  • Integrating Clients and Applications

3.1 Customizing Your Configuration

You can deploy highly available solutions using the following configurations:

  • Active/passive configuration

  • Active/active configuration

These configurations differ in the way work is allocated among the cluster nodes, but share the following features:

  • One or more Oracle homes are created on a private disk (usually the system disk) on each node.

  • All Oracle product executable files are installed in the Oracle homes on each node.

  • All data files, configuration files, log files, HTML files, and other files required by the application being made highly available are placed on cluster disks, so that each cluster node can access them.

The Oracle Fail Safe software automatically runs as needed on one or more cluster nodes to ensure proper configuration and failover.

Figure 1-4 shows the software and hardware components in a cluster configured with Oracle Fail Safe.

3.1.1 Active/Passive Configuration

In an active/passive configuration, one or more nodes host the entire cluster workload, but one node remains idle (as a standby server), ready to take over processing in case a node running an application fails. This solution ensures that the performance for the fail-safe workload is the same before and after failover.

Figure 3-1 shows a two-node configuration with Oracle Database running on Node 1, and with Node 2 as a standby server. Currently, nothing is running on Node 2. Node 2 takes over the workload of Node 1 in the event of a failover.

Figure 3-1 Active/Passive (Standby) Two-Node Configuration


Figure 3-2 shows a four-node configuration with Oracle Database running on Node 1, Node 2, and Node 3. Node 4 is the standby node. Currently, nothing is running on Node 4. In the event of a failover, Node 4 takes over the failover workload.

Figure 3-2 Active/Passive (Standby) Four-Node Configuration


An active/passive configuration is the fastest failover configuration, because the passive standby node has no workload of its own.

3.1.2 Active/Active Configuration

In an active/active configuration, each node shares the application processing tasks and also backs up other nodes in the event of a failure. If one node fails, then another node runs its own applications and services as well as those that fail over from the failed node. The active/active configuration is more cost-effective than the active/passive configuration because no node sits idle during normal operations. This configuration provides a flexible architecture that enables you to divide the workload to best meet your business needs.

Figure 3-3 shows a two-node active/active configuration with an Oracle Database running on both cluster nodes. In addition, a generic service is running on Node 1. In Figure 3-3, an Oracle Database is used for marketing on Node 1, and for sales on Node 2. The cluster disks owned by Node 1 store the marketing files, and the cluster disks owned by Node 2 store the sales files.

Figure 3-3 Active/Active Configuration


In the active/active configuration, all nodes actively process applications during normal operations. This configuration provides better performance (higher throughput) when all nodes are operating, but slower failover and possibly reduced performance when a node fails. Also, the client connections are distributed over all nodes.

Balancing workload means making trade-offs concerning the size of the normal workload on each system. If all systems run at nearly full capacity, then few resources are available to handle the load of another system in an outage, and client systems experience significantly slower response during and after a failover. If you have the resources to quickly repair or replace a failed system, then the period during which one cluster node serves both workloads is short, and a short period of slow response can be tolerated better than a long one. In fact, some businesses prefer to have applications run more slowly than usual rather than have a period of downtime.

Alternatively, running each system at slightly under 50% to 75% of capacity (depending on the number of nodes in the cluster) ensures that clients do not experience slower response after a failover, but the equivalent of an entire system can remain idle under normal conditions, much like an active/passive configuration.
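The capacity trade-off described above can be checked with simple arithmetic. The following Python sketch is illustrative only and is not part of Oracle Fail Safe. It assumes that all nodes have equal capacity, that load is uniform across nodes, and that the workload of a failed node is spread evenly over the surviving nodes. The function names are hypothetical.

```python
def max_safe_utilization(node_count: int, failed_nodes: int = 1) -> float:
    """Return the highest per-node utilization (0.0 to 1.0) at which the
    surviving nodes can absorb the workload of the failed nodes.

    Assumes identical nodes and an even spread of the failed workload.
    """
    if node_count < 2:
        raise ValueError("a cluster needs at least two nodes")
    if not 0 < failed_nodes < node_count:
        raise ValueError("failed_nodes must be between 1 and node_count - 1")
    return (node_count - failed_nodes) / node_count


def utilization_after_failover(utilization: float, node_count: int,
                               failed_nodes: int = 1) -> float:
    """Return the per-node utilization on the surviving nodes after failover.

    A result above 1.0 means the survivors are overloaded.
    """
    if not 0 < failed_nodes < node_count:
        raise ValueError("failed_nodes must be between 1 and node_count - 1")
    survivors = node_count - failed_nodes
    return utilization * node_count / survivors


if __name__ == "__main__":
    for nodes in (2, 3, 4):
        limit = max_safe_utilization(nodes)
        print(f"{nodes} nodes: keep each node at or below {limit:.0%}")
```

Under these assumptions, a two-node cluster must stay at or below 50% per node and a four-node cluster at or below 75%, which matches the range given above. Real clusters rarely have perfectly uniform load, so treat the result as an upper bound rather than a target.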

Oracle Fail Safe can be configured to avoid some of the performance problems with this type of configuration. For example, you can:

  • Enable failover only for your mission-critical applications

  • Use different database parameter files on each node so that fewer system resources are used after a failover

  • Configure each component (Oracle Database and so on) into a separate group with its own failover and failback policies

    This is possible because Oracle Fail Safe enables you to configure each cluster node to host several virtual servers.

  • Combine the scripting support of Oracle Fail Safe (using the PowerShell cmdlets described in Chapter 6) with a system monitoring tool (such as Oracle Enterprise Manager) to automate the movement of groups for load-balancing purposes.
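The load-balancing idea in the last item can be sketched as a small planning step that decides which group to move. The following Python sketch is an illustration under stated assumptions, not an Oracle Fail Safe interface: the load figures are assumed to come from your monitoring tool, the function name `plan_group_move` and the `threshold` parameter are hypothetical, and the actual move would be carried out by the scripting support described in Chapter 6.

```python
from typing import Dict, Optional, Tuple


def plan_group_move(node_loads: Dict[str, Dict[str, float]],
                    threshold: float = 0.2) -> Optional[Tuple[str, str, str]]:
    """Suggest one group to move from the busiest node to the least busy node.

    node_loads maps each node name to a {group name: load fraction} mapping.
    Returns (group, source_node, target_node), or None when the gap between
    the busiest and least busy node is within the threshold or no single
    move would narrow it.
    """
    totals = {node: sum(groups.values()) for node, groups in node_loads.items()}
    busiest = max(totals, key=totals.get)
    idlest = min(totals, key=totals.get)
    gap = totals[busiest] - totals[idlest]
    if gap <= threshold:
        return None

    # Moving a group with load L changes the gap to |gap - 2L|, so only
    # groups with 0 < L < gap help, and L closest to gap / 2 helps most.
    candidates = {group: load
                  for group, load in node_loads[busiest].items()
                  if 0 < load < gap}
    if not candidates:
        return None
    group = min(candidates, key=lambda g: abs(candidates[g] - gap / 2))
    return group, busiest, idlest


if __name__ == "__main__":
    # Hypothetical load figures for a two-node active/active cluster.
    loads = {
        "Node1": {"marketing_db": 0.5, "generic_service": 0.2},
        "Node2": {"sales_db": 0.3},
    }
    print(plan_group_move(loads))
```

The sketch moves at most one group per call so that each move can be verified before the next one is planned. The threshold keeps groups from moving back and forth when the nodes are already close to balanced.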

Although the nodes do not need to be physically identical, you must select servers with enough power, memory, disk host adapters, and disk drives to support an adequate level of service if a failover occurs at a busy time of the day.

3.2 Integrating Clients and Applications

To operate in an Oracle Fail Safe environment, client applications do not require any special programming or changes. Client applications that work with an Oracle resource on a single node continue to function correctly in an Oracle Fail Safe environment without recoding, recompiling, or relinking. This is because clients can use the virtual server to access the application.

Chapter 8 contains a section on integrating clients and applications. It describes how to make your clients and applications fail over transparently when a database fails over to another node in the cluster.