13.1 Overview of Oracle Streams High Availability Environments
Configuring a high availability solution requires careful planning and analysis of failure scenarios. Database backups and physical standby databases provide physical copies of a source database for failover protection. Oracle Data Guard, in SQL apply mode, implements a logical standby database in a high availability environment. Because Oracle Data Guard is designed for a high availability environment, it handles most failure scenarios. However, some environments might require the flexibility available in Oracle Streams, so that they can take advantage of the extended feature set offered by Oracle Streams.
This chapter discusses some of the scenarios that can benefit from an Oracle Streams-based solution and explains Oracle Streams-specific issues that arise in high availability environments.
Oracle Data Guard Concepts and Administration for more information about Oracle Data Guard
13.2 Protection from Failures
Oracle Real Application Clusters (Oracle RAC) is the preferred method for protecting from an instance or system failure. After a failure, services are provided by a surviving node in the cluster. However, clustering does not protect from user error, media failure, or disasters. These types of failures require redundant copies of the database. You can make both physical and logical copies of a database.
Physical copies are identical, block for block, with the source database, and are the preferred means of protecting data. There are three types of physical copies: database backup, mirrored or multiplexed database files, and a physical standby database.
Logical copies contain the same information as the source database, but the information can be stored differently within the database. Creating a logical copy of your database offers many advantages. However, you should always create a logical copy in addition to a physical copy, not instead of physical copy.
A logical copy has the following benefits:
A logical copy can be open while being updated. This ability makes the logical copy useful for near real-time reporting.
A logical copy can have a different physical layout that is optimized for its own purpose. For example, it can contain additional indexes, and thereby improve the performance of reporting applications that use the logical copy.
A logical copy provides better protection from corruptions. Because data is logically captured and applied, it is very unlikely that a physical corruption can propagate to the logical copy of the database.
There are three types of logical copies of a database:
Logical standby databases
Oracle Streams replica databases
Logical standby databases are best maintained using Oracle Data Guard in SQL apply mode. The rest of this chapter discusses Oracle Streams replica databases and application maintained copies.
Oracle Database Backup and Recovery User's Guide for more information about database backups and mirroring or multiplexing database files
Oracle Data Guard Concepts and Administration for more information about physical standby databases and logical standby databases
13.2.1 Oracle Streams Replica Database
Like Oracle Data Guard in SQL apply mode, Oracle Streams can capture database changes, propagate them to destinations, and apply the changes at these destinations. Oracle Streams is optimized for replicating data. Oracle Streams can capture changes at a source database, and the captured changes can be propagated asynchronously to replica databases. This optimization can reduce the latency and can enable the replicas to lag the primary database by no more than a few seconds.
Nevertheless, you might choose to use Oracle Streams to configure and maintain a logical copy of your production database. Although using Oracle Streams might require additional work, it offers increased flexibility that might be required to meet specific business requirements. A logical copy configured and maintained using Oracle Streams is called a replica, not a logical standby, because it provides many capabilities that are beyond the scope of the normal definition of a standby database. Some of the requirements that can best be met using an Oracle Streams replica are listed in the following sections.
Oracle Streams Replication Administrator's Guide for more information about replicating database changes with Oracle Streams
18.104.22.168 Updates at the Replica Database
The greatest difference between a replica database and a standby database is that a replica database can be updated and a standby database cannot. Applications that must update data can run against the replica, including jobs and reporting applications that log reporting activity. Replica databases also allow local applications to operate autonomously, protecting local applications from WAN failures and reducing latency for database operations.
22.214.171.124 Heterogeneous Platform Support
The production and the replica do not need to be running on the exact same platform. This provides more flexibility in using computing assets, and facilitates migration between platforms.
126.96.36.199 Multiple Character Sets
Oracle Streams replicas can use different character sets than the production database. Data is automatically converted from one character set to another before being applied. This ability is extremely important if you have global operations and you must distribute data in multiple countries.
188.8.131.52 Mining the Online Redo Logs to Minimize Latency
If the replica is used for near real-time reporting, Oracle Streams can lag the production database by no more than a few seconds, providing up-to-date and accurate queries. Changes can be read from the online redo logs as the logs are written, rather than from the redo logs after archiving.
184.108.40.206 Fast Failover
Oracle Streams replicas can be open to read/write operations at all times. If a primary database fails, then Oracle Streams replicas are able to instantly resume processing. A small window of data might be left at the primary database, but this data will be automatically applied when the primary database recovers. This ability can be important if you value fast recovery time over no lost data. Assuming the primary database can eventually be recovered, the data is only temporarily unavailable.
220.127.116.11 Single Capture for Multiple Destinations
In a complex environment, changes need only be captured once. These changes can then be sent to multiple destinations. When a capture process is used to capture changes, this ability enables more efficient use of the resources needed to mine the redo logs for changes.
13.2.2 When Not to Use Oracle Streams
As mentioned previously, there are scenarios in which you might choose to use Oracle Streams to meet some of your high availability requirements. One of the rules of high availability is to keep it simple. Oracle Data Guard is designed for high availability and is easier to implement than an Oracle Streams-based high availability solution. If you decide to leverage the flexibility offered by Oracle Streams, then you must be prepared to invest in the expertise and planning required to make an Oracle Streams-based solution robust. You might need to write scripts to implement much of the automation and management tools provided with Oracle Data Guard.
13.2.3 Application-Maintained Copies
The best availability can be achieved by designing the maintenance of logical copies of data directly into an application. The application knows what data is valuable and must be immediately moved off-site to guarantee no data loss. It can also synchronously replicate truly critical data, while asynchronously replicating less critical data. Applications maintain copies of data by either synchronously or asynchronously sending data to other applications that manage another logical copy of the data. Synchronous operations are performed using the distributed SQL or remote procedure features of the database. Asynchronous operations are performed using Advanced Queuing. Advanced Queuing is a database message queuing feature that is part of Oracle Streams.
Although the highest levels of availability can be achieved with application-maintained copies of data, great care is required to realize these results. Typically, a great amount of custom development is required. Many of the difficult boundary conditions that have been analyzed and solved with solutions such as Oracle Data Guard and Oracle Streams replication must be reanalyzed and solved by the custom application developers. In addition, standard solutions like Oracle Data Guard and Oracle Streams replication undergo stringent testing both by Oracle and its customers. It will take a great deal of effort before a custom-developed solution can exhibit the same degree of maturity. For these reasons, only organizations with substantial patience and expertise should attempt to build a high availability solution with application maintained copies.
Oracle Database Advanced Queuing User's Guide for more information about developing applications with Advanced Queuing
13.3 Best Practices for Oracle Streams High Availability Environments
Implementing Oracle Streams in a high availability environment requires consideration of possible failure and recovery scenarios, and the implementation of procedures to ensure Oracle Streams continues to capture, propagate, and apply changes after a failure. Some of the issues that must be examined include the following:
13.3.1 Configuring Oracle Streams for High Availability
When configuring a solution using Oracle Streams, it is important to anticipate failures and design availability into the architecture. You must examine every database in the distributed system, and design a recovery plan in case of failure of that database. In some situations, failure of a database affects only services accessing data on that database. In other situations, a failure is multiplied, because it can affect other databases.
This section contains these topics:
18.104.22.168 Directly Connecting Every Database to Every Other Database
A configuration where each database is directly connected to every other database in the distributed system is the most resilient to failures, because a failure of one database will not prevent any other databases from operating or communicating. Assuming all data is replicated, services that were using the failed database can connect to surviving replicas.
22.214.171.124 Creating Hub-and-Spoke Configurations
Although configurations where each database is directly connected to every other database provide the best high availability characteristics, they can become difficult to manage when the number of databases becomes large. Hub-and-spoke configurations solve this manageability issue by funneling changes from many databases into a hub database, and then to other hub databases, or to other spoke databases. To add a new source or destination, you simply connect it to a hub database, rather than establishing connections to every other database.
A hub, however, becomes a very important node in your distributed environment. Should it fail, all communications flowing through the hub will fail. Due to the asynchronous nature of the messages propagating through the hub, it can be very difficult to redirect a stream from one hub to another. A better approach is to make the hub resilient to failures.
The same techniques used to make a single database resilient to failures also apply to distributed hub databases. Oracle recommends Oracle Real Application Clusters (Oracle RAC) to provide protection from instance and node failures. This configuration should be combined with a "no loss" physical standby database, to protect from disasters and data errors. Oracle does not recommend using an Oracle Streams replica as the only means to protect from disasters or data errors.
Oracle Streams Replication Administrator's Guide for a detailed example of a hub-and-spoke replication environment
126.96.36.199 Local or Downstream Capture with Oracle Streams Capture Processes
Oracle Streams capture processes support capturing changes from the redo log on the local source database or at a downstream database at a different site. The choice of local capture or downstream capture has implications for availability. When a failure occurs at a source database, some changes might not have been captured. With local capture, those changes might not be available until the source database is recovered. In the event of a catastrophic failure, those changes might be lost.
Downstream capture at a remote database reduces the window of potential data loss in the event of a failure. Depending on the configuration, downstream capture enables you to guarantee all changes committed at the source database are safely copied to a remote site, where they can be captured and propagated to other databases and applications. Oracle Streams uses the same mechanism as Oracle Data Guard to copy redo data or log files to remote destinations, and supports the same operational modes, including maximum protection, maximum availability, and maximum performance.
Synchronous capture is always configured at the source database.
13.3.2 Recovering from Failures
The following sections provide best practices for recovering from failures.
This section contains these topics:
188.8.131.52 Automatic Capture Process Restart After a Failover
After a failure and restart of a single-node database, or a failure and restart of a database on another node in a cold failover cluster, the capture process automatically returns to the status it was in at the time of the failure. That is, if it was running at the time of the failure, then the capture process restarts automatically.
"Queues and Oracle Real Application Clusters" for information about primary and secondary instance ownership for queues
184.108.40.206 Database Links Reestablishment After a Failover
It is important to ensure that a propagation continues to function after a failure of a destination database instance. A propagation job will retry (with increasing delay between retries) its database link sixteen times after a failure until the connection is reestablished. If the connection is not reestablished after sixteen tries, then the propagation schedule is aborted.
If the database is restarted on the same node, or on a different node in a cold failover cluster, then the connection should be reestablished. In some circumstances, the database link could be waiting on a read or write, and will not detect the failure until a lengthy time out expires. The time out is controlled by the
TCP_KEEPALIVE_INTERVAL TCP/IP parameter. In such circumstances, you should drop and re-create the database link to ensure that communication is reestablished quickly.
In a high availability environment, you can prepare scripts that will drop and re-create all necessary database links. After a failover, you can execute these scripts so that Oracle Streams can resume propagation.
220.127.116.11 Propagation Job Restart After a Failover
For messages to be propagated from a source queue to a destination queue, a propagation job must run on the instance owning the source queue. In a single-node database, or cold failover cluster, propagation resumes when the single database instance is restarted.
18.104.22.168 Automatic Apply Process Restart After a Failover
After a failure and restart of a single-node database, or a failure and restart of a database on another node in a cold failover cluster, the apply process automatically returns to the status it was in at the time of the failure. That is, if it was running at the time of the failure, then the apply process restarts automatically.
"Queues and Oracle Real Application Clusters" for information about primary and secondary instance ownership for queues