|Oracle® Database High Availability Best Practices
11g Release 1 (11.1)
|PDF · Mobi · ePub|
The proper configuration of Oracle Data Guard Redo Apply and SQL Apply is essential to ensuring that all standby databases work properly and perform their roles within the necessary service levels after switchovers and failovers.
The best practices for Oracle Data Guard build on the ones described in Section 2.2, "Configuring Oracle Database 11g". You can configure most Oracle Data Guard settings using Oracle Enterprise Manager and the broker. To set more advanced, less frequently used configuration parameters, use the DGMGRL command-line interface or SQL*Plus statements.
Oracle Data Guard enables you to use a physical standby database (Redo Apply), a snapshot standby database, the Active Data Guard option (real-time query), or a logical standby database (SQL Apply), real-time apply, or a combination of these.
A physical standby database provides a physically identical copy of the primary database, with on disk database structures that are identical to the primary database on a block-for-block basis. The database schemas, including indexes, are the same. A physical standby database is kept synchronized with the primary database by applying the redo data received from the primary database through media recovery. In addition, a physical standby database:
Can be opened for read-only access (real-time queries) while Redo Apply is active if you purchase a license for the Oracle Active Data Guard option.
Can be converted to a snapshot standby database for use as a testing database or cloning, and then later converted back to run in the physical standby database role.
Can be temporarily converted into a transient logical standby database on which you can perform a rolling upgrade, incurring minimal downtime.
A logical standby database contains the same logical information as the primary database, although the physical organization and structure of the data can be different from the primary database. SQL Apply keeps the logical standby database synchronized with the primary database by transforming the redo data received from the primary database into SQL statements and then executing the SQL statements on the standby database. You can use a logical standby database for other business purposes in addition to disaster-recovery requirements.
This section differentiates physical and logical standby databases to help you determine which is the best solution for your business requirements.
Simplicity and reliability of a physical replica is preferable.
The primary database has a very high redo generation rate.
The highest level of protection against corruption is required.
An up-to-date standby database that is open read-only can address planned uses of the standby database while it is running in the standby role. (Requires a license for the Oracle Active Data Guard option.)
You want to offload fast incremental backups to the standby database. (Requires a license for the Oracle Active Data Guard option.)
You want to use a snapshot standby database for quality assurance testing or other uses that require a database that is open read-write.
To perform rolling database upgrades using a transient logical standby database.
You want to run reporting applications that require read/write access to the standby database.
Note: You cannot modify the data maintained by a logical standby database.
You want to add tables, additional schemas, indexes, and materialized views to your standby database that do not exist on your primary database.
You will perform a rolling database upgrade from a database currently running an Oracle Database 10g release.
Note: If your database is already running Oracle Database 11g, use a physical standby database and the transient logical standby database rolling upgrade process.
In addition, use a logical standby database if you have any of the preceding requirements and any of the following characteristics:
You require basic one-way replication of the entire database.
Note: If you have more complex replication requirements (for example, multimaster, many-to-one, transformations, and so on) then use Oracle Streams (see Section 2.8).
You do not have any unsupported data type, or for which EDSFoot 8 is not a viable workaround.
You meet other prerequisites of a logical standby database.
Performance tests confirm that logical standby databases can handle peak workloads.
In some situations, a business cannot afford to lose data at any cost. In other situations, the availability of the database might be more important than protecting data. Some applications require maximum database performance and can tolerate a potential loss of data if a disaster occurs.
Based on your business requirements, choose one of the following protection modes:
Maximum protection mode guarantees that no data loss occurs if the primary database fails. To ensure that data loss cannot occur, the primary database shuts down if a fault prevents it from writing the redo stream to at least one standby database.
Maximum performance mode (the default mode) provides the highest level of data protection that is possible without affecting the performance of the primary database. This is accomplished by allowing a transaction to commit as soon as the redo data needed to recover that transaction is written to the local online redo log.
The redo data stream of the primary database is also written to at least one standby database, but that redo stream is written asynchronously with the commitment of the transactions that create the redo data. When network links with sufficient bandwidth are used, this mode provides a level of data protection that approaches that of maximum availability mode, with minimal effect on primary database performance.
To determine the appropriate data protection mode for your application, perform the following steps:
Compare your business requirements to the data loss scenarios in Table 2-2.
|If data loss is not acceptable during ...||Then ...|
A primary site failure, failure of one or all standby sites, or any network failure
Use maximum protection mode.
A primary site failure
Use maximum availability mode. Otherwise, use maximum performance mode.
Consider the effect of latency on application throughput:
When using maximum protection mode or maximum availability mode, consider how the latency might affect application throughput. The distance between sites and the network infrastructure between the sites determine the network latency, and therefore determine the protection mode that can be used. In general, latency increases and bandwidth decreases with distance:
For a low-latency, high bandwidth network, use maximum protection or maximum availability protection mode.
In this case, the effect on performance is minimal and you can achieve zero data loss.
For a high-latency network, use maximum performance mode with the
In this case, the performance effect on the primary database is minimal, and you can usually limit data loss to seconds. You can still use the maximum availability mode and maximum protection mode with the
SYNC transport, but you must assess if the additional
COMMIT latency might exceed your application performance requirements. In some cases, the response time or throughput overhead is zero or within acceptable requirements. Large batch applications or a message queuing applications are good examples where maximum availability with
SYNC is still applicable across a high-latency network.
Bandwidth must be greater than the maximum redo generation rate. A guideline for two-way communication is for bandwidth to be 50% of the stated network capacity, but also consider the network usage of other applications. Using the maximum performance mode with the
ASYNC redo transport mitigates the effect on performance.
You should deploy multiple standby databases for any of the following purposes:
To provide continuous protection following failover
The standby databases in a multiple standby configuration that are not the target of the role transition (these databases are referred to as bystander standby databases) automatically apply redo data received from the new primary database.
To achieve zero data loss protection while also guarding against widespread geographic disasters that extend beyond the limits of synchronous communication
For example, one standby database that receives redo data synchronously is located 200 miles away, and a second standby database that receives redo data asynchronously is located 1,500 miles away from the primary.
To perform rolling database upgrades while maintaining disaster protection throughout the rolling upgrade process
To perform testing and other ad-hoc tasks while maintaining disaster-recovery protection
When desired, use some standby databases for such purposes while reserving at least one standby database to serve as the primary failover target.
Oracle Database High Availability Overview Section 220.127.116.11 "Overview of Multiple Standby Database Architectures" for multiple standby database implementations and examples
The MAA white paper "Multiple Standby Databases Best Practices" at
Use the following configuration best practices for Data Guard:
See Also:Appendix A, "Database SPFILE and Oracle Net Configuration File Samples" for detailed examples of initialization parameter settings, including SPFILE samples and Oracle Net configuration files
Oracle recommends that you use the Recovery Manager (RMAN) utility to simplify the process of creating a physical standby database.
You can either create a standby database from backups of your primary database, or create a standby database over the network:
Use the RMAN
DUPLICATE TARGET DATABASE FOR STANDBY command to create a standby database from backups of your primary database.
You can use any backup copy of the primary database to create the physical standby database if the necessary archived redo log files to completely recover the database are accessible by the server session on the standby host. RMAN restores the most recent datafiles unless you execute the
SET UNTIL command.
Use the RMAN
FROM ACTIVE DATABASE option to create the standby database over the network if a preexisting database backup is not accessible to the standby system.
RMAN copies the data files directly from the primary database to the standby database. The primary database must be mounted or open.
You must choose between active and backup-based duplication. If you do not specify the
FROM ACTIVE DATABASE option, then RMAN performs backup-based duplication. Creating a standby database over the network is advantageous because:
You can transfer redo data directly to the remote host over the network without first having to go through the steps of performing a backup on the primary database. (Restoration requires multiple steps including storing the backup locally on the primary database, transferring the backup over the network, storing the backup locally on the standby database, and then restoring the backup on the standby database.)
With active duplication you can backup a database (as it is running) from ASM, and restore the backup to a host over the network and place the files directly into ASM.
Before this feature, restoration required you to backup the primary and copy the backup files on the primary host file system, transfer the backup files over the network, place the backup files on the standby host file system, and then restore the files into ASM.
Enable Flashback Database on both the primary and standby database so that, in case the original primary database has not been damaged, you can reinstate the original primary database as a new standby database following a failover. If there is a failure during the switchover process, then it can easily be reversed when Flashback Database is enabled.
See Also:Section 18.104.22.168, "Enable Flashback Database" for more information about Flashback Database and for information about enabling Flashback Database
When the primary database is in
FORCE LOGGING mode, all database data changes are logged.
FORCE LOGGING mode ensures that the standby database remains consistent with the primary database. If this is not possible because you require the load performance with
NOLOGGING operations, then you must ensure that the corresponding physical standby data files are subsequently synchronized. To synchronize the physical standby data files, either apply an incremental backup created from the primary database or replace the affected standby data files with a backup of the primary data files taken after the nologging operation. Before the file transfer, you must stop Redo Apply on the physical standby database.
For logical standby databases, when SQL Apply encounters a redo record for an operation performed with the
NOLOGGING clause, it skips over the record and continues applying changes from later records. Later, if an attempt is made to access a record that was updated with
NOLOGGING in effect, the following error is returned:
ORA-01403 no data found. To recover after the
NOLOGGING clause is specified for a logical standby database, re-create one or more tables from the primary database, as described in Oracle Data Guard Concepts and Administration in the section about "Adding or Re-Creating Tables On a Logical Standby Database."
You can enable force logging immediately by issuing an
ALTER DATABASE FORCE LOGGING statement. If you specify
FORCE LOGGING, then Oracle waits for all ongoing unlogged operations to finish.
Use the broker to create, manage, and monitor a Data Guard configuration. The benefits of using broker include:
Integration with Oracle RAC
The broker is integrated with Cluster Ready Services (CRS)Foot 9 so that database role changes occur smoothly and seamlessly. This is especially apparent for a planned role switchover (for example, when a physical standby database is directed to take over the primary role while the original primary database assumes the role of standby). The broker and CRS work together to temporarily suspend service availability on the primary database, accomplish the actual role change for both databases during which CRS works with the broker to properly restart the instances as necessary, and then resume service availability on the new primary database. The broker manages the underlying Data Guard configuration and its database roles while CRS manages service availability that depends upon those roles. Applications that rely on CRS for managing service availability see only a temporary suspension of service as the role change occurs in the Data Guard configuration.
Automated creation of a Data Guard configuration
Oracle Enterprise Manager provides a wizard that automates the complex tasks involved in creating a broker configuration, including:
Adding an existing standby database, or a new standby database created from existing backups taken through Enterprise Manager
Configuring the standby control file, server parameter file, and data files
Initializing communication with the standby databases
Creating standby redo log files
Enabling Flashback Database if you plan to use fast-start failover
Although the Data Guard command-line interface (DGMGRL) cannot automatically create a standby database, you can use DGMGRL commands to configure and monitor an existing standby database, including those created using Enterprise Manager.
Simplified switchover and failover operations
The broker simplifies switchovers and failovers by allowing you to invoke them using a single key click in Oracle Enterprise Manager or a single command at the DGMGRL command-line interface (using DGMGRL is referred to in this documentation as manual failover). For lights-out administration, you can enable fast-start failover to allow the broker to determine if a failover is necessary and initiate the failover to a pre-specified target standby database automatically, with no need for DBA intervention and with little or no loss of data.
Fast-start failover enables you to increase availability with less need for manual intervention, thereby reducing management costs. Manual failover gives you control over exactly when a failover occurs and to which target standby database. Regardless of the method you choose, the broker coordinates the role transition on all databases in the configuration.
The broker automatically publishes FAN/AQ (Advanced Queuing) notifications after a failover. Properly configured clients that are also configured for Fast Connection Failover can use these notifications to connect to the new primary database to resume application processing.
Built-in monitoring and alert and control mechanisms
The broker provides built-in validation that monitors the health of all databases in the configuration. From any system in the configuration connected to any database, you can capture diagnostic information and detect obvious and subtle problems quickly with centralized monitoring, testing, and performance tools. Both Enterprise Manager and DGMGRL retrieve a complete configuration view of the progress of redo transport services on the primary database and the progress of Redo Apply or SQL Apply on the standby database.
The ability to monitor local and remote databases and respond to events is significantly enhanced by the broker health check mechanism and by its tight integration with the Enterprise Manager event management system.
Each database uses a flash recovery area.
The primary database instances archive remotely to only one apply instance.
Table 2-3 describes the recommendations for a robust archiving strategy when managing a Data Guard configuration through SQL*Plus. All of the following items are handled automatically when the broker is managing a configuration.
Start archiving on the primary and standby databases
Maintaining a standby database requires that you enable and start archiving on the primary database, as follows:
SQL> SHUTDOWN IMMEDIATE SQL> STARTUP MOUNT; SQL> ALTER DATABASE ARCHIVELOG; SQL> ALTER DATABASE OPEN;
Archiving must also be enabled on the standby database to support role transitions. To enable archiving on the standby database:
SQL> SHUTDOWN IMMEDIATE; SQL> STARTUP MOUNT; SQL> ALTER DATABASE ARCHIVELOG;
If the standby database is a logical standby, then open the database:
SQL> ALTER DATABASE OPEN;
Use a consistent log format (
Note: If the flash recovery area is used, then this format is ignored.
Perform remote archiving to only one standby instance and node for each Oracle RAC standby database.
All primary database instances archive to one standby destination, using the same net service name. Oracle Net Services connect-time failover is used to automatically switch to the "secondary" standby host when the "primary" standby instance has an outage.
If the archives are accessible from all nodes because ASM or some other shared file system is being used for the flash recovery area, then remote archiving can be spread across the different nodes of an Oracle RAC standby database.
Specify role-based destinations with the
The following example illustrates the recommended initialization parameters for a primary database communicating to a physical standby database. There are two instances,
SALES2, running in maximum protection mode.
*.DB_RECOVERY_FILE_DEST=+RECO *.LOG_ARCHIVE_DEST_1='SERVICE=SALES_stby SYNC AFFIRM NET_TIMEOUT=30 REOPEN=300 VALID_FOR=(ONLINE_LOGFILES, ALL_ROLES) DB_UNIQUE_NAME=SALES_stby' *.LOG_ARCHIVE_DEST_STATE_1=ENABLE
The flash recovery area must be accessible to any node within the cluster and use a shared file system technology such as automatic storage management (ASM), a cluster file system, a global file system, or high availability network file system (HA NFS). You can also mount the file system manually to any node within the cluster very quickly. This is necessary for recovery because all archived redo log files must be accessible on all nodes.
On the standby database nodes, recovery from a different node is required when a failure occurs on the node applying redo and the apply service cannot be restarted. In that case, any of the existing standby instances residing on a different node can initiate managed recovery. In the worst case, when the standby archived redo log files are inaccessible, the managed recovery process (MRP) or logical standby process (LSP) on the different node fetches the archived redo log files using the FAL server to retrieve from the primary node directly.
When configuring hardware vendor shared file system technology, verify the performance and availability implications. Investigate the following issues before adopting this strategy:
Is the shared file system accessible by any node regardless of the number of node failures?
What is the performance impact when implementing a shared file system?
Is there any effect on the interconnect traffic?
(maximum # of logfile groups +1) * maximum # of threads
For example, if a primary database has two instances (threads) and each thread has two online log groups, then there should be six standby redo logs ((2 + 1) * 2 = 6). Having one more standby log group for each thread (that is, one greater than the number of the online redo log groups for the primary database) reduces the likelihood that the logwriter process for the primary instance is blocked because a standby redo log cannot be allocated on the standby database.
The following statements create two standby log members for each group, and each member is 10 MB. One member is created in the directory specified by the
DB_CREATE_FILE_DEST initialization parameter, and the other member is created in the directory specified by
DB_RECOVERY_FILE_DEST initialization parameter. Because this example assumes that there are two online redo log groups in two threads, the next group is group five.
SQL> ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 5 SIZE 10M, GROUP 6 SIZE 10M, GROUP 7 SIZE 10M; SQL> ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 GROUP 8 SIZE 10M, GROUP 9 SIZE 10M, GROUP 10 SIZE 10M;
Consider the following additional guidelines when creating standby redo logs:
Create the same number of standby redo logs on both the primary and standby databases.
Create all online redo logs and standby redo logs for both primary and standby databases so that they are the same size.
Create standby redo logs in the Data Area protected through ASM or external redundancy.
In an Oracle RAC environment, create standby redo logs on a shared disk.
In an Oracle RAC environment, assign the standby redo log to a thread when the standby redo log is created. In the above example, three standby redo logs are assigned to each thread.
Do not multiplex the standby redo logs.
To check the number and group numbers of the redo logs, query the
SQL> SELECT * FROM V$LOG;
To check the results of the
ALTER DATABASE ADD STANDBY LOGFILE THREAD statements, query the
SQL> SELECT * FROM V$STANDBY_LOG;
You can also see the members created by querying the
SQL> SELECT * FROM V$LOGFILE;
See Also:The section about "Configuring an Oracle Database to Receive Redo Data" in Oracle Data Guard Concepts and Administration
This section discusses best practices for planning and implementing redo transport services for Data Guard.
Oracle recommends that you conduct a performance assessment with your proposed network configuration and current (or anticipated) peak redo rate. The network effect between the primary and standby databases, and the effect on the primary database throughput, must be understood. Because the network between the primary and standby databases is essential for the two databases to remain synchronized, the infrastructure must have the following characteristics:
Sufficient bandwidth to accommodate the maximum redo generation rate
If using the
SYNC transport, then minimal latency is necessary to reduce the performance impact on the primary database
Multiple network paths for network redundancy
In configurations that use a dedicated network connection the required bandwidth is determined by the maximum redo rate of the primary database and the efficiency of the network. Depending on the data protection mode, there are other recommended practices and performance considerations. Maximum protection mode and maximum availability mode require
SYNC transport. The maximum performance protection mode uses the
ASYNC transport option.
ASYNC transport mode, the
SYNC transport mode can affect the primary database performance due to the incurred network latency. Distance and network configuration directly influence latency, while high latency can slow the potential transaction throughput and quicken response time. The network configuration, number of repeaters, the overhead of protocol conversions, and the number of routers also affect the overall network latency and transaction response time.
The following sections describe the behavior of the
ASYNC transport modes on primary database throughput, and setting the
SYNC redo transport for a high degree of synchronization between the primary and standby databases.When sending redo data to a standby database using the
SYNC attribute, a transaction on the primary database does not return a commit completed message to the foreground process until the redo associated with that transaction has been written both locally and remotely.
The commit time when using
SYNC is directly affected by the network latency and bandwidth, and the I/O capacity on the standby database. The total commit time is comprised of the primary database's local write (log file parallel write) and the following factors that are captured through the LNS wait event on
SENDREQ: network time + standby write (this is the RFS write I/O obtained from the
V$SYSTEM_EVENT view on the standby database) + network acknowledgment.
Effect of the SYNC Transport on the Primary Database: How much the primary database is affected depends on the application profile. In general, batch updates with infrequent commits and message queuing applications may not observe any noticeable difference.
ASYNC redo transport for minimal impact on the primary database, but with a lower degree of synchronization.When sending redo data to a standby database using the
ASYNC attribute, transactions on the primary database continue to be processed without waiting for the network I/O for requests to complete.
There is little effect on the primary database throughput (redo bytes per second) as network latency increases. The log writer process writes to the local online redo log file, while the network server processes (one for each destination) read redo from the log buffer and asynchronously transmit the redo to remote destinations. If redo transport services transmit redo to multiple remote destinations, then the network server processes initiate the network I/O to all of the destinations in parallel.
Effect of the ASYNC Transport on the Primary Database: The effect on primary database throughput is minimal due to the true asynchronous behavior of
ASYNC redo transport.
Note:The network server processes read from the log buffer unless the redo data has been flushed from the log buffer, in which case the network server processes read the redo data from the online redo log. The network server processes only read from the online redo log when changes cannot be found in the log buffer.
Because the network connections used in multiple streams are initiated by the archiver process, take care when setting the
LOG_ARCHIVE_MAX_PROCESSES initialization parameter. The value of the
LOG_ARCHIVE_MAX_PROCESSES initialization parameter must be at least one greater than the total number of all remote destinations. Use the following equation when setting the
LOG_ARCHIVE_MAX_PROCESSES parameter for highly available environments:
LOG_ARCHIVE_MAX_PROCESSES = sum(remote_destinations) + count(threads)
You can adjust these parameter settings after evaluating and testing the initial settings in your production environment.
The following sections include best practices for network configuration and highest redo network redo rates.
See Also:The MAA white paper "Data Guard Redo Transport & Network Configuration" at
To achieve high network throughput, especially for a high-latency, high-bandwidth network, the minimum recommended setting for the sizes of the TCP send and receive socket buffers is the bandwidth delay product (BDP) of the network link between the primary and standby systems. Settings higher than the BDP may show incremental improvement. For example, in the MAA Linux test lab, simulated high-latency, high-bandwidth networks realize small, incremental increases in throughput when using TCP send and receive socket buffers settings up to three times the BDP.
BDP is product of the network bandwidth and latency. Socket buffer sizes should be set using the Oracle Net parameters
SEND_BUF_SIZE, so that the socket buffer size setting affects only Oracle TCP connections. The operating system may impose limits on the socket buffer size that must be adjusted so Oracle can use larger values. For example, on Linux, the parameters
net.core.wmem_max limit the socket buffer size and must be set larger than
For example, if bandwidth is 622 Mbits and latency is 30 ms, then you would calculate the minimum size for the
SEND_BUF_SIZE parameters as follows: 622,000,000 / 8 x 0.030 = 2,332,500 bytes.
In this example, you would set the initialization parameters as follows:
With Oracle Net Services, it is possible to control data transfer by adjusting the size of the Oracle Net setting for the session data unit (SDU). Oracle internal testing indicates that setting the SDU to its maximum value of 32767 can improve performance. You can set SDU on a per connection basis using the
SDU parameter in the local naming configuration file (
TNSNAMES.ORA) and the listener configuration file (
LISTENER.ORA), or you can set the SDU for all Oracle Net connections with the profile parameter
DEFAULT_SDU_SIZE in the
See Also:Oracle Database Net Services Reference for more information about the
Beginning in Oracle Database 11g, Oracle Data Guard provides the ability to compress redo data as it is transmitted over the network. This compression can be performed while resolving redo gaps with the archive
n process or while shipping redo data during run time using the
LGWR ASYNC transport. In certain environments, enabling compression of redo data can:
Reduce network utilization
Provide faster resolution of gaps in archived redo log files
Reduced redo transfer time
In general, compression is most beneficial when used over low bandwidth networks. As the network bandwidth increases, the benefit is reduced. Compressing redo in a Data Guard environment is beneficial if:
Sufficient CPU resources are available for the compression processing.
The database redo rate is being throttled by a low bandwidth network.
Before enabling compression, assess the available CPU resources and decide if enabling compression is feasible. For complete information about enabling compression for gap resolution and the
LGWR ASYNC transport mode, see support note 729551.1 at
This section discusses the best practices for Data Guard log apply services for both physical and logical standby databases.
To use Redo Apply with a physical standby database, or to use any media recovery operation effectively, tune your database recovery by following these best practices:
Maximize I/O rates on standby redo logs and archived redo logs.
Measure read I/O rates on the standby redo logs and archived redo log directories. Concurrent writing of shipped redo on a standby database might reduce the redo read rate due to I/O saturation. The overall recovery rate is always bounded by the rate at which redo can be read; so ensure that the redo read rate surpasses your required recovery rate.
Assess recovery rate.
To obtain the history of recovery rates, use the following query to get a history of recovery progress:
SELECT * FROM V$RECOVERY_PROGRESS;
RATE is greater than the maximum redo generation rate at the primary database or twice the average generation rate at the primary database, then no tuning is required; otherwise follow the tuning tips below. The redo generation rate for the primary database can be monitored from Grid Control or extracted from AWR reports under statistic
LOG is greater than ten seconds, then investigate tuning I/O and checkpoints.
Use defaults for
DB_BLOCK_CHECKING and set
TYPICAL (the default setting) to
FULL. Block checking is always recommended on the primary database and might be enabled on the standby database if the recovery rate meets expectations.
Note:To check for block corruption that was not preventable through the
DB_BLOCK_CHECKINGparameter, use one of the following methods:
The default setting for
TYPICAL. Block checksum should always be enabled for both primary and standby databases. It catches most block corruption while incurring negligible overhead.
DB_LOST_WRITE_PROTECT parameter to
FULL on the standby database to enable Oracle to detect writes that are lost in the I/O subsystem. The impact on media recovery is very small and generally less than 2 percent.
Having a large database cache size can improve media recovery performance by reducing the amount of physical data block reads. Because media recovery does not require
DB_RECYCLE_CACHE_SIZE or require a large
SHARED_POOL_SIZE, the memory can be reallocated to the
Before converting the standby database into a primary database, reset these parameters to the primary database settings.
Assess database wait events
With the Active Data Guard option and real-time query, you can use Statspack from the primary database to collect data from a standby database that is opened read-only and performing recovery. Any tuning or troubleshooting exercise should start with collecting Standby Statspack reports. See support note 454848.1 at
http://support.oracle.com/ for complete details about installing and using Standby Statspack.
If you do not have a license for the Active Data Guard option, you can determine the top system and session wait events by querying the standby database's
V$EVENT_HISTOGRAM and looking for the largest
TIME_WAITED value. You may have to capture multiple snapshots of the query results and manually extract the difference to accurately assess a certain time period.
If recovery is applying a lot of redo data efficiently, the system is I/O bound and the I/O wait should be reasonable for your system. The vast majority of wait events related to parallel recovery coordinators and slaves apply to the coordinator. Slaves are either applying changes (clocking on CPU) or waiting for changes to be passed from the coordinator. The database wait events are shown in Table 2-4 and Table 2-5.
The parallel recovery coordinator is waiting on I/O from the online redo log or the archived redo log.
This event indicates that all read buffers are being used by slaves, and usually indicates that the recovery slaves lag behind the coordinator.
The parallel recovery coordinator is waiting for a buffer to be released by a recovery slave. Again, this is a sign the recovery slaves are behind the coordinator.
The parallel recovery coordinator is waiting for a file resize to finish, as would occur with file auto extend.
The coordinator has sent a synchronous control messages to all slaves, and is waiting for all slaves to reply.
When dealing with recovery slave events, it is important to know how many slaves were started. Divide the wait time for any recovery slave event by the number of slaves. Table 2-5 describes the parallel recovery slave wait events.
The parallel recovery slave is waiting for a change to be shipped from the coordinator. This is in essence an idle event for the recovery slave. To determine the amount of CPU a recovery slave is using, divide the time spent in this event by the number of slaves started and subtract that value from the total elapsed time. This may be close, because there are some waits involved.
A parallel recovery slave (or serial recovery process) is waiting for a batch of synchronous data block reads to complete.
Recovery is waiting for checkpointing to complete, and Redo Apply is not applying any changes currently.
A parallel recovery slave is waiting for a batched data block I/O.
DBWR must write out modified blocks from the buffer cache to the data files. Always use native asynchronous I/O by setting
TRUE (default). In the rare case that asynchronous I/O is not available, use
DBWR_IO_SLAVES to improve the effective data block write rate with synchronous I/O.
Ensure that you have sufficient I/O bandwidth and that I/O response time is reasonable for your system either by doing some base I/O tests, comparing the I/O statistics with those for the primary database, or by looking at some historical I/O metrics. Be aware that I/O response time may vary when many applications share the same storage infrastructure such as with a Storage Area Network (SAN) or Network Attached Storage (NAS).
Use system commands such as UNIX
vmstat commands, or use system monitoring tools to assess the system resources. Alternatively, you can monitor using Oracle Enterprise Manager, AWR reports, or performance views such as
If there are I/O bottlenecks or excessive wait I/O operations, then investigate operational or application changes that increased the I/O volume. If the high waits are due to insufficient I/O bandwidth, then add more disks to the relevant ASM disk group. Verify that this is not a bus or controller bottleneck or any other I/O bottleneck. The read I/O rate from the standby redo log should be greater than the expected recovery rate.
Check for excessive swapping or memory paging.
Check to ensure the recovery coordinator or MRP is not CPU bound during recovery.
Increase redo log size for the primary and standby databases.
Increase the online redo log size for the primary database and the standby redo log size for the standby database to a minimum of 1 GB. Oracle Database does a full checkpoint and updates all the file headers (in an optimized manner) at each log file boundary during media recovery. To reduce the frequency of a full database checkpoint and the file header updates, increase the redo log size so that a log switch is occurring at a minimum of 20-minute intervals on a heavily loaded system. Otherwise, once every hour should be sufficient.
To ensure that the crash recovery time for the primary database is minimized even with very large redo log sizes, set the
FAST_START_MTTR_TARGET initialization parameter to a nonzero value to enable fast-start fault recovery. If the parameter is not set, then set it to 3600. This initialization parameter is relevant only for the primary database.
Try different degrees of recovery parallelism.
Parallel recovery is enabled by default for media and crash recovery with the default degree of parallelism set to the number of CPUs available. In most cases this is the optimal setting. However, in some circumstances, you may obtain faster recovery by using a degree of parallelism that is different (higher or lower) than the default. To override the default setting, explicitly specify parallel recovery using the following SQL*Plus statement:
RECOVER MANAGED STANDBY DATABASE PARALLEL number_of_CPUs;
See Also:The MAA white paper "Data Guard Redo Apply and Media Recovery" at
This section discusses recommendations for Data Guard SQL Apply and logical standby databases.
This section contains these topics:
Set the initial value of the
MAX_SERVERS parameter to be eight times the number of CPUs. For example:
SQL> EXECUTE DBMS_LOGSTDBY.APPLY_SET('MAX_SERVERS', <8 x CPUs>);
SQL Apply automatically distributes the server processes. There is always one process for the Reader, Builder, and Analyzer roles, but usually you need a varied number of Preparer and Applier processes. By setting the
MAX_SERVERS parameter to a default setting of eight times the number of CPUs, you can avoid performing a lot of initial tuning. The previous default of 9 was much too low for an application with large transactions or a lot of concurrent transactions. A higher default setting results in more PGA memory utilization.
PRESERVE_COMMIT_ORDER parameter controls the order in which transactions are applied to the standby database. When set to
TRUE (the default), unrelated transactions are applied in the same order they were committed on the primary database. Dependent transactions are always committed in the correct sequence, regardless of the setting.
Follow these guidelines for setting the PRESERVE_COMMIT_ORDER parameter:
FALSE if you are using the logical standby database only for disaster-recovery purposes or if the reporting application using the standby database can accommodate transactions being applied out of sequence.
Most third-party replication solutions tolerate this relaxed
COMMIT processing. For OLTP applications, setting the parameter to
FALSE can potentially double the SQL Apply rates. MAA testing has shown that OLTP workload performance improves by 40% or greater when
PRESERVE_COMMIT_ORDER is set to
TRUE for a reporting or decision-support system.
However, if the standby database has fallen behind the primary database, then you can temporarily set the
PRESERVE_COMMIT_ORDER parameter to
FALSE when you want the logical standby database to quickly catch up to the primary database. For example:
SQL> EXECUTE DBMS_LOGSTDBY.APPLY_SET('PRESERVE_COMMIT_ORDER', 'FALSE');
Reset the parameter to
TRUE after the gap has been resolved.
See Also:The MAA white paper "SQL Apply Best Practices: Oracle Data Guard 11g Release 1" for examples that show the effect of setting the
PRESERVE_COMMIT_ORDERparameter for different situations
With proper planning and execution, Data Guard role transitions can effectively minimize downtime and ensure that the database environment is restored with minimal impact on the business. Whether using physical standby or logical standby databases, MAA testing has determined that switchover and failover times with Oracle Data Guard 11g have been reduced to seconds. This section describes best practices for both switchover and failover.
A database switchover performed by Oracle Data Guard is a planned transition that includes a series of steps to switch roles between a standby database and a primary database. Following a successful switchover operation, the standby database assumes the primary role and the primary database becomes a standby database. Switchovers are typically completed in only seconds to minutesFoot 10 .
Data Guard enables you to change these roles dynamically by:
Using Oracle Enterprise Manager
Using the broker's DGMGRL command-line interface
Issuing SQL statements, as described in Section 22.214.171.124.1, "Using SQL*Plus for Data Guard Switchover to a Physical Standby Database" and Section 126.96.36.199.2, "Using SQL*Plus for Data Guard Switchover to a Logical Standby Database"
See Also:Oracle Data Guard Broker for information about using Enterprise Manager or the broker's DGMGRL command-line interface to perform database switchover
To optimize switchover processing, perform the following best practices before performing a switchover:
Disconnect all sessions possible using the
SESSION SQL*Plus command.
Stop job processing by setting the
AQ_TM_PROCESSES parameter to
Cancel any specified apply delay by using the
NODELAY keyword to stop and restart log apply services on the standby database.
On a physical standby database:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL; ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE NODELAY;
On a logical standby database:
ALTER DATABASE STOP LOGICAL STANDBY APPLY; ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE NODELAY;
You can view the current delay setting on the primary database by querying the
DELAY_MINS column of the
For logical standby databases:
Determine an optimal SQL Apply rate using the "SQL Apply Best Practices" white paper at
When performing a switchover using SQL*Plus statements, you should first build the LogMiner Data Dictionary by issuing the
ALTER DATABASE PREPARE TO SWITCHOVER SQL*Plus statement.
Verify the LogMiner Data Dictionary was received by the primary database by querying the
SWITCHOVER_STATUS column of the
V$DATABASE fixed view on the primary database. When the query returns the
TO LOGICAL STANDBY value, you can proceed with the switchover. See the discussion about "Switchovers Involving a Logical Standby Database" in Oracle Data Guard Concepts and Administration
For physical standby databases in an Oracle RAC environment, ensure there is only one instance active for each primary and standby database. (A switchover involving a logical standby database does not require that only one instance is active.)
For the fastest switchover, use real-time apply so that redo data is applied to the standby database as soon as it is received, and the standby database is synchronized with the primary database before the switchover operation to minimize switchover time. To enable real-time apply:
For a physical standby database, use the following SQL*Plus statement:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT USING CURRENT LOGFILE;
For a logical standby database, use the following SQL*Plus statement:
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
For a physical standby database, reduce the number of archiver (ARCn) processes to the minimum needed for both remote and local archiving. Additional archiver processes can take additional time to shut down, thereby increasing the overall time it takes to perform a switchover. After the switchover has completed you can reenable the additional archiver processes.
As part of a switchover, the standby database must clear the online redo log files on the standby database before opening as a primary database. The time needed to complete the I/O can significantly increase the overall switchover time. By setting the
LOG_FILE_NAME_CONVERT parameter, the standby database can pre-create the online redo logs the first time the MRP process is started. You can also pre-create empty online redo logs by issuing the SQL*Plus
ALTER DATABASE CLEAR LOGFILE statement on the standby database.
See Also:The MAA white paper "Switchover and Failover Best Practices" at
A failover is typically used only when the primary database becomes unavailable, and there is no possibility of restoring it to service within a reasonable period. During a failover the primary database is taken offline at one site and a standby database is brought online as the primary database.
With Data Guard the process of failover can be completely automated using fast-start failover, or it can be a manual, user driven process. Oracle recommends using fast-start failover to eliminate the uncertainty inherent in a process that requires manual intervention. It automatically executes a failover within seconds of an outage being detected.
There are two distinct types of failover: manual failover and fast-start failover. An administrator initiates manual failover when the primary database fails. In contrast, Data Guard automatically initiates a fast-start failover without human intervention after the primary database has been unavailable for a set period (fast-start failover threshold).
Table 2-6 contrasts the characteristics of fast-start failover and manual failover.
|Points of Comparison||Fast-Start Failover||Manual Failover|
Allows you to increase availability with less need for manual intervention, thereby reducing management costs.
Gives you control over exactly when a failover occurs and to which target standby database.
The following conditions automatically trigger a fast-start failover:
A manual failover is user initiated and involves performing a series of steps to convert a standby database into a primary database. A manual failover should be performed due to an unplanned outage such as:
Use the following tools to manage fast-start failover failovers:
Use the following tools to perform manual failovers:
Restoring the original primary database after failover
Following a fast-start failover, the broker can automatically reconfigure the original primary database as a standby database upon reconnection to the configuration (
After manual failover, you must reinstate the original primary database as a standby database to restore fault tolerance.
Restoring bystander standby databases after failover
The broker coordinates the role transition on all databases in the configuration.
Bystanders that do no require reinstatement are available as viable standby databases to the new primary. Bystanders that require reinstatement are automatically reinstated by the observer.
A benefit of using the broker is that it provides the status of bystander databases and indicates whether a database must be reinstated. Status information is not readily available when using SQL*Plus statements to manage failover.
The broker automatically publishes FAN/AQ (Advanced Queuing) notifications after a failover. Clients that are also configured for Fast Connection Failover can use these notifications to connect to the new primary database. You can also use the
To configure fast client failover so applications can be available to the business after a failover, see Section 2.9, "Configuring Fast Connection Failover". You can also configure clients for Fast Connection Failover after a manual failover.
To optimize failover processing, use the following best practices:
Use fast-start failover
The MAA tests running Oracle Database 11g show that failovers performed using the broker and fast-start failover offer a significant improvement in availability. For a comprehensive review of Oracle Data Guard failover best practices, see:
"Data Guard Fast-Start Failover" MAA white paper at
Enable Flashback Database to reinstate the failed primary databases after a failover operation has completed. Flashback Database facilitates fast point-in-time recovery, if needed.
Use real-time apply with Flashback Database to apply redo data to the standby database as soon as it is received, and to quickly rewind the database should user error or logical corruption be detected.
Consider configuring multiple standby databases to maintain data protection following a failover.
For logical standby databases, see the MAA white paper "SQL Apply Best Practices" to obtain an optimal SQL Apply rate.
For physical standby databases:
See the MAA white paper "Oracle Data Guard Redo Apply and Media Recovery" to optimize media recovery for Redo Apply on the MAA Web site at
Go directly to the
OPEN state from the
MOUNTED state instead of restarting the standby database (as required in previous releases).
When transitioning from read-only mode to Redo Apply (recovery) mode, restart the database.
LOG_FILE_NAME_CONVERT parameter. As part of a failover, the standby database must clear its online redo logs before opening as the primary database. The time needed to complete this I/O can add significantly to the overall failover time. By setting the
LOG_FILE_NAME_CONVERT parameter, the standby pre-creates the online redo logs the first time the MRP process is started. You can also pre-create empty online redo logs by issuing the SQL*Plus
ALTER DATABASE CLEAR LOGFILE statement on the standby database.
Fast-start failover automatically, quickly, and reliably fails over to a designated standby database in the event of loss of the primary database, without requiring manual intervention to execute the failover. You can use fast-start failover only in an Oracle Data Guard configuration that is managed by the broker
The Oracle Data Guard configuration can be running in either the maximum availability or maximum performance mode with fast-start failover. When fast-start failover is enabled, the broker ensures fast-start failover is possible only when the configured data loss guarantee can be upheld. Maximum availability mode provides an automatic failover environment guaranteed to lose no data. Maximum performance mode provides an automatic failover environment guaranteed to lose no more than the amount of data (in seconds) specified by the
FastStartFailoverLagLimit configuration property.
Use the following fast-start failover best practices in addition to the generic best practices listed in the "General Best Practices for Failovers" section:
Run the fast-start failover observer process on a host that is not located in the same data center as the primary or standby database.
Ideally, you should run the observer on a system that is equally distant from the primary and standby databases. The observer should connect to the primary and standby databases using the same network as any end-user client. If the designated observer fails, Enterprise Manager can detect it and automatically restart the observer. If the observer cannot run at a third site, then you should install the observer on the same network as the application. If a third, independent location is not available, then locate the observer in the standby data center on a separate host and isolate the observer as much as possible from failures affecting the standby database.
Make the observer highly available by using Oracle Enterprise Manager to configure the original primary database to be automatically reinstated as a standby database when a connection to the database is reestablished. Also, Enterprise Manager enables you to define an alternate host on which to restart the observer.
After the failover completes, the original primary database is automatically reinstated as a standby database when a connection to it is reestablished, if you set the
FastStartFailoverAutoReinstate configuration property to
Set the value of the
FastStartFailoverThreshold property according to your configuration characteristics, as described in Table 2-7.
|Configuration||Minimum Recommended Setting|
Single-instance primary, low latency, and a reliable network
Single-instance primary and a high latency network over WAN
Oracle RAC primary
Reconfiguration time + 30 seconds Foot 1
Footnote 1 For configurations running Oracle Database software earlier than release 10.2.0.3, calculate a minimum reconfiguration time using this equation: Oracle RAC miscount + reconfiguration time + 30 seconds
Test your configuration using the settings shown in Table 2-7 to ensure that the fast-start failover threshold is not so aggressive that it induces false failovers, or so high it does not meet your failover requirements.
You should perform a manual failover, which is user-driven, only in case of an emergency and the failover should be initiated due to an unplanned outage such as:
Site disaster that results in the primary database becoming unavailable
User errors that cannot be repaired in a timely fashion
Data failures, to include widespread corruption, which affects the production application
Use the following manual failover best practices in addition to the generic best practices listed in the "General Best Practices for Failovers" section:
Reinstate the original primary database as a standby database to restore fault tolerance to your environment. The standby database can be quickly reinstated by using Flashback Database. See Section 4.3.2, "Restoring a Standby Database After a Failover".
For manual failovers that involve Oracle RAC, issue the
SHUTDOWN ABORT statement on all secondary Oracle RAC instances on the standby database before performing a failover.
For physical standby databases see the MAA white paper "Oracle Data Guard Redo Apply and Media Recovery" at
Beginning with Oracle Database release 11g, you can convert a physical standby database into a fully updatable standby database called a snapshot standby database.
To convert a physical standby database into a snapshot standby database, issue the SQL*Plus
ALTER DATABASE CONVERT TO SNAPSHOT STANDBY statement. This command causes Oracle Data Guard to perform the following actions:
Recover all available redo data
Create a guaranteed restore point
Activate the standby database as a primary database
Open the database as a snapshot standby database
To convert the snapshot standby back to a physical standby, issue the
ALTER DATABASE CONVERT TO PHYSICAL STANDBY statement. This command causes the physical standby database to be flashed back to the guaranteed restore point that was created before the
ALTER DATABASE CONVERT TO SNAPSHOT STANDBY statement was issued. Then, you must perform the following actions:
Restart the physical standby database
Restart Redo Apply on the physical standby database
Follow these best practices when creating and managing snapshot standby databases:
Automate the conversion steps using the broker to expedite the process of reverting a snapshot standby database to the physical standby database role. This is helpful because you can use a snapshot standby database for a switchover or failover only by first reverting it to a physical standby.
Create multiple standby databases if your business requires a fast recovery time objective (RTO).
Ensure the physical standby database that you convert to a snapshot standby is caught up with the primary database, or has a minimal apply lag. See Section 188.8.131.52, "Redo Apply Best Practices for Physical Standby Databases" for information about tuning media recovery.
Configure a flash recovery area and ensure there is sufficient I/O bandwidth available. This is necessary because snapshot standby databases use guaranteed restore points.
See Also:Oracle Data Guard Concepts and Administration for complete information about creating a snapshot standby database
The Oracle Database High Availability Overview describes how a multiple standby database architecture is virtually identical to that of single standby database architectures. Therefore, the configuration guidelines for implementing multiple standby databases described in this section complement the existing best practices for physical and logical standby databases.
When deploying multiple standby databases, use the following best practices:
Use the broker (described in Chapter 3, "Monitoring Using Oracle Grid Control") to manage your configuration and perform role transitions. However, if you choose to use SQL*Plus statements, see the MAA white paper "Multiple Standby Databases Best Practices" for best practices.
Use Flashback Database to quickly reinstate the original primary as the standby after a failover instead of re-creating the entire standby database from backups or from the primary database. Set the
DB_FLASHBACK_RETENTION_TARGET initialization parameter to the same value on all databases in the configuration. If you are using Flashback Database for the sole purpose of reinstating databases following a failover, a
DB_FLASHBACK_RETENTION_TARGET of 60 minutes is the minimum recommended value.
Enable supplemental logging in configurations containing logical standby databases. When creating a configuration with both physical and logical standby databases, issue the
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA statement to enable supplemental logging in the following situations:
When adding a logical standby database to an existing configuration consisting of all physical standby databases, you must enable supplemental logging on all existing physical standby databases in the configuration.
When adding a physical standby database to an existing configuration that contains a logical standby database, you must enable supplemental logging on the physical standby database when you create it.
Supplemental logging is enabled on the primary database when the logical standby dictionary is built. Enabling supplemental logging is a control file change and therefore the change is not propagated to each physical standby database. Supplemental logging is enabled automatically on a logical standby database when it is first converted from a physical standby database to a logical standby database as part of the dictionary build process.To enable supplemental logging, issue the following SQL*Plus statement when connected to a physical standby database:
SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY, UNIQUE INDEX) COLUMNS;
If logical standby databases are not configured to perform real-time queries, then consider configuring SQL Apply to delay applying redo data to the logical standby database. By delaying the application of redo, you can minimize the need to manually reinstate the logical standby database after failing over to a physical standby database.
To set a time delay, use the
DELAY=minutes attribute of the
n initialization parameter.
See Also:Oracle Database High Availability Overview to learn about the benefits of using multiple standby database and for implementation examples
If you have a license for the Oracle Active Data Guard option (available starting in Oracle Database 11g Release 1), then you can open a physical standby database for read-only access while Redo Apply on the standby database continues to apply redo data received from the primary database. All queries reading from the physical standby database execute in real time and return current results, providing more efficient use of system resources without compromising data protection or extending recovery time in the event a failover is required. Hence, this capability is referred to as real-time query.
This section summarizes the best practices for deploying real-time query:
Use the generic best practices described previously:
Tune the network following the guidelines in Section 2.6.5, "Redo Transport Services Best Practices"
Use real-time apply on the standby database so that changes are applied as soon as the redo data is received.
For configurations running Oracle Database 11g release 184.108.40.206, shut down the standby instance and Redo Apply cleanly so that upon restart you can open the standby database directly in read-only mode.
Configure clients for efficient failover:
Connect both the primary database and the reporting applications using an Oracle Net alias that contains all hosts (both primary and standby) in the
Connect the primary database and reporting applications with a role-specific service.
Reporting applications that connect to the standby database, and primary applications that connect to the primary database, must connect to the database with the correct database role.
Start and stop services based on the database role.
For example, the following automates starting and stopping the service using an after startup trigger that checks the database role and starts the appropriate service:
CREATE OR REPLACE TRIGGER manage_service after startup on database DECLARE role VARCHAR(30); BEGIN SELECT DATABASE_ROLE INTO role FROM V$DATABASE; IF role = 'PRIMARY' THEN DBMS_SERVICE.START_SERVICE('sales_rw'); ELSE DBMS_SERVICE.START_SERVICE('sales_ro'); END IF; END;
Monitor standby performance by using Standby Statspack.
See support note 454848.1 at
http://support.oracle.com/ for complete details about installing and using Standby Statspack.
CURRENT_SCN column of the
V$DATABASE view to monitor how far behind Redo Apply on the standby database is lagging behind the primary database. See Section 220.127.116.11, "Monitoring Real-Time Query" for more information.
See Also:The MAA white paper "Oracle Active Data Guard: Oracle Data Guard 11g Release 1" at
The following instructions describe how to enable real-time query on a standby database that is transactionally consistent with the primary database.
Note:For an inconsistent standby database, you must make the standby consistent with the primary database before opening the standby for read-only access. A step-by-step description is provided in the "Oracle Active Data Guard 11g Release 1" MAA white paper.
If the standby database instance and Redo Apply have been cleanly shut down, then you can open a standby database directly to the read-only state. When open, you can start Redo Apply to enable real-time query.
Perform the following steps to enable a consistent physical standby database:
Start the standby instance read-only.
It is unnecessary to include the
READ ONLY keyword (the default start-up state) on the
STARTUP command because the keyword is included implicitly when you issue an
OPEN command on a standby database.
Note: If the standby database is an Oracle RAC database, then you must first open one standby instance read-only and start Redo Apply before opening additional standby instances in read-only mode.
After the database is open, start Redo Apply:
SQL> RECOVER MANAGED STANDBY DATABASE DISCONNECT USING CURRENT LOGFILE;
If you are using the broker to manage the Data Guard configuration, then the default state for the standby database is
APPLY ON and Redo Apply starts automatically. If this default has been changed, then start Redo Apply by issuing the following command:
DGMGRL> EDIT DATABASE 'RTQ' SET STATE='APPLY-ON'
If the standby database is mounted and Redo Apply is running, stop Redo Apply, open the standby database for read-only access, and restart Redo Apply.
Real-time query allows you to query data as it is being applied to a standby database while guaranteeing a transactionally consistent view of the data. Oracle provides a read-consistent view of the data through a query SCN. The query SCN on the standby database is advanced by the recovery process after all dependent changes have been applied. As it is advanced, the new query SCN is propagated to all instances in an Oracle RAC standby. Once published to all standby instances, the query SCN is exposed to the user via the
CURRENT_SCN column of the
V$DATABASE view on the standby database.
The query SCN on the standby database is equivalent to the
CURRENT_SCN on the primary database. This allows applications connected to the standby database instances to use the query SCN as a snapshot of where the data is in relation to the primary database. Queries on the primary and standby databases return identical results as of a particular
CURRENT SCN.To determine how far behind the query results on the standby database are lagging the primary database, compare the
CURRENT SCN column on the primary database to the
CURRENT SCN on the standby database. For example, in a configuration for which there is a dblink on the primary database that points to a standby database called
RTQ_STBY, issue the following query on the primary database:
SQL> SELECT SCN_TO_TIMESTAMP((SELECT CURRENT_SCN FROM V$DATABASE)) -SCN_TO_TIMESTAMP((SELECT CURRENT_SCN FROM V$DATABASE@RTQ_STBY)) FROM DUAL;
The value returned from the query indicates the number of seconds that data on the standby database lags behind the current position of the primary database. To determine an approximate query lag on the standby without connecting to. or using, the primary database, use the
APPLY LAG metric from the
V$DATAGUARD_STATS view. Note that the
APPLY LAG metric is valid only for the time that it was computed and not at the moment the query against the
V$DATAGUAD_STATS view was issued. Use the
COMPUTED_TIME column to determine the last computed time for the apply lag.
See Also:The MAA white paper "Oracle Active Data Guard: Oracle Data Guard 11g Release 1" at
In a highly available environment, you must protect nondatabase files along with database files. Oracle Secure Backup provides data protection for heterogeneous UNIX, Linux, Windows, and Network Attached Storage (NAS) environments. Additionally, for disaster recovery purposes, some third-party tools enable remote synchronization between a set of local and remote files. For example, you can use tools such as
DRDB for remote synchronization. These tools are available for download on the internet. The following list provides recommendations regarding these tools:
For software updates, use
rsync to synchronize the standby system with the changes made to software on the primary system.
For configuration files, use
rsync daily or after a change, or use
For important log files, trace files, or debugging files, use
rsync daily or hourly, or use
DRDB to synchronize the entire file system. However, do not overwrite the existing standby log or trace directories.
For transaction logs or metadata files that must be synchronized with the database, use
csync2 frequently, or use a block synchronization tool such as
DRDB, a third-party mirroring utility, or remote synchronization tool.
To accurately assess the primary database performance after adding Data Guard standby databases, obtain a history of statistics from the
V$SYSMETRIC_SUMMARY view or Automatic Workload Repository (AWR) snapshots before and after deploying Oracle Data Guard with the same application profile and load.
To assess the application profile, compare the following statistics:
Physical reads per transaction
Physical writes per transaction
CPU usage per transaction
Redo generated per transaction
To assess the application performance, compare the following statistics:
Redo generated per second or redo rate
User commits per second or transactions per second
Database time per second
Response time per transaction
SQL service response time
If the application profile has changed between the two scenarios, then this is not a fair comparison. Repeat the test or tune the database or system with the general principles outlined in the Oracle Database Performance Tuning Guide.
If the application profile is similar and you observe application performance changes on the primary database because of a decrease in throughput or an increase in response time, then assess these common problem areas:
If you are experiencing high load (excessive CPU usage of over 90%, paging and swapping), then tune the system before proceeding with Data Guard. Use the
V$OSSTAT view or the
V$SYSMETRIC_HISTORY view to monitor system usage statistics from the operating system.
Higher I/O wait events
If you are experiencing higher I/O waits from the log writer or database writer processes, then the slower I/O effects throughput and response time. To observe the I/O effects, look at the historical data of the following wait events:
Log file parallel writes
Log file sequential reads
Log file parallel reads
Data file parallel writes
Data file sequential reads parallel writes
SYNC transport, commits take more time because of the need to guarantee that the redo data is available on the standby database before foreground processes get an acknowledgment from the log writer (LGWR) background process that the commit has completed. A LGWR process commit includes the following wait events:
Log File Parallel Write (local write for the LGWR process)
LGWR wait on SENDREQ
This wait event includes:
Time to put the packet into the network
Time to send the packet to the standby database
RFS write or standby write to the standby redo log, which includes the RFS I/O wait event plus additional overhead for checksums
Time to send a network acknowledgment back to the primary database (for example, single trip latency time)
Longer commit times for the LGWR process can cause longer response time and lower throughput, especially for small time-sensitive transactions. However, you may obtain sufficient gains by tuning the log writer local write (
Log File Parallel Write wait event) or the different components that comprise the
LGWR wait on SENDREQ wait event.
To tune the disk write I/O (
Log File Parallel Write or the RFS I/O), add more spindles or increase the I/O bandwidth.
To reduce the network time:
Tune the Oracle Net send and receive buffer sizes
Increase the network bandwidth if there is saturation
Possibly find a closer site to reduce the network latency
ASYNC transport, the LGWR process never waits for the network server processes to return before writing a
COMMIT record to the current log file. However, if the network server processes has fallen behind and the redo to be shipped has been flushed from the log buffer, then the network server process reads from the online redo logs. This causes more I/O contention and possibly longer wait times for the log writer process writes (
Log File Parallel Write). If I/O bandwidth and sufficient spindles are not allocated, then the log file parallel writes and log file sequential reads increase, which may affect throughput and response time. In most cases, adding sufficient spindles reduces the I/O latency.
Note:To enable most of the statistical gathering and advisors, ensure the
STATISTICS_LEVELinitialization parameter is set to
Footnote LegendFootnote 8: You can use Extended Datatype Support (EDS) to accommodate several more advanced data types. See the MAA white paper "Extended Datatype Support: SQL Apply and Streams" at
http://www.oracle.com/technology/deploy/availability/pdf/maa_edtsoverview.pdffor more details.