Oracle® Database High Availability Best Practices
11g Release 1 (11.1)
The best practices discussed in this section apply to Oracle Database release 11g architectures that are described in Oracle Database High Availability Overview:
Oracle Database with Oracle Clusterware
Oracle Database with Oracle Real Application Clusters (Oracle RAC)
Oracle Database with Oracle RAC on Extended Clusters
Oracle Database with Oracle Data Guard
Oracle Database with Oracle Clusterware and Data Guard
Oracle Database with Oracle RAC and Oracle Data Guard
Oracle Database with Oracle Streams
The recommendations described in this section are identical for both the primary and standby databases in Oracle Data Guard configurations. Adopt these practices to reduce or avoid outages, reduce the risk of corruption, and improve recovery performance.
This section contains general best practices for configuring the database:
Use the following best practices to reduce recovery time and increase database availability and redundancy:
ARCHIVELOG mode enables online database backup and is necessary to recover the database to a point in time later than what has already been restored. Features such as Oracle Data Guard and Flashback Database require that the production database run in ARCHIVELOG mode.
See Also: Oracle Database Administrator's Guide for more information about using automatic archiving
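As a quick check, you can verify the current log mode and, if necessary, enable ARCHIVELOG mode. This is a generic sketch; your environment and shutdown window are assumptions:

```sql
-- Verify the current log mode
SELECT log_mode FROM v$database;

-- Enable ARCHIVELOG mode; the database must be cleanly shut down
-- and mounted (not open) first
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
```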
Use Oracle log multiplexing to create multiple redo log members in each redo group, one in the data area and one in the flash recovery area. This protects against a failure involving the redo log, such as a disk or I/O failure for one member, or a user error that accidentally removes a member through an operating system command. If at least one redo log member is available, then the instance can continue to function.
Note: Do not multiplex the standby redo logs.
All online redo log files should be the same size and configured to switch approximately once an hour during normal activity. They should not switch more frequently than every 20 minutes during peak activity.
There should be a minimum of four online log groups to prevent the log writer (LGWR) process from waiting for a group to be available following a log switch. A group might be unavailable because a checkpoint has not yet completed or because the group has not yet been archived.
See Also:
Oracle Database Administrator's Guide for more information about managing redo logs
Oracle Data Guard Concepts and Administration for more information about online, archived, and standby redo log files
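As an illustrative sketch of log multiplexing, the following adds a redo log group with one member in the data area and one in the flash recovery area. The disk group names, group number, and size are assumptions:

```sql
-- Add a multiplexed redo log group: one member in the data area,
-- one in the flash recovery area (paths are illustrative)
ALTER DATABASE ADD LOGFILE GROUP 5
  ('+DATA/orcl/redo05a.log', '+FRA/orcl/redo05b.log')
  SIZE 1G;
```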
The flash recovery area is Oracle managed disk space that provides a centralized disk location for backup and recovery files.
The flash recovery area is defined by setting the following database initialization parameters:
DB_RECOVERY_FILE_DEST: This parameter specifies the default location for the flash recovery area.
DB_RECOVERY_FILE_DEST_SIZE: This parameter specifies (in bytes) the hard limit on the total space to be used by database recovery files created in the recovery area location.
The Oracle Suggested Backup Strategy described in the Oracle Database 2 Day DBA recommends using the flash recovery area as the primary location for recovery. When the flash recovery area is properly sized, files needed for repair are readily available. The minimum recommended disk limit is the combined size of the database, incremental backups, all archived redo logs that have not been copied to tape, and flashback logs.
Note: Do not configure the archived redo log files for Oracle Streams capture to reside solely in the flash recovery area. Instead, configure a separate log archive destination, independent of the flash recovery area, for the Oracle Streams capture process. This is necessary because archived redo log files in the flash recovery area may be removed automatically when disk space runs low, or RMAN may remove logs that no longer meet the backup retention policy, even though the log files are still required by Oracle Streams.
See Also: Oracle Database Backup and Recovery User's Guide for detailed information about sizing the flash recovery area and setting the retention period
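A minimal sketch of defining the flash recovery area follows; the size and disk group name are assumptions to be replaced with values from your own sizing exercise. Note that the size limit must be set before the destination:

```sql
-- The size limit must be set before the destination
ALTER SYSTEM SET DB_RECOVERY_FILE_DEST_SIZE=200G SCOPE=BOTH SID='*';
ALTER SYSTEM SET DB_RECOVERY_FILE_DEST='+FRA' SCOPE=BOTH SID='*';
```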
Flashback Database provides an efficient alternative to point-in-time recovery for reversing unwanted database changes. Flashback Database enables you to rewind an entire database backward in time, reversing the effects of database changes within a time window. The effects are similar to database point-in-time recovery (DBPITR). You can flash back a database by issuing a single RMAN command or a SQL*Plus statement instead of using a complex procedure.
When configuring and enabling Flashback Database:
Ensure that the production database is running in ARCHIVELOG mode.
Ensure there is sufficient I/O bandwidth available to the flash recovery area to maintain flashback write throughput.
During normal run-time activities, Flashback Database buffers and writes the before images of data blocks into the flashback logs that reside in the flash recovery area. If flashback writes are slow (as indicated by the flashback free buffer waits wait event), then database throughput is affected. The amount of disk writes caused by Flashback Database varies with the workload and application profile. For a typical OLTP workload using a flash recovery area with sufficient disk spindles and I/O throughput, the overhead incurred by Flashback Database is less than two percent.
For large databases, set the LOG_BUFFER initialization parameter to at least 8 MB to ensure the database allocates maximum memory (typically 16 MB) for writing Flashback Database logs.
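The steps above can be sketched as follows; in this release the database must be mounted but not open to enable Flashback Database, and the retention target value is an assumption:

```sql
-- Mount the database, set the flashback window (in minutes),
-- then enable Flashback Database and reopen
STARTUP MOUNT
ALTER SYSTEM SET DB_FLASHBACK_RETENTION_TARGET=1440 SCOPE=BOTH SID='*';
ALTER DATABASE FLASHBACK ON;
ALTER DATABASE OPEN;
```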
Flashback Database can return a primary or standby database to a point in time before a role transition. In addition, you can use guaranteed restore points to flash back a database to a point in time before a RESETLOGS operation, thus providing more flexibility in detecting and correcting human errors.
Flashback Database or guaranteed restore points are required when using:
Fast-start failover—Requires Flashback Database so that the broker can automatically reinstate the primary database following an automatic failover. Bystander standby databases that may be disabled after a failover can only be reinstated when Flashback Database is enabled.
Flashback Database is optional but recommended when performing rolling database upgrades. You should create a guaranteed restore point before performing an upgrade to fall back in case the upgrade fails. Using this method to restore the database to the pre-upgrade state is substantially quicker than using the downgrade procedure. However, note that flashing back the database to the pre-upgrade state is practical only when no application data changes have been made.
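For example, a guaranteed restore point can be created before an upgrade and used as the fallback target. The restore point name here is purely illustrative:

```sql
-- Before the upgrade
CREATE RESTORE POINT before_upgrade GUARANTEE FLASHBACK DATABASE;

-- To fall back (database must be mounted, not open):
FLASHBACK DATABASE TO RESTORE POINT before_upgrade;
ALTER DATABASE OPEN RESETLOGS;

-- When the upgrade is verified, release the space
DROP RESTORE POINT before_upgrade;
```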
In general, the performance effect of enabling Flashback Database is minimal. However, some application profiles may require special tuning or additional considerations. See support note 565535.1 at http://support.oracle.com/ for additional Flashback Database considerations and specific application use cases.
There are several data analysis methods for monitoring the Flashback Database workload on your system, such as using Automatic Workload Repository (AWR) or querying the V$FLASHBACK_DATABASE_STAT or V$SYSSTAT views. For example, you can use AWR to compare snapshots collected before and after the time that you enabled Flashback Database. You can also review AWR snapshots to pinpoint system usage caused by flashback logging. See the "Monitoring Flashback Database Performance Impact" section in the Oracle Database Backup and Recovery User's Guide for more monitoring and tuning techniques.
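As a starting point, the following query (a sketch, not a full tuning method) summarizes the flashback, database, and redo write volumes per monitoring interval:

```sql
-- One row per monitoring interval; volume columns are in bytes
SELECT begin_time, end_time,
       flashback_data, db_data, redo_data
  FROM v$flashback_database_stat
 ORDER BY begin_time;
```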
See Also: Oracle Database Backup and Recovery User's Guide for more information about guaranteed restore points and Flashback Database
The fast-start fault recovery feature reduces the time required to recover from a crash. It also makes the recovery bounded and predictable by limiting the number of dirty buffers and the number of redo records generated between the most recent redo record and the last checkpoint.
With this feature, the FAST_START_MTTR_TARGET initialization parameter simplifies the configuration of recovery time from instance or system failure. This parameter specifies a target for the expected recovery time objective (RTO), which is the time (in seconds) that it should take to start the instance and perform cache recovery. When you set this parameter, the database manages incremental checkpoint writes in an attempt to meet the target. If you have chosen a practical value for this parameter, then you can expect your database to recover, on average, in approximately the number of seconds you have chosen.
Outage testing (such as for node or instance failures) during peak loads is recommended.
See Also:
Oracle Database Backup and Recovery User's Guide for more information about fast-start fault recovery
The MAA white paper "Optimizing Availability During Unplanned Outages Using Oracle Clusterware and RAC" for more best practices
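A minimal sketch of configuring and checking the target follows; the 60-second value is an assumption, not a recommendation for every system:

```sql
-- Target recovery time in seconds
ALTER SYSTEM SET FAST_START_MTTR_TARGET=60 SCOPE=BOTH SID='*';

-- Compare the configured target with the current estimate
SELECT target_mttr, estimated_mttr FROM v$instance_recovery;
```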
By default, Oracle always validates the data blocks that it reads from disk. You should also consider one or more of the following methods to provide additional prevention and detection against corruptions caused by underlying disks, storage systems, or the I/O system:
Use Data Guard and configure the DB_ULTRA_SAFE initialization parameter on both the primary and standby systems for the most comprehensive data corruption prevention and detection.
On the primary database, set DB_ULTRA_SAFE=DATA_AND_INDEX to prevent and detect data corruptions in a timely manner, and thus provide critical data protection and high availability for the Oracle Database. The DB_ULTRA_SAFE initialization parameter also controls other data protection behavior in Oracle Database, such as requiring ASM to perform sequential mirror write I/Os.
Table 2-1 describes the values that the DB_ULTRA_SAFE parameter automatically assigns to the underlying initialization parameters.
Note: When you set the DB_ULTRA_SAFE parameter, it automatically integrates and controls the behavior (described in Table 2-1) of the following initialization parameters:
DB_BLOCK_CHECKING detects and prevents data block corruptions.
Block checking prevents memory and data corruptions, but it incurs some performance overhead on every block change. For many applications, the block changes are a small percentage compared to the blocks read (typically less than five percent), so the overall effect of enabling block checking is small.
DB_BLOCK_CHECKSUM detects corruptions through redo and data block checksums on the primary database and protects the standby database. This parameter requires minimal CPU resources.
DB_LOST_WRITE_PROTECT detects stray and lost writes.
Lost write protection enables a physical standby database to detect lost write corruptions on both the primary and physical standby database.
However, if you explicitly set the DB_BLOCK_CHECKING, DB_BLOCK_CHECKSUM, or DB_LOST_WRITE_PROTECT parameters in the initialization parameter file, then the DB_ULTRA_SAFE parameter has no effect and no changes are made to the parameter values. Thus, if you specify the DB_ULTRA_SAFE parameter, do not explicitly set these underlying parameters.
On physical standby databases, set DB_BLOCK_CHECKSUM=FULL. When DB_BLOCK_CHECKSUM is set to FULL, both disk corruption and in-memory corruption are detected and the corrupted block is not written to disk, thus preserving the integrity of the physical standby database. This parameter has minimal effect on Redo Apply performance.
Lost write protection prevents corruptions, due to stray or lost writes on the primary, from being propagated and applied to the standby database. Setting this parameter has a negligible effect on the standby database. Moreover, setting the DB_LOST_WRITE_PROTECT initialization parameter is recommended over the HARD solution, because HARD does not provide full stray and lost write protection or redo application validation.
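Bringing the settings above together, a hedged sketch of enabling DB_ULTRA_SAFE and verifying the derived settings after a restart:

```sql
-- DB_ULTRA_SAFE is a static parameter; a restart is required
ALTER SYSTEM SET DB_ULTRA_SAFE=DATA_AND_INDEX SCOPE=SPFILE SID='*';

-- After restart, confirm the values assigned to the
-- underlying parameters
SELECT name, value
  FROM v$parameter
 WHERE name IN ('db_block_checking',
                'db_block_checksum',
                'db_lost_write_protect');
```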
A standby database is a database that is decoupled from the primary database and on which redo data is checked and verified. Redo Apply and SQL Apply processes perform another layer of validation on the standby database that can detect stray or lost writes and corrupted blocks caused by hardware, software, or network issues. Most of these issues cannot be detected on the primary database or may remain hidden on the primary database for a long period.
Consider setting the DB_BLOCK_CHECKING initialization parameter only on the primary database. Enabling DB_BLOCK_CHECKING on the standby database incurs a much higher overhead and can dramatically reduce Redo Apply performance. Testing is recommended to measure the effect on your environment.
Note: Although enabling the DB_BLOCK_CHECKING parameter is recommended, doing so can significantly reduce the throughput of Redo Apply processes, by as much as 50%. During MAA internal testing, Redo Apply throughput doubled in Oracle Database 11g, reaching 100 MB/sec for batch workloads and 50 MB/sec for OLTP workloads; throughput dropped by 50% after enabling the DB_BLOCK_CHECKING parameter. However, where the reduced throughput still surpasses peak redo rates on the primary database, enabling the DB_BLOCK_CHECKING parameter is still advised to provide additional data corruption protection.
Configure Data Recovery Advisor to quickly diagnose and repair data failures for non-Oracle RAC primary databases. Data Recovery Advisor periodically scans for data corruptions. See "Use the Data Recovery Advisor".
Configure Oracle Recovery Manager (RMAN) to automate the backup and management of recovery-related files, calculate checksums when taking backups to ensure that all blocks being backed up are validated, and detect physical and logical corruptions. Periodically use the RMAN BACKUP VALIDATE CHECK LOGICAL... scan to detect corruptions. See "Configuring Backup and Recovery".
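A typical invocation of this scan from the RMAN client might look like the following sketch; the scope and options should be adapted to your backup strategy:

```
RMAN> BACKUP VALIDATE CHECK LOGICAL DATABASE ARCHIVELOG ALL;
```

Corruptions found by the scan are recorded in the V$DATABASE_BLOCK_CORRUPTION view rather than failing the command.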
Configure Oracle Secure Backup to integrate tape backup and management into your environment to provide local and remote data protection. See "Create Fast Tape Backups Using Oracle Secure Backup".
Use ASM redundancy for disk groups to provide mirrored extents that can be used by the database in the event an I/O error or corruption is encountered. For continued protection in the event of a failure, ASM redundancy provides the ability to move an extent to a different area on a disk if an I/O error occurs. The ASM redundancy mechanism is useful if you have some bad sectors returning media sense errors.
Use the Data Recovery Advisor for non-Oracle RAC primary databases to quickly diagnose data failures, determine and present appropriate repair options, and execute repairs at the user's request. Data Recovery Advisor reduces downtime by eliminating confusion and automating detection and repair. It can diagnose failures based on symptoms, such as:
Components that are not accessible because they do not exist, do not have the correct access permissions, are taken offline, and so on
Physical corruptions such as block checksum failures, invalid block header field values, and so on
Logical corruptions caused by software bugs
Incompatibility failures caused by an incorrect version of a component
I/O failures such as a limit on the number of open files exceeded, channels inaccessible, network or I/O errors, and so on
Configuration errors such as an incorrect initialization parameter value that prevents the opening of the database
If failures are diagnosed, then they are recorded in the Automatic Diagnostic Repository (ADR). Data Recovery Advisor intelligently determines recovery strategies by:
Generating repair advice and repairing failures only after failures have been detected by the database and stored in ADR
Aggregating failures for efficient recovery
Presenting only feasible recovery options
Indicating any data loss for each option
Typically, Data Recovery Advisor presents both automated and manual repair options. If appropriate, you can choose to have Data Recovery Advisor automatically perform a repair, verify the repair success, and close the relevant repaired failures.
See Also: The chapter about diagnosing and repairing failures with Data Recovery Advisor in the Oracle Database Backup and Recovery User's Guide
Under most circumstances, Oracle Database automatically detects if asynchronous I/O is available and appropriate for a particular platform, and enables asynchronous I/O through the DISK_ASYNCH_IO initialization parameter. However, for optimal performance, it is always a best practice to verify that asynchronous I/O is actually being used. Query the V$IOSTAT_FILE view to determine whether asynchronous I/O is used:
SQL> SELECT file_no, filetype_name, asynch_io FROM v$iostat_file;
To explicitly enable asynchronous I/O, set the DISK_ASYNCH_IO initialization parameter to TRUE:
ALTER SYSTEM SET DISK_ASYNCH_IO=TRUE SCOPE=SPFILE SID='*';
Note that if you are using ASM, it performs I/O asynchronously by default.
For large production databases, set the LOG_BUFFER initialization parameter to a minimum of 8 MB. This setting ensures the database allocates maximum memory for writing Flashback Database logs. If the database is configured to transport redo data to a standby database asynchronously, then size the LOG_BUFFER parameter large enough to accommodate the processes involved in the network send.
Use Automatic Shared Memory Management (ASMM) to improve memory management. By setting the SGA_TARGET parameter to a nonzero value, the shared pool, large pool, Java pool, Streams pool, and buffer cache are automatically and dynamically resized as needed. See the Oracle Database Administrator's Guide for more information.
When the value of RECOVERY_ESTIMATED_IOS in the V$INSTANCE_RECOVERY view is small (for example, less than 5000), the overhead of parallel recovery may outweigh any benefit. This typically occurs with a very aggressive setting of FAST_START_MTTR_TARGET. In this case, set RECOVERY_PARALLELISM to 1 to disable parallel recovery.
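The check and the corresponding change can be sketched as follows; the 5000 threshold comes from the guideline above:

```sql
-- If this value is small (for example, < 5000), parallel recovery
-- may not pay off
SELECT recovery_estimated_ios FROM v$instance_recovery;

-- Disable parallel recovery; this parameter is static
ALTER SYSTEM SET RECOVERY_PARALLELISM=1 SCOPE=SPFILE SID='*';
```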
Use the following best practices to improve Oracle Database manageability:
Data Recovery Advisor automatically diagnoses data failures, determines and presents appropriate repair options, and executes repairs at the user's request. In this context, a data failure is a corruption or loss of persistent data on disk. By providing a centralized tool for automated data repair, Data Recovery Advisor improves the manageability and reliability of an Oracle database and thus helps reduce the MTTR.
Note: In the current release, Data Recovery Advisor supports only single-instance databases. Oracle RAC and Oracle Data Guard databases are not supported.
See Also: The chapter about "Diagnosing and Repairing Failures with Data Recovery Advisor" in the Oracle Database Backup and Recovery User's Guide
Effective data collection and analysis is essential for identifying and correcting performance problems. Oracle provides several tools that gather information regarding database performance.
The Oracle Database automatic performance tuning features include:
Set the AWR automatic snapshot interval to 10 to 20 minutes to capture performance peaks during stress testing or to diagnose performance issues. Under typical workloads, a 60-minute interval is sufficient.
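The interval can be changed with the DBMS_WORKLOAD_REPOSITORY package; the 15-minute value below is an illustrative choice within the 10-to-20-minute guideline:

```sql
-- Set the AWR automatic snapshot interval to 15 minutes
BEGIN
  DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(interval => 15);
END;
/
```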
The server parameter file (SPFILE) enables a single, central parameter file to hold all database initialization parameters associated with all instances of a database. This provides a simple, persistent, and robust environment for managing database parameters. An SPFILE is required when using the broker.
See Also:
Oracle Database Administrator's Guide for information about managing initialization parameters with an SPFILE
Oracle Real Application Clusters Administration and Deployment Guide for information about initialization parameters with Real Application Clusters
Oracle Data Guard Broker for information about other prerequisites for using the broker
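If the instance is still using a text parameter file, an SPFILE can be created from it; the PFILE path below is purely illustrative:

```sql
-- Create a server parameter file from an existing PFILE
-- (path is an example only)
CREATE SPFILE FROM PFILE='/u01/app/oracle/dbs/initorcl.ora';
-- Restart the instance so it starts from the SPFILE
```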
With automatic undo management, the Oracle Database server effectively and efficiently manages undo space, leading to lower administrative complexity and cost. When Oracle Database internally manages undo segments, undo block and consistent read contention are eliminated because the size and number of undo segments are automatically adjusted to meet the current workload requirement.
To use automatic undo management, set the following initialization parameters:
UNDO_MANAGEMENT: Set this parameter to AUTO.
UNDO_RETENTION: Specify the desired time in seconds to retain undo data. Set this parameter to the same value on all instances.
UNDO_TABLESPACE: Specify a unique undo tablespace for each instance.
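The three settings above can be sketched as follows; the retention value, tablespace name, and instance name are assumptions:

```sql
-- UNDO_MANAGEMENT is static; a restart is required
ALTER SYSTEM SET UNDO_MANAGEMENT=AUTO SCOPE=SPFILE SID='*';

-- Retain undo for 15 minutes (set the same value on all instances)
ALTER SYSTEM SET UNDO_RETENTION=900 SCOPE=BOTH SID='*';

-- A unique undo tablespace per instance
ALTER SYSTEM SET UNDO_TABLESPACE=undotbs1 SCOPE=BOTH SID='orcl1';
```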
Advanced object recovery features, such as Flashback Query, Flashback Version Query, Flashback Transaction Query, and Flashback Table, require automatic undo management. The success of these features depends on the availability of undo information to view data as of a previous point in time.
By default, Oracle Database automatically tunes undo retention by collecting database usage statistics and estimating undo capacity needs. Unless you enable retention guarantee for the undo tablespace (by specifying the RETENTION GUARANTEE clause on either the CREATE DATABASE or the CREATE UNDO TABLESPACE statement), Oracle Database may reduce the undo retention below the specified UNDO_RETENTION value.
Note: By default, ongoing transactions can overwrite undo data even if the UNDO_RETENTION parameter setting specifies that the undo data should be maintained. To guarantee that unexpired undo data is not overwritten, you must enable RETENTION GUARANTEE for the undo tablespace.
If there is a requirement to use Flashback technology features, the best practice recommendation is to enable RETENTION GUARANTEE for the undo tablespace and set a value for UNDO_RETENTION based on the following guidelines:
Establish how long it would take to detect when erroneous transactions have been carried out. Multiply this value by two.
Use the Undo Advisor to compute the minimum undo tablespace size based on setting UNDO_RETENTION to the value recommended in step 1.
If the undo tablespace has the AUTOEXTEND option disabled, allocate enough space as determined in step 2, or reduce the value of the UNDO_RETENTION parameter.
If the undo tablespace has the AUTOEXTEND option enabled, make sure there is sufficient disk space available to extend the datafiles to the size determined in step 2, and that the autoextend MAXSIZE value you specified is large enough.
See Also: The section about "Computing the Minimum Undo Tablespace Size Using the Undo Advisor" in Oracle Database 2 Day DBA
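The guidelines above can be sketched as follows, assuming (illustratively) that erroneous transactions can be detected within 24 hours, so retention is set to twice that, and that the undo tablespace is named undotbs1:

```sql
-- Guarantee that unexpired undo is never overwritten
ALTER TABLESPACE undotbs1 RETENTION GUARANTEE;

-- Twice the detection window: 2 x 24 hours, in seconds
ALTER SYSTEM SET UNDO_RETENTION=172800 SCOPE=BOTH SID='*';
```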
If you use the RETENTION GUARANTEE option and the tablespace is configured with less space than the transaction throughput requires, then the following sequence of events occurs:
If you have an autoextensible file, then the file automatically grows to accommodate the retained undo data.
A warning alert reports the disk is at 85% full.
A critical alert reports the disk is at 97% full.
Transactions receive an out-of-space error.
See Also: Oracle Database Administrator's Guide for more information about the UNDO_RETENTION setting and the size of the undo tablespace
Locally managed tablespaces perform better than dictionary-managed tablespaces, are easier to manage, and eliminate space fragmentation concerns. Locally managed tablespaces use bitmaps stored in the data file headers and, unlike dictionary managed tablespaces, do not contend for centrally managed resources for space allocations and de-allocations.
See Also: Oracle Database Administrator's Guide for more information about locally managed tablespaces
Automatic segment space management simplifies space administration tasks, thus reducing the chance of human error. An added benefit is the elimination of performance tuning related to space management. It facilitates management of free space within objects such as tables or indexes, improves space utilization, and provides significantly better performance and scalability with simplified administration. The automatic segment space management feature is enabled by default for all tablespaces created using default attributes.
See Also: Oracle Database Administrator's Guide for more information on segment space management
Temporary tablespaces improve the concurrency of multiple sort operations, reduce sort operation overhead, and avoid data dictionary space management operations. This is a more efficient way of handling temporary segments, from the perspective of both system resource usage and database performance.
The best practice is to specify a default temporary tablespace for the entire database to ensure that temporary segments are used for the most efficient sort operations, whether or not individual users have been assigned a temporary tablespace.
To specify a default temporary tablespace when creating the database, use the DEFAULT TEMPORARY TABLESPACE clause of the CREATE DATABASE statement. To specify a default temporary tablespace after database creation, use the ALTER DATABASE DEFAULT TEMPORARY TABLESPACE statement.
Using the default temporary tablespace ensures that all disk sorting occurs in a temporary tablespace and that other tablespaces are not mistakenly used for sorting.
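The two cases can be sketched as follows; the tablespace name is illustrative:

```sql
-- At database creation: include the clause in CREATE DATABASE
-- (abbreviated; other clauses omitted)
--   CREATE DATABASE ... DEFAULT TEMPORARY TABLESPACE temp ...

-- After database creation:
ALTER DATABASE DEFAULT TEMPORARY TABLESPACE temp;
```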
See Also: Oracle Database Administrator's Guide for more information about managing tablespaces
Resumable space allocation provides a way to suspend and later resume database operations if there are space allocation failures. The affected operation is suspended instead of the database returning an error. No processes must be restarted. When the space problem is resolved, the suspended operation is automatically resumed.
To use resumable space allocation, you can enable it at the system level with the RESUMABLE_TIMEOUT initialization parameter, or at the session level using clauses of the ALTER SESSION statement (for example, issue the ALTER SESSION ENABLE RESUMABLE statement). By default, resumable mode is disabled for a new session unless you explicitly set the RESUMABLE_TIMEOUT initialization parameter to a nonzero value.
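Both levels can be sketched as follows; the timeout value and session name are assumptions:

```sql
-- System level: suspend space-related failures for up to one hour
ALTER SYSTEM SET RESUMABLE_TIMEOUT=3600 SCOPE=BOTH SID='*';

-- Session level, with a name that identifies the operation
-- in the DBA_RESUMABLE view
ALTER SESSION ENABLE RESUMABLE TIMEOUT 3600 NAME 'nightly batch load';
```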
See Also: Oracle Database Administrator's Guide for more information about managing resumable space allocation
The Database Resource Manager gives database administrators more control over resource management decisions, so that resource allocation can be aligned with the business objectives of an enterprise. The Database Resource Manager provides the ability to prioritize work within the Oracle Database server. Availability of the database encompasses both its functionality and performance. If the database is available but users are not getting the level of performance they need, then availability and service level objectives are not being met. Application performance, to a large extent, is affected by how resources are distributed among the applications that access the database. The main goal of the Database Resource Manager is to give the Oracle Database server more control over resource management decisions, thus circumventing problems resulting from inefficient operating system management and operating system resource managers.
See Also: Oracle Database Administrator's Guide for more information about Database Resource Manager