Oracle9i Database Performance Guide and Reference
Release 1 (9.0.1)

Part Number A87503-02
Go To Documentation Library
Home
Go To Product List
Book List
Go To Table Of Contents
Contents
Go To Index
Index

Master Index

Feedback

Go to previous page Go to next page

17
Configuring Instance Recovery Performance

This chapter offers guidelines for configuring the time to perform instance recovery.

This chapter contains the following sections:

Understanding Instance Recovery

Instance and crash recovery are the automatic application of redo log records to Oracle data blocks after a crash or system failure.

During normal operation, if an instance is shutdown cleanly (for example, using a SHUTDOWN IMMEDIATE statement), rather than terminated abnormally, then the in-memory changes that have not already been written to the datafiles on disk are written to disk as part of the checkpoint performed during shutdown.

However, if a single instance database crashes (or if all instances of an Oracle Real Application Cluster configuration crash), then Oracle performs crash recovery at the next startup. If one or more instances of an Oracle Real Application Cluster configuration crash, then a surviving instance performs instance recovery automatically.

Instance and crash recovery occur in two steps: cache recovery then transaction recovery.

Cache Recovery (Rolling Forward)

During the cache recovery step, Oracle applies all committed and uncommitted changes in the redo log files to the affected data blocks. The work required for cache recovery processing is proportional to the rate of change to the database (update transactions per second) and the time between checkpoints.

Transaction Recovery (Rolling Back)

To make the database consistent, the changes that were not committed at the time of the crash must be undone (in other words, rolled back). During the transaction recovery step, Oracle applies the rollback segments to undo the uncommitted changes. The work required to do transaction recovery is proportional to the number and size of uncommitted transactions when the system fault occurred.

Checkpointing and Cache Recovery

Periodically, Oracle records a checkpoint. A checkpoint is the highest system change number (SCN) such that all data blocks below or at that SCN are know to be written out to the data files. If a failure occurs, then only the redo records containing changes at SCNs higher than the checkpoint need to be applied during recovery. The duration of cache recovery processing is determined by two factors: the number of data blocks that have changes at SCNs higher than the SCN of the checkpoint, and the number of log blocks that need to be read to find those changes.

How Checkpoints Affect Performance

Frequent checkpointing writes dirty buffers to the datafiles more often than otherwise, and so reduces cache recovery time in the event of an instance failure. If checkpointing is frequent, then applying the redo records in the redo log between the current checkpoint position and the end of the log involves processing relatively few data blocks. This means that the cache recovery phase of recovery is fairly short.

However, in a high-update system, frequent checkpointing can reduce runtime performance, because checkpointing causes DBWn processes to perform writes.

Fast Instance Recovery Trade-offs

To minimize the duration of instance recovery, you must force Oracle to checkpoint often, thus keeping the number of redo log records to be applied during recovery to a minimum. However, frequent checkpointing increases the overhead for normal database operations.

If daily operational efficiency is more important than minimizing recovery time, then decrease the frequency of writes to data files due to checkpoints. This should improve operational efficiency, but also increase instance recovery time.

See Also:

 

Reducing Checkpoint Frequency to Optimize Runtime Performance

To reduce the checkpoint frequency and optimize runtime performance, you can do the following:

Configuring the Duration of Cache Recovery

There are several methods for tuning cache recovery to keep the duration of recovery within user-specified bounds. These include the following:

Use Fast-Start Checkpointing to Limit Instance Recovery Time

The Oracle 9i Enterprise Edition offers fast-start fault recovery functionality to control instance recovery. This reduces the time required for cache recovery and makes the recovery bounded and predictable by limiting the number of dirty buffers and the number of redo records generated between the most recent redo record and the last checkpoint.

The foundation of fast-start recovery is the fast-start checkpointing architecture. Instead of the conventional event driven (that is, log switching) checkpointing, which does bulk writes, fast-start checkpointing occurs incrementally. Each DBWn process periodically writes buffers to disk to advance the checkpoint position. The oldest modified blocks are written first to ensure that every write lets the checkpoint advance. Fast-start checkpointing eliminates bulk writes and the resultant I/O spikes that occur with conventional checkpointing.

Administrators specify a target (bounded) time to complete the cache recovery phase of recovery with the FAST_START_MTTR_TARGET initialization parameter, and Oracle automatically varies the incremental checkpoint writes to meet that target. The FAST_START_MTTR_TARGET initialization parameter lets you specify in seconds the expected "mean time to recover" (MTTR), which is the expected amount of time Oracle takes to perform crash/instance recovery for a single instance.

FAST_START_MTTR_TARGET

The FAST_START_MTTR_TARGET initialization parameter simplifies the configuration of recovery time from instance or system failure. This parameter lets you specify the number of seconds crash recovery is expected to take. The FAST_START_MTTR_TARGET is internally converted to a set of parameters that modify the operation of Oracle such that recovery time is as close to this estimate as possible.


Note:

You should disable or remove the FAST_START_IO_TARGET, LOG_CHECKPOINT_INTERVAL, and LOG_CHECKPOINT_TIMEOUT parameters when using FAST_START_MTTR_TARGET. Setting these parameters to active values interferes with FAST_START_MTTR_TARGET, resulting in a different than expected value V$INSTANCE_RECOVERY.TARGET_MTTR


The maximum value for FAST_START_MTTR_TARGET is 3600, or one hour. If you set the value to more than 3600, then Oracle rounds it to 3600. There is no minimum value for FAST_START_MTTR_TARGET. However, this does not mean that you can target the recovery time as low as you want. The time to do a crash recovery is limited by the low limit of the target number of dirty buffers, which is 1000, as well as factors such as how long initialization and file open take.

If you set the value of FAST_START_MTTR_TARGET too low, then the effective mean time to recover (MTTR) target will be the best MTTR target the system can achieve. If you set the value of FAST_START_MTTR_TARGET to such a high value that even in the worst-case recovery would not take that long, then the effective MTTR target will be the estimated MTTR in the worst-case scenario (when the whole buffer cache is dirty). Use the TARGET_MTTR column in V$INSTANCE_RECOVERY to see the effective MTTR.


Note:

The TARGET_MTTR column in V$INSTANCE_RECOVERY could be different than FAST_START_MTTR_TARGET if the latter is set too low or too high. Periodically check the MTTR_TARGET column in the V$INSTANCE_RECOVERY view and compare it with the parameter setting. Adjust the parameter setting if it is consistently different from the value in the target.  



See Also:

"Monitoring Estimated MTTR: Example Scenario" for more information on setting FAST_START_MTTR_TARGET 

Set LOG_CHECKPOINT_TIMEOUT to Influence the Number of Redo Logs

Set the initialization parameter LOG_CHECKPOINT_TIMEOUT to a value n (where n is an integer) to require that the latest checkpoint position follow the most recent redo block by no more than n seconds. In other words, at most, n seconds worth of logging activity can occur between the most recent checkpoint position and the end of the redo log. This forces the checkpoint position to keep pace with the most recent redo block

You can also interpret LOG_CHECKPOINT_TIMEOUT as specifying an upper bound on the time a buffer can be dirty in the cache before DBWn must write it to disk. For example, if you set LOG_CHECKPOINT_TIMEOUT to 60, then no buffers remain dirty in the cache for more than 60 seconds. The default value for LOG_CHECKPOINT_TIMEOUT is 1800, or 30 minutes.

Set LOG_CHECKPOINT_INTERVAL to Influence the Number of Redo Logs

Set the initialization parameter LOG_CHECKPOINT_INTERVAL to a value n (where n is an integer) to require that the checkpoint position never follow the most recent redo block by more than n blocks. In other words, at most n redo blocks can exist between the checkpoint position and the last block written to the redo log. In effect, you are limiting the amount of redo blocks that can exist between the checkpoint and the end of the log.

Oracle limits the maximum value of LOG_CHECKPOINT_INTERVAL to 90% of the smallest log to ensure that the checkpoint advances into the current log before that log fills and a log switch is attempted.

LOG_CHECKPOINT_INTERVAL is specified in redo blocks. Redo blocks are the same size as operating system blocks. Use the LOG_FILE_SIZE_REDO_BLKS column in V$INSTANCE_RECOVERY to see the number of redo blocks corresponding to 90% of the size of the smallest log file.

See Also:

 

Use Parallel Recovery to Speed up Redo Application

Use parallel recovery to tune the cache recovery phase of recovery. Parallel recovery uses a division of labor approach to allocate different processes to different data blocks during the cache recovery phase of recovery. For example, during recovery the redo log is read, and blocks that require redo application are parsed out. These blocks are subsequently distributed evenly to all recovery processes to be read into the buffer cache. Crash, instance, and media recovery of datafiles on different disk drives are good candidates for parallel recovery.

Use the RECOVERY_PARALLELISM initialization parameter to specify the number of concurrent recovery processes for instance or crash recovery. To use parallel processing, the value of RECOVERY_PARALLELISM must be greater than 1 and cannot exceed the value of the PARALLEL_MAX_SERVERS initialization parameter.


Note:

The RECOVERY_PARALLELISM initialization parameter specifies the number of concurrent recovery processes for instance or crash recovery only.

Media recovery is not affected by this parameter. Use the PARALLEL clause in the RECOVER DATABASE statement for media recovery. 


Recovery is usually I/O bound on reads to data blocks. Consequently, parallelism at the block level can only help recovery performance if it speeds up total I/Os. Performance on systems with efficient asynchronous I/O typically does not improve significantly with parallel recovery.

Initialization Parameters that Influence Cache Recovery Time

The following initialization parameters influence cache recovery time.

Table 17-1 Initialization Parameters Influencing Cache Recovery
Parameter  Purpose 
FAST_START_MTTR_TARGET


FAST_START_IO_TARGET

 

Lets you specify in seconds the expected "mean time to recover" (MTTR), which is the expected amount of time Oracle takes to perform recovery and startup the instance.

This parameter has been deprecated in favour of FAST_START_MTTR_TARGET. This parameter specifies the upper limit on the number of dirty buffers. 

LOG_CHECKPOINT_TIMEOUT
 

Limits the number of seconds between the most recent redo record and the checkpoint.  

LOG_CHECKPOINT_INTERVAL

RECOVERY_PARALLELISM
 

Limits the number of redo blocks generated between the most recent redo record and the checkpoint.

Specifies the number of concurrent recovery processes to be used in instance or crash recovery. 


Note:

Oracle recommends using the FAST_START_MTTR_TARGET parameter to control the duration of startup after instance failure. Fast-start checkpointing is only available with Enterprise Edition.

The FAST_START_IO_TARGET parameter has been deprecated in favor of the FAST_START_MTTR_TARGET parameter.

The parameter DB_BLOCK_MAX_DIRTY_TARGET has been removed. 


Monitoring Cache Recovery

Use the V$INSTANCE_RECOVERY view to see the current recovery parameter settings. You can also use statistics from this view to calculate which parameter has the greatest influence on checkpointing. V$INSTANCE_RECOVERY contains the columns shown in Table 17-2.


Note:

The last three fields in V$INSTANCE_RECOVERY are new with Oracle9i, and they are the most important. With the initialization parameter FAST_START_MTTR_TARGET, the other seven fields of V$INSTANCE_RECOVERY are less useful.  


Table 17-2 V$INSTANCE_RECOVERY View
Column  Description 
RECOVERY_ESTIMATED_IOS
 

Contains the number of dirty buffers in the buffer cache. (In Standard Edition, the value of this field is always NULL).  

ACTUAL_REDO_BLKS
 

Current number of redo blocks required to be read for recovery. 

TARGET_REDO_BLKS
 

Goal for the maximum number of redo blocks to be processed during recovery. This value is the minimum of the next three columns (LOG_FILE_SIZE_REDO_BLKS, LOG_CHKPT_TIMEOUT_REDO_BLKS, LOG_CHKPT_INTERVAL_REDO_BLKS). 

LOG_FILE_SIZE_REDO_BLKS
 

Number of redo blocks to be processed during recovery corresponding to 90% of the size of the smallest log file. 

LOG_CHKPT_TIMEOUT_REDO_BLKS
 

Number of redo blocks that must be processed during recovery to satisfy LOG_CHECKPOINT_TIMEOUT

LOG_CHKPT_INTERVAL_REDO_BLKS
 

Number of redo blocks that must be processed during recovery to satisfy LOG_CHECKPOINT_INTERVAL

FAST_START_IO_TARGET_REDO_BLKS
 

This field is obsolete. It is retained for backward compatibility. The value of this field is always NULL.  

TARGET_MTTR
 

Effective mean time to recover (MTTR) target in seconds. Usually, it should be equal to the value of the FAST_START_MTTR_TARGET parameter. If FAST_START_MTTR_TARGET is set to such a small value that it is impossible to do a recovery within its time frame, then the TARGET_MTTR field contains the effective MTTR target, which is larger than FAST_START_MTTR_TARGET. If FAST_START_MTTR_TARGET is set to such a high value that even in the worst-case (the whole buffer cache is dirty) recovery would not take that long, then the TARGET_MTTR field contains the estimated MTTR in the worst-case scenario. This field is 0 if FAST_START_MTTR_TARGET is not specified. 

ESTIMATED_MTTR
 

The current estimated mean time to recover (MTTR) in the number of seconds based on the number of dirty buffers and log blocks (gives the current estimated MTTR even if FAST_START_MTTR_TARGET is not specified). 

CKPT_BLOCK_WRITES
 

Number of blocks written by checkpoint writes. 

See Also:

Oracle9i Database Reference for more information on the V$INSTANCE_RECOVERY view 

Monitoring Estimated MTTR: Example Scenario

The TARGET_MTTR field of V$INSTANCE_RECOVERY contains the MTTR target in effect. The ESTIMATED_MTTR field of V$INSTANCE_RECOVERY contains the estimated MTTR should a crash happen right away. Query these two fields to see if the system can keep up with your specified MTTR target.

For example, assume the initialization parameter setting is as follows:

FAST_START_MTTR_TARGET = 6      # seconds

Execute the following query after database open:

SELECT TARGET_MTTR, ESTIMATED_MTTR, CKPT_BLOCK_WRITES 
FROM V$INSTANCE_RECOVERY;

Oracle responds with the following:

TARGET_MTTR ESTIMATED_MTTR CKPT_BLOCK_WRITES
18          15             0

You see that TARGET_MTTR is 18 seconds, which is higher than the value of FAST_START_MTTR_TARGET specified (6 seconds). This means that it is impossible to recover the database within 6 seconds. 18 seconds is the minimum MTTR target that the system can achieve.

The 18 second minimum is calculated based on the absolute low limit of 1000 blocks on the target of number of dirty buffers (The deprecated initialization parameter FAST_START_IO_TARGET follows this low limit; that is, you cannot set FAST_START_IO_TARGET below 1000). The ESTIMATED_MTTR field contains the estimated number of log blocks generated since the last checkpoint. Because the database has just opened, the system contains few dirty buffers. That is why ESTIMATED_MTTR can be lower than the minimum possible TARGET_MTTR.

Now assume that you use the following statement to modify FAST_START_MTTR_TARGET:

ALTER SYSTEM SET FAST_START_MTTR_TARGET = 30;

Reissue the query to V$INSTANCE_RECOVERY, and Oracle responds with the following:

TARGET_MTTR ESTIMATED_MTTR CKPT_BLOCK_WRITES
30          15             0

The ESTIMATED_MTTR field is still 15 seconds, which means that the estimated MTTR at the current time (should a crash happen immediately) is still 15 seconds. This is because no new redo is written, and no data block has become dirty.

Assume that you make tremendous updates to the database and query V$INSTANCE_RECOVERY immediately afterwards. Oracle responds with the following:

TARGET_MTTR ESTIMATED_MTTR CKPT_BLOCK_WRITES
30          36             54367

You see that the effective MTTR target is 30 seconds. The estimated MTTR at the current time (should a crash happen immediately) is 36 seconds. This is fine. This means that some checkpoints writes might not have finished yet, so the buffer cache contains more dirty buffers than targeted.

Assume that you wait for one minute and reissue the query to V$INSTANCE_RECOVERY. Oracle responds with the following:

TARGET_MTTR ESTIMATED_MTTR CKPT_BLOCK_WRITES
30          31             55230

The estimated MTTR at this time has dropped to 31 seconds. This is because more dirty buffers have been written out during this period. This is shown by the increase of CKPT_BLOCK_WRITES field of V$INSTANCE_RECOVERY.


Note:

The number of physical writes minus the number of physical writes non checkpoint (from V$SYSSTAT) equals the field CKPT_BLOCK_WRITES in V$INSTANCE_RECOVERY


Calculating Performance Overhead

To calculate performance overhead, use the V$SYSSTAT view. For example, assume that you execute the following query:

SELECT NAME, VALUE 
FROM V$SYSSTAT
WHERE NAME IN ('physical reads','physical writes',
               'physical writes non checkpoint');

Oracle responds with the following:

NAME                                 VALUE
physical reads                        2376
physical writes                      14932
physical writes non checkpoint       11165

The first row shows the number of data blocks retrieved from disk. The second row shows the number of data blocks written to disk. The last row shows the number of writes to disk that would occur if you turned off checkpointing.

Use this data to calculate the overhead imposed by setting the FAST_START_MTTR_TARGET initialization parameter. To effectively measure the percentage of extra writes, mark the values for these statistics at different times, t_1 and t_2. Use the following formula where the variables stand for the following:

Variable  Definition 
*_1
 

Value of prefixed variable at time t_1, which is any time after the database has been running for a while 

*_2
 

Value of prefixed variable at time t_2, which is later than t_1 and not immediately after changing any of the checkpoint parameters 

PWNC
 

physical writes non checkpoint 

PW
 

physical writes 

PR
 

physical reads 

EIO
 

Percentage of estimated extra I/Os generated by enabling checkpointing 

Calculate the percentage of extra I/Os generated by fast-start checkpointing using this formula:

[((PW_2 - PW_1) - (PWNC_2 - PWNC_1)) / ((PR_2 - PR_1) + (PW_2 - PW_1))] x 100% = EIO

It can take some time for database statistics to stabilize after instance startup or dynamic initialization parameter modification. After such events, wait until all blocks age out of the buffer cache at least once before taking measurements. If the percentage of extra I/Os is too high, then increase the value of FAST_START_MTTR_TARGET.

The number of extra writes caused by setting FAST_START_MTTR_TARGET to a nonzero value is application-dependent; it is not dependent on cache size.

Calculating Performance Overhead: Example Scenario

As an example, assume the initialization parameter setting is as follows:

FAST_START_MTTR_TARGET = 90  # 90 seconds

After the statistics stabilize, you issue this query on V$SYSSTAT:

SELECT NAME, VALUE 
FROM V$SYSSTAT
WHERE NAME IN ('physical reads','physical writes',
               'physical writes non checkpoint');

Oracle responds with the following:

Name                               Value
physical reads                      2376
physical writes                    14932
physical writes non checkpoint     11165

The physical write checkpoint statistics can also be found in the CKPT_BLOCK_WRITES field of the V$INSTANCE_RECOVERY view. For example:

SELECT CKPT_BLOCK_WRITES 
FROM V$INSTANCE_RECOVERY;

Oracle responds with the following:

CKPT_BLOCK_WRITES    3767

It is consistent with the result from V$SYSSTAT: 3767 = 14932 - 11165.

After making updates for a few hours, you reissue the query. Oracle responds with the following:

Name                               Value
physical reads                      3011
physical writes                    17467
physical writes non checkpoint     13231

Substitute the values from the SELECT statements in the formula described above to determine how much performance overhead you are incurring:

[((17467 - 14932) - (13231 - 11165)) / ((3011 - 2376) + (17467 - 14932))] x 100% = 14.8%

As the result indicates, enabling fast-start checkpointing generates about 15% more I/O than required had you not enabled fast-start checkpointing. After calculating the extra I/O, you decide you can afford more system overhead if you decrease recovery time.

To decrease recovery time, reduce the value for the parameter FAST_START_MTTR_TARGET to 60. After items in the buffer cache age out, calculate V$SYSSTAT statistics across a second interval to determine the new performance overhead. Query V$SYSSTAT:

SELECT NAME, VALUE FROM V$SYSSTAT
WHERE NAME IN ('physical reads', 'physical writes',
'physical writes non checkpoint');

Oracle responds with the following:

Name                               Value
physical reads                      4652
physical writes                    28864
physical writes non checkpoint     21784

After making updates, reissue the query. Oracle responds with the following:

Name                               Value
physical reads                      6000
physical writes                    35394
physical writes non checkpoint     26438

Calculate how much performance overhead you are incurring using the values from the two SELECT statements:

[(35394 - 28864) - (26438 - 21784)) / ((6000 - 4652) + (35394 - 28864))] x 100% = 23.8%

After changing the parameter, the percentage of I/Os performed by Oracle is now about 24% more than it would be if you disabled fast-start checkpointing.

Calibrating the MTTR

The FAST_START_MTTR_TARGET initialization parameter calculates internal system trigger values to limit the length of the redo log and the number of dirty data buffers in the data cache. This calculation uses estimated times to read a redo block and to read and write a data block.

Initially, internal defaults are used. These defaults are replaced by execution time estimates during system operation. However, the best values are obtained from measurements taken from an actual recovery from a failure. To effectively align FAST_START_MTTR_TARGET, perform several instance recoveries to ensure that the time to read a redo block and the time to read and write a data block are recorded accurately.

Before doing instance recoveries to calibrate the FAST_START_MTTR_TARGET, decide whether FAST_START_MTTR_TARGET is being calibrated for a database crash or a hardware crash. This is a consideration if your database files are stored in a file system or if your I/O subsystem has a memory cache, because there is a considerable difference in the read and write time to disk depending on whether or not the files are cached. The workload being run during the instance recovery should be a very good representation of the average workload on the system to ensure that the amount of redo records generated are similar.

Tuning Transaction Recovery

During the second phase of instance recovery, Oracle rolls back uncommitted transactions. Oracle uses two features, fast-start on-demand rollback and fast-start parallel rollback, to increase the efficiency of this recovery phase.


Note:

These features are part of fast-start fault recovery and are only available in the Oracle9i Enterprise Edition.  


This section contains the following topics:

Using Fast-Start On-Demand Rollback

Using the fast-start on-demand rollback feature, Oracle automatically allows new transactions to begin immediately after the cache recovery phase of recovery completes. If a user attempts to access a row that is locked by a dead transaction, Oracle rolls back only those changes necessary to complete the transaction; in other words, it rolls them back on demand. Consequently, new transactions do not have to wait until all parts of a long transaction are rolled back.


Note:

Oracle does this automatically. You do not need to set any parameters or issue statements to use this feature.  


Using Fast-Start Parallel Rollback

In fast-start parallel rollback, the background process SMON acts as a coordinator and rolls back a set of transactions in parallel using multiple server processes. Essentially, fast-start parallel rollback is to transaction recovery what parallel recovery is to cache recovery.

Fast-start parallel rollback is mainly useful when a system has transactions that run a long time before committing, especially parallel INSERT, UPDATE, and DELETE operations. SMON automatically decides when to begin parallel rollback and disperses the work among several parallel processes: process one rolls back one transaction, process two rolls back a second transaction, and so on.

One special form of fast-start parallel rollback is intra-transaction recovery. In intra-transaction recovery, a single transaction is divided among several processes. For example, assume eight transactions require recovery with one parallel process assigned to each transaction. The transactions are all similar in size except for transaction five, which is quite large. This means it takes longer for one process to roll this transaction back than for the other processes to roll back their transactions.

In this situation, Oracle automatically begins intra-transaction recovery by dispersing transaction five among the processes: process one takes one part, process two takes another part, and so on.

You control the number of processes involved in transaction recovery by setting the initialization parameter FAST_START_PARALLEL_ROLLBACK to one of three values:

FALSE

Turns off fast-start parallel rollback.

LOW

Specifies that the number of recovery servers cannot exceed twice the value of the CPU_COUNT initialization parameter.

HIGH

Specifies that the number of recovery servers cannot exceed four times the value of the CPU_COUNT initialization parameter.

Parallel Rollback in an Oracle Real Application Clusters Configuration

In Oracle Real Application Clusters, you can perform fast-start parallel rollback on each instance. Within each instance, you can perform parallel rollback on transactions that are:

After a rollback segment is online for a given instance, only this instance can perform parallel rollback on transactions on that segment.

Monitoring Progress of Fast-Start Parallel Rollback

Monitor the progress of fast-start parallel rollback by examining the V$FAST_START_SERVERS and V$FAST_START_TRANSACTIONS views. V$FAST_START_SERVERS provides information about all recovery processes performing fast-start parallel rollback. V$FAST_START_TRANSACTIONS contains data about the progress of the transactions.

See Also:

 

Go to previous page Go to next page
Oracle
Copyright © 1996-2001, Oracle Corporation.

All Rights Reserved.
Go To Documentation Library
Home
Go To Product List
Book List
Go To Table Of Contents
Contents
Go To Index
Index

Master Index

Feedback