An RMAN backup or restore job can be divided into separate phases or components. The slowest of these phases in any RMAN job is called the bottleneck. The purpose of RMAN tuning is to identify the bottlenecks for a given job and use RMAN commands, initialization parameters, or adjustments to physical media to improve performance.
Tuning RMAN performance requires a detailed understanding of how RMAN creates a backup. As explained in "About RMAN Channels", the work of a backup is performed by one or more channels. A channel represents a stream of bytes to a storage device.
For the purposes of illustration, you can think of the byte stream as passing from the input buffers in memory through the CPU to the output buffers, and from there to the storage device. To direct a backup to two tape devices, you allocate two tape channels so that each byte stream goes to a different device.
The work of each channel, whether of type disk or System Backup Tape (SBT), is subdivided into the following distinct phases:
A channel reads blocks from disk into input I/O buffers.
A channel copies blocks from input buffers to output buffers and performs additional processing on the blocks.
A channel writes the blocks from output buffers to storage media. The write phase takes one of two mutually exclusive forms, depending on the type of backup media: writing to tape through SBT, or writing to disk.
Figure 23-1 depicts two channels backing up data stored on three disks. Each channel reads the data into the input buffers, processes the data while copying it from the input to the output buffers, and then writes the data from the output buffers to disk.
Figure 23-1 Phases of a Multichannel Backup to Disk
Figure 23-2 also depicts two channels backing up data stored on three disks, but one disk is mounted remotely over the network. Each channel reads the data into the input buffers, processes the data while copying it from the input buffers to the output buffers, and then writes the data from the output buffers to tape. Channel 1 writes the data to a locally attached tape drive, whereas channel 2 sends the data over the network to a remote media server.
Figure 23-2 Phases of a Multichannel Backup to Tape
When restoring data, a channel performs these steps in reverse order and reverses the reading and writing operations. The following sections explain RMAN tuning concepts in terms of a backup.
The number of channels available for use with a device determines whether RMAN can read from and write to this device in parallel. It is recommended that the number of channels equal the number of storage devices used. Therefore, when RMAN uses disk, the number of channels should equal the number of physical disks accessed. When RMAN uses tape, the number of channels should equal the number of tape drives accessed by RMAN.
This section explains factors that affect performance when an RMAN channel is reading data from disk:
During a backup, an RMAN channel reads the blocks from the input files into I/O disk buffers. The database files on the disk subsystem can be managed by either Automatic Storage Management (ASM) or an alternative volume manager or file system. The considerations for backup tuning change depending on whether you manage database files with ASM.
The allocation of the input buffers depends on how the files are multiplexed. Backup multiplexing is RMAN's ability to read several files in a backup simultaneously from different sources and then write them to a single backup piece. The level of multiplexing, which is the number of input files simultaneously read and then written into the same backup piece, is determined by the algorithm described in "About Multiplexed RMAN Backup Sets". Review this section before proceeding.
When an RMAN channel backs up files from disk, it uses the rules described in Table 23-1 to determine how large to make the input disk buffers.
Table 23-1 Data File Read Buffer Sizing Algorithm
|Level of Multiplexing||Input Disk Buffer Size|
|Less than or equal to 4||The RMAN channel allocates 16 buffers of size 1 megabyte (MB) so that the total buffer size for all the input files is 16 MB.|
|Greater than 4 but less than or equal to 8||The RMAN channel allocates a variable number of disk buffers of size 512 kilobytes (KB) so that the total buffer size for all the input files is less than 16 MB.|
|Greater than 8||The RMAN channel allocates 4 disk buffers of 128 KB for each file, so that the total buffer size for each input file is 512 KB.|
In the example shown in Figure 23-3, one channel is backing up four data files. MAXOPENFILES is set to 4 and FILESPERSET is set to 4. Thus, the level of multiplexing is 4. So, the total size of the buffers for each data file is 4 MB. The combined size of all the buffers is 16 MB.
Figure 23-3 Disk Buffer Allocation
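The sizing rules in Table 23-1 can be sketched as a small calculation. The following Python is only an illustration of the table, not RMAN code: the names `multiplexing_level`, `input_buffer_plan`, and `BufferPlan` are hypothetical, and the middle tier returns no total because the table states only that the total stays under 16 MB.

```python
from typing import NamedTuple, Optional

KB = 1024
MB = 1024 * KB


class BufferPlan(NamedTuple):
    buffer_size: int            # size of one input buffer, in bytes
    total_bytes: Optional[int]  # combined size of all input buffers, when the table fixes it


def multiplexing_level(maxopenfiles: int, files_in_set: int) -> int:
    """Level of multiplexing: the smaller of MAXOPENFILES and the files in the backup set."""
    return min(maxopenfiles, files_in_set)


def input_buffer_plan(level: int) -> BufferPlan:
    """Apply the three tiers of Table 23-1 to a level of multiplexing."""
    if level < 1:
        raise ValueError("level of multiplexing must be at least 1")
    if level <= 4:
        # 16 buffers of 1 MB, shared by all input files.
        return BufferPlan(1 * MB, 16 * MB)
    if level <= 8:
        # Variable number of 512 KB buffers; the table only bounds the total below 16 MB.
        return BufferPlan(512 * KB, None)
    # 4 buffers of 128 KB for each file, that is, 512 KB for each file.
    return BufferPlan(128 * KB, level * 4 * 128 * KB)


# The Figure 23-3 example: MAXOPENFILES = 4 and FILESPERSET = 4.
level = multiplexing_level(maxopenfiles=4, files_in_set=4)
plan = input_buffer_plan(level)
per_file_mb = plan.total_bytes // level // MB
print(level, plan.total_bytes // MB, per_file_mb)  # prints: 4 16 4
```

For the Figure 23-3 settings, the sketch reproduces the 16 MB total and the 4 MB for each data file described above.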
If a channel is backing up files stored in ASM, then the number of input disk buffers equals the number of physical disks in the ASM disk group only if the level of multiplexing is 1. For example, if a data file is stored in an ASM disk group that contains 16 physical disks, then the channel allocates 16 input buffers for the data file backup.
If a channel is restoring a backup from disk, then 4 buffers are allocated. The size of the buffers is dependent on the operating system.
When a channel reads from or writes to disk, the I/O is either synchronous I/O or asynchronous I/O. When the disk I/O is synchronous, a server process can perform only one task at a time. When the disk I/O is asynchronous, a server process can begin an I/O operation and then perform other work while waiting for the I/O to complete. RMAN can also begin multiple I/O operations before waiting for the first to complete.
When reading from an ASM disk group, use asynchronous disk I/O if possible. Asynchronous disk I/O also works well when a channel reads from a raw device managed with a volume manager. Some operating systems support native asynchronous disk I/O, and the database takes advantage of this feature if it is available.
On operating systems that do not support native asynchronous I/O, the database can simulate it with special I/O slave processes. These processes are dedicated to performing I/O on behalf of another process.
You can control disk I/O slaves by setting the DBWR_IO_SLAVES initialization parameter, which is not dynamic. The parameter specifies the number of I/O server processes used by the database writer process (DBWR). By default, the value is 0 and I/O server processes are not used. If asynchronous I/O is disabled, then RMAN allocates four backup disk I/O slaves for any nonzero value of DBWR_IO_SLAVES.
When attempting to get shared buffers for I/O slaves, the database does the following:
If the LARGE_POOL_SIZE initialization parameter is set, and if the DBWR_IO_SLAVES parameter is set to a nonzero value, then the database attempts to get memory from the large pool. If this value is not large enough, then an error is recorded in the alert log, the database does not try to get buffers from the shared pool, and asynchronous I/O is not used.
If the LARGE_POOL_SIZE initialization parameter is not set or is set to zero, then the database attempts to get memory from the shared pool.
If the database cannot get enough memory, then it obtains I/O buffer memory from the Program Global Area (PGA) and writes a message to the alert log indicating that synchronous I/O is used for this backup.
The memory from the large pool is used for many features, including the shared server, parallel query, and RMAN I/O slave buffers. Configuring the large pool prevents RMAN from competing with other subsystems for the same memory.
Requests for contiguous memory allocations from the shared pool are usually small (under 5 KB). However, a request for a large contiguous memory allocation can either fail or require significant memory housekeeping to release the required amount of contiguous memory. Although the shared pool may be unable to satisfy this memory request, the large pool can do so. The large pool does not have a least recently used (LRU) list; the database does not attempt to age memory out of the large pool.
In the ALLOCATE CHANNEL and CONFIGURE CHANNEL commands, the RATE parameter specifies the bytes per second that are read on a channel. You can use this parameter to set an upper limit for bytes read so that RMAN does not consume excessive disk bandwidth and degrade online performance. Essentially, RATE serves as a backup throttle. For example, if you set RATE 1500K, and if each disk drive delivers 3 megabytes per second, then the channel leaves some disk bandwidth available to the online system.
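As a rough illustration of the arithmetic in this example, the following Python sketch converts a RATE value to bytes per second and computes the bandwidth left on one drive. The helper names are hypothetical, and the sketch assumes that K, M, and G denote powers of 1024 and that "3 megabytes per second" means 3 x 1024 x 1024 bytes per second.

```python
# Assumed unit multipliers: K, M, and G as powers of 1024.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def rate_to_bytes(rate: str) -> int:
    """Convert a value such as '1500K' to bytes per second."""
    rate = rate.strip().upper()
    if rate and rate[-1] in UNITS:
        return int(rate[:-1]) * UNITS[rate[-1]]
    return int(rate)


def bandwidth_left(rate: str, disk_bytes_per_second: int) -> int:
    """Bytes per second on one drive that the throttled channel does not use."""
    return max(disk_bytes_per_second - rate_to_bytes(rate), 0)


disk = 3 * 1024 ** 2  # one drive delivering 3 MB per second (assumed binary megabytes)
print(rate_to_bytes("1500K"), bandwidth_left("1500K", disk))  # prints: 1536000 1609728
```

Under these assumptions the throttled channel uses slightly less than half of the drive's throughput.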
In this phase, a channel copies blocks from the input buffers to the output buffers and performs additional processing. For example, if a channel reads data from disk and backs up to tape, then the channel copies the data from the disk buffers to the output tape buffers.
The copy phase involves the following types of processing:
When performing binary compression, RMAN applies a compression algorithm to the data in backup sets. Binary compression can be CPU-intensive. You can choose which compression algorithm RMAN uses for backups. The basic compression level for RMAN has a good compression ratio for most scenarios. If you enabled the Oracle Advanced Compression option, there are several different levels to choose from that provide tradeoffs between compression ratios and required CPU resources. Binary compression is explained in "About Binary Compression for RMAN Backup Sets" and in "Making Compressed Backups".
When performing backup encryption, RMAN encrypts backup sets by using an algorithm listed in V$RMAN_ENCRYPTION_ALGORITHMS. RMAN offers three modes of encryption: transparent, password-protected, and dual-mode. Backup encryption is explained in "Encrypting RMAN Backups". Backup encryption can be CPU-intensive.
When backing up to SBT, RMAN gives the media management software a stream of bytes and associates a unique name with this stream. All details of how and where that stream is stored are handled entirely by the media manager. Thus, a backup to tape involves the interaction of both RMAN and the media manager.
Factors that affect the write phase for SBT are described in the following topics:
The RMAN-specific factors affecting the SBT write phase are analogous to the factors affecting disk reads. In both cases, the buffer allocation, slave processes, and synchronous or asynchronous I/O affect performance.
If you back up to or restore from an SBT device, then by default the database allocates four buffers for each channel for the tape writers (or reads if restoring data), as shown in Figure 23-4. The size of the tape I/O buffers is platform-dependent. You can change this value with the PARMS and BLKSIZE parameters of the ALLOCATE CHANNEL or CONFIGURE CHANNEL command.
Figure 23-4 Allocation of Tape Buffers
RMAN allocates the tape buffers in the System Global Area (SGA) or the Program Global Area (PGA), depending on whether I/O slaves are used. If you set the initialization parameter BACKUP_TAPE_IO_SLAVES=true, then RMAN allocates tape buffers from the SGA. Tape devices can only be accessed by one process at a time, so RMAN starts as many slaves as necessary for the number of tape devices. If the LARGE_POOL_SIZE initialization parameter is also set, then RMAN allocates buffers from the large pool. If you set BACKUP_TAPE_IO_SLAVES=false, then RMAN allocates the buffers from the PGA.
If you use I/O slaves, then set the LARGE_POOL_SIZE initialization parameter to dedicate SGA memory to holding these large memory allocations. This parameter prevents RMAN I/O buffers from competing with the library cache for SGA memory. If I/O slaves for tape I/O were requested but there is not enough space in the SGA for them, slaves are not used, and a message appears in the alert log.
The BACKUP_TAPE_IO_SLAVES parameter specifies whether RMAN uses slave processes rather than the number of slave processes. Tape devices can only be accessed by one process at a time, and RMAN uses the number of slaves necessary for the number of tape devices.
When an SBT channel reads or writes data to tape, the I/O is always synchronous. For tape I/O, each channel allocated (whether manually or automatically) corresponds to a server process, called here a channel process.
Figure 23-5 shows synchronous I/O in a backup to tape.
Figure 23-5 Synchronous Tape I/O
The following steps occur:
The channel process composes a tape buffer.
The channel process executes media manager code that processes the tape buffer and internalizes it for further processing and storage by the media manager.
The media manager code returns a message to the server process stating that it has completed writing.
The channel process can initiate a new task.
Figure 23-6 shows asynchronous I/O in a tape backup. Asynchronous I/O to tape is simulated by using tape slaves. In this case, each allocated channel corresponds to a server process, which in the explanation that follows is identified as a channel process. For each channel process, one tape slave is started (or more than one, if multiple copies exist).
Figure 23-6 Asynchronous Tape I/O
The following steps occur:
A channel process writes blocks to a tape buffer.
The channel process sends a message to the tape slave process to process the tape buffer. The tape slave process executes media manager code that processes the tape buffer and internalizes it so that the media manager can process it.
While the tape slave process is writing, the channel process is free to read data from the data files and prepare more output buffers.
After the tape slave process returns from the media manager code, it requests a new tape buffer, which usually is ready. Thus waiting time for the channel process is reduced, and the backup is completed faster.
The following factors affect the speed of the backup to tape:
If the tape device is remote, then the media manager must transfer data over the network. For example, an administrative domain in Oracle Secure Backup can contain multiple networked client hosts, media servers, and tape devices. If the database is on one host, but the output tape drive is attached to a different host, then Oracle Secure Backup manages the data transfer over the network. The network throughput is the upper limit for backup performance.
The tape native transfer rate is the speed of writing to a tape without compression. This speed represents the upper limit of the backup rate. The upper limit of your backup performance should be the aggregate transfer rate of all of your tape drives. If your backup is performing at that rate, and if it is not using an excessive amount of CPU, then RMAN performance tuning does not help.
The level of tape compression is very important for backup performance. If the tape has good compression, then the sustained backup rate is faster. For example, if the compression ratio is 2:1 and native transfer rate of the tape drive is 6 megabytes per second, then the resulting backup speed is 12 megabytes per second. In this case, RMAN must be able to read disks with a throughput of more than 12 megabytes per second or the disk becomes the bottleneck for the backup.
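The relationship between native transfer rate, compression ratio, and required disk throughput can be written out as a short calculation. This is a minimal sketch with hypothetical function names; it treats the compression ratio as a simple multiplier, as the example above does.

```python
def effective_tape_rate(native_mb_per_s: float, compression_ratio: float) -> float:
    """Sustained backup rate the drive can absorb, given its compression ratio."""
    return native_mb_per_s * compression_ratio


def disk_is_bottleneck(disk_read_mb_per_s: float,
                       native_mb_per_s: float,
                       compression_ratio: float) -> bool:
    """True when disk reads cannot exceed what the tape drive can absorb."""
    return disk_read_mb_per_s <= effective_tape_rate(native_mb_per_s, compression_ratio)


# The example from the text: 6 MB per second native rate with 2:1 compression.
print(effective_tape_rate(6, 2))        # prints: 12
print(disk_is_bottleneck(10, 6, 2))     # prints: True
print(disk_is_bottleneck(15, 6, 2))     # prints: False
```

A disk subsystem that reads at 10 MB per second would hold back a drive that can absorb 12 MB per second, whereas one that reads at 15 MB per second would not.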
Do not use both tape compression provided by the media manager and binary compression provided by RMAN. If the media manager compression is efficient, then it is usually the better choice. Using RMAN-compressed backup sets can be an effective alternative to reduce bandwidth used to move uncompressed backup sets over a network to the media manager, if the CPU overhead required to compress the data in RMAN is acceptable.
Tape streaming during write operations has a major effect on tape backup performance. Many tape drives are fixed-speed, streaming tape drives. Because such drives can write data at only one speed, when they run out of data to write to tape, the tape must slow and stop. Typically, when the drive's buffer empties, the tape is moving so quickly that it actually overshoots; to continue writing, the drive must rewind the tape to locate the point where it stopped writing. Multiple speed tape drives are available that alleviate this problem.
The physical tape block size can affect backup performance. The block size is the amount of data written by media management software to a tape in one write operation. In general, the larger the tape block size, the faster the backup. The physical tape block size is not controlled by RMAN or Oracle Database, but by media management software. See your media management software's documentation for details.
The principal factor affecting the write phase for disk is the buffer size. When the output of the backup resides on disk, each channel allocates four output buffers of 1 MB each. The disk channel writes the blocks to the disk subsystem. When restoring files, the read phase is similar to the write phase when backing up files, except the blocks move in the opposite direction.
If RMAN reads from a disk asynchronously, then it writes to the disk asynchronously. When writing to disk, you can make use of disk I/O slaves just as when reading from disk.
If RMAN is backing up files to a disk-based output destination striped over multiple disks, then you can allocate multiple channels. The number of channels is limited only by the number of disks over which the destination is striped. ASM is one example of a destination striped over multiple disks.
Typically, you begin the tuning process by using V$ views to determine where RMAN backup and restore operations are encountering problems.
Detail rows describe the files being processed by one job step, whereas aggregate rows describe the files processed by all job steps in an RMAN command. A job step is the creation or restoration of one backup set or data file copy. Detail rows are updated with every buffer that is read or written during the backup step, so their granularity of update is small. Aggregate rows are updated when each job step completes, so their granularity of update is large.
Table 23-2 describes the columns in V$SESSION_LONGOPS that are most relevant for RMAN. Typically, you view the detail rows rather than the aggregate rows to determine the progress of each backup set.
Table 23-2 Columns of V$SESSION_LONGOPS Relevant for RMAN
|Column||Description for Detail Rows|
The server session ID corresponding to an RMAN channel
The server session serial number. This value changes each time a server session is reused.
A text description of the row. Examples of details rows include
For backup output rows, this value is
The meaning of this column depends on the type of operation described by this row:
The meaning of this column depends on the type of operation described by this row:
Each server session performing a backup or restore job reports its progress compared to the total work required for a job step. For example, if you restore the database with two channels, and each channel has two backup sets to restore (a total of four sets), then each server session reports its progress through a single backup set. When a set is completely restored, RMAN begins reporting progress on the next set to restore.
To monitor RMAN job progress:
Before starting the job, create a script file (called, for this example, longops) containing the following SQL statement:
SELECT SID, SERIAL#, CONTEXT, SOFAR, TOTALWORK,
       ROUND(SOFAR/TOTALWORK*100,2) "%_COMPLETE"
FROM   V$SESSION_LONGOPS
WHERE  OPNAME LIKE 'RMAN%'
AND    OPNAME NOT LIKE '%aggregate%'
AND    TOTALWORK != 0
AND    SOFAR <> TOTALWORK;
Start RMAN, connect to the target database, and start an RMAN job. For example:
RMAN> RESTORE DATABASE;
While the RMAN job is running, run the longops script to check the progress of the RMAN job. If you repeat the query while the RMAN job progresses, then you see output such as the following:
SQL> @longops
       SID    SERIAL#    CONTEXT      SOFAR  TOTALWORK %_COMPLETE
---------- ---------- ---------- ---------- ---------- ----------
         8         19          1      10377      36617      28.34
SQL> @longops
       SID    SERIAL#    CONTEXT      SOFAR  TOTALWORK %_COMPLETE
---------- ---------- ---------- ---------- ---------- ----------
         8         19          1      21513      36617      58.75
SQL> @longops
       SID    SERIAL#    CONTEXT      SOFAR  TOTALWORK %_COMPLETE
---------- ---------- ---------- ---------- ---------- ----------
         8         19          1      29641      36617      80.95
SQL> @longops
       SID    SERIAL#    CONTEXT      SOFAR  TOTALWORK %_COMPLETE
---------- ---------- ---------- ---------- ---------- ----------
         8         19          1      35849      36617       97.9
SQL> @longops
no rows selected
If you run the longops script at intervals of 2 minutes or more and the %_COMPLETE column does not increase, then RMAN is encountering a problem. See "Monitoring RMAN Interaction with the Media Manager" to obtain more information.
If you frequently monitor the execution of long-running tasks, then you could create a shell script or batch file under your host operating system that runs SQL*Plus to execute this query repeatedly.
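The progress calculation and the two-minute stall rule can be sketched as follows. This Python is only an illustration: `percent_complete` and `is_stalled` are hypothetical helpers, the timestamps are invented, and obtaining the SOFAR and TOTALWORK values from V$SESSION_LONGOPS is left to whatever client you use.

```python
from typing import List, Tuple


def percent_complete(sofar: int, totalwork: int) -> float:
    """Same arithmetic as ROUND(SOFAR/TOTALWORK*100,2) in the monitoring query."""
    if totalwork == 0:
        raise ValueError("TOTALWORK must be nonzero")
    return round(sofar / totalwork * 100, 2)


def is_stalled(samples: List[Tuple[float, int]], min_interval: float = 120.0) -> bool:
    """Report a stall when the last two samples are at least min_interval seconds
    apart and SOFAR has not increased. Each sample is (timestamp_seconds, sofar)."""
    if len(samples) < 2:
        return False
    (t_prev, sofar_prev), (t_last, sofar_last) = samples[-2], samples[-1]
    return (t_last - t_prev) >= min_interval and sofar_last <= sofar_prev


# SOFAR values taken from the sample output; the timestamps are hypothetical.
samples = [(0.0, 10377), (150.0, 21513), (300.0, 29641)]
print([percent_complete(sofar, 36617) for _, sofar in samples])  # prints: [28.34, 58.75, 80.95]
print(is_stalled(samples))                                       # prints: False
print(is_stalled(samples + [(450.0, 29641)]))                    # prints: True
```

A wrapper script could collect one sample per run of the query and raise an alert when `is_stalled` returns true.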
V$BACKUP_SYNC_IO contains rows when the I/O is synchronous to the process (or thread on some platforms) performing the backup.
V$BACKUP_ASYNC_IO contains rows when the I/O is asynchronous. Asynchronous I/O is obtained either with I/O processes or because it is supported by the underlying operating system.
The results of a backup or restore job remain in memory until the database instance shuts down. Thus, you can query the views after the job completes.
To determine whether the tape is streaming when the I/O is synchronous:
Query the EFFECTIVE_BYTES_PER_SECOND column in the V$BACKUP_SYNC_IO view.
If EFFECTIVE_BYTES_PER_SECOND is less than the raw capacity of the hardware, then the tape is not streaming. If EFFECTIVE_BYTES_PER_SECOND is greater than the raw capacity of the hardware, the tape may or may not be streaming. Compression may cause the EFFECTIVE_BYTES_PER_SECOND to be greater than the speed of real I/O.
See Oracle Database Reference for more information about these views.
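The streaming rule can be expressed as a small decision function. This is a sketch under stated assumptions: the function name and the sample rates are hypothetical, and the case where the two rates are exactly equal, which the text does not cover, is grouped with the inconclusive outcome.

```python
def streaming_assessment(effective_bytes_per_second: float,
                         raw_capacity_bytes_per_second: float) -> str:
    """Interpret EFFECTIVE_BYTES_PER_SECOND against the raw capacity of the drive."""
    if effective_bytes_per_second < raw_capacity_bytes_per_second:
        return "not streaming"
    # At or above raw capacity: compression can inflate the effective rate,
    # so this value alone cannot confirm streaming (equality grouped here by assumption).
    return "inconclusive"


MB = 1024 * 1024
print(streaming_assessment(4 * MB, 6 * MB))   # prints: not streaming
print(streaming_assessment(9 * MB, 6 * MB))   # prints: inconclusive
```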
With synchronous I/O, it is difficult to identify specific bottlenecks because all synchronous I/O is a bottleneck to the process. The only way to tune synchronous I/O is to compare the rate (in bytes per second) with the device's maximum throughput rate. If the rate is lower than the rate that the device specifies, then consider tuning this aspect of the backup and restore process.
To determine the rate of synchronous I/O:
Query the DISCRETE_BYTES_PER_SECOND column in the V$BACKUP_SYNC_IO view to display the I/O rate.
If you see data in V$BACKUP_SYNC_IO, then the problem is that you have not enabled asynchronous I/O or you are not using disk I/O slaves.
Long waits are the number of times the backup or restore process told the operating system to wait until an I/O was complete. Short waits are the number of times the backup or restore process made an operating system call to poll for I/O completion in a nonblocking mode. Ready indicates the number of times when I/O was ready for use, so there was no need to make an operating system call to poll for I/O completion.
To determine the rate of asynchronous I/O:
Query the LONG_WAITS and IO_COUNT columns in the V$BACKUP_ASYNC_IO view to display the I/O rate.
The simplest way to identify the bottleneck is to find the data file that has the largest ratio for LONG_WAITS divided by IO_COUNT. For example, you can use the following query:
SELECT LONG_WAITS/IO_COUNT, FILENAME FROM V$BACKUP_ASYNC_IO WHERE LONG_WAITS/IO_COUNT > 0 ORDER BY LONG_WAITS/IO_COUNT DESC;
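The ranking that this query performs can also be done on rows that have already been fetched. The following Python sketch uses invented file names and counts, and a hypothetical helper name. Unlike the query, it skips rows whose IO_COUNT is zero instead of dividing by zero.

```python
from typing import Dict, List, Tuple


def rank_by_long_wait_ratio(rows: List[Dict]) -> List[Tuple[float, str]]:
    """Return (LONG_WAITS/IO_COUNT, FILENAME) pairs with a positive ratio, largest first."""
    ranked = [
        (row["long_waits"] / row["io_count"], row["filename"])
        for row in rows
        if row["io_count"] > 0 and row["long_waits"] > 0
    ]
    return sorted(ranked, reverse=True)


# Hypothetical rows standing in for V$BACKUP_ASYNC_IO output.
rows = [
    {"filename": "/disk1/users01.dbf", "long_waits": 40, "io_count": 400},
    {"filename": "/disk2/sales01.dbf", "long_waits": 300, "io_count": 600},
    {"filename": "/disk3/tools01.dbf", "long_waits": 0, "io_count": 250},
]
for ratio, filename in rank_by_long_wait_ratio(rows):
    print(f"{ratio:.2f} {filename}")
# prints:
# 0.50 /disk2/sales01.dbf
# 0.10 /disk1/users01.dbf
```

The file at the top of the list is the first candidate to investigate as the read bottleneck.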
If you have synchronous I/O but you set BACKUP_DISK_IO_SLAVES, then the I/O is displayed in V$BACKUP_ASYNC_IO.
See Oracle Database Reference for descriptions of these views.
Many factors can affect backup performance. Often, finding the solution to a slow backup is a process of trial and error. To obtain the best performance for a backup, follow the steps in this section in sequential order.
This section contains the following steps:
As explained in "RATE Channel Parameter", the RATE parameter on a channel is intended to reduce, rather than increase, backup throughput so that more disk bandwidth is available for other database operations. If the backup is not streaming to tape, then confirm that the RATE parameter is not set.
To remove the RATE parameter:
If the backup is in a RUN command, then remove the RATE parameter, if it is specified, from the ALLOCATE command. Skip the remaining steps.
If the backup is not in a RUN command, then start RMAN, connect to the target database, and proceed to the next step.
Run the SHOW ALL command to show the currently configured settings.
Remove the RATE parameter, if it is set, from the CONFIGURE CHANNEL command.
As explained in "Synchronous and Asynchronous Disk I/O", some operating systems support native asynchronous I/O. If and only if your disk does not support asynchronous I/O, then set DBWR_IO_SLAVES. Any nonzero value for DBWR_IO_SLAVES causes a fixed number of disk I/O slaves to be used for backup and restore, which simulates asynchronous I/O.
To enable disk I/O slaves:
Set the DBWR_IO_SLAVES initialization parameter to a nonzero value. Because this parameter is not dynamic, the new value takes effect only after the database instance is restarted.
This setting enables the database writer processes to use slaves. Thus, you may need to increase the value of the PROCESSES initialization parameter.
Set the LARGE_POOL_SIZE initialization parameter if the database reports an error in the alert log stating that it does not have enough memory and that it cannot start I/O slaves. The message resembles the following:
ksfqxcre: failure to allocate shared memory means sync I/O will be used whenever async I/O to file not supported natively
The large pool is used for RMAN and for other purposes, so its total size must accommodate all uses. This is especially true if DBWR_IO_SLAVES has been set and the DBWR process needs buffers.
To set the large pool size:
Query V$SGASTAT.POOL to determine in which pool (shared pool or large pool) the memory for an object resides.
Set the LARGE_POOL_SIZE initialization parameter in the target database.
You can execute an ALTER SYSTEM SET statement to set the parameter dynamically. The formula for setting LARGE_POOL_SIZE is as follows:
LARGE_POOL_SIZE = number_of_allocated_channels * (16 MB + ( 4 * size_of_tape_buffer ) )
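The formula can be evaluated directly once the channel count and tape buffer size are known. In this sketch the function name is hypothetical, and the 256 KB tape buffer size is an assumed value chosen only for the arithmetic, because the actual size is platform-dependent.

```python
KB = 1024
MB = 1024 * KB


def large_pool_size(allocated_channels: int, tape_buffer_bytes: int) -> int:
    """LARGE_POOL_SIZE = channels * (16 MB + (4 * size_of_tape_buffer)), in bytes."""
    return allocated_channels * (16 * MB + 4 * tape_buffer_bytes)


# Four channels with an assumed 256 KB tape buffer.
size = large_pool_size(4, 256 * KB)
print(size, size // MB)  # prints: 71303168 68
```

Because the large pool also serves other features, treat the result as the RMAN share of the pool and not as its complete size.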
There are several tasks that you can perform to identify and remedy bottlenecks that affect backup performance. This includes the following tasks:
One reliable way to determine whether the output device or input disk I/O is the bottleneck in a given backup job is to compare the time required to run backup tasks with the time required to run BACKUP VALIDATE of the same tasks. BACKUP VALIDATE of a backup performs the same disk reads as a real backup but performs no I/O to an output device.
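To compare backup and validation times:
Ensure that your NLS environment date format variable is set to show the time. For example, set the NLS variables as follows:
setenv NLS_LANG AMERICAN_AMERICA.WE8DEC;
setenv NLS_DATE_FORMAT "MM/DD/YYYY HH24:MI:SS"
Run the backup with the BACKUP VALIDATE command instead of the BACKUP command, and calculate the difference between the times displayed in the Starting backup at and Finished backup at messages.
Run the backup with the BACKUP command instead of the BACKUP VALIDATE command, and calculate the difference between the times displayed in the Starting backup at and Finished backup at messages.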
If the time for the BACKUP VALIDATE to tape is about the same as the time for a real backup to tape, then reading from disk is the likely bottleneck. See "Tuning the Read Phase".
If the time for the BACKUP VALIDATE to tape is significantly less than the time for a real backup to tape, then writing to the output device is the likely bottleneck. See "Tuning the Copy and Write Phases".
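The comparison can be sketched as a small calculation over the start and finish times reported by RMAN. The helper names are hypothetical, the timestamps are invented, and the 10 percent tolerance used to decide what counts as "about the same" is an assumption, since the text does not quantify it. The timestamp format matches an NLS_DATE_FORMAT of MM/DD/YYYY HH24:MI:SS.

```python
from datetime import datetime

TIME_FORMAT = "%m/%d/%Y %H:%M:%S"  # corresponds to MM/DD/YYYY HH24:MI:SS


def elapsed_seconds(started_at: str, finished_at: str) -> float:
    """Seconds between a 'Starting backup at' and a 'Finished backup at' time."""
    start = datetime.strptime(started_at, TIME_FORMAT)
    finish = datetime.strptime(finished_at, TIME_FORMAT)
    return (finish - start).total_seconds()


def likely_bottleneck(validate_seconds: float, backup_seconds: float,
                      tolerance: float = 0.10) -> str:
    """Classify the bottleneck; tolerance is an assumed threshold for 'about the same'."""
    if validate_seconds >= backup_seconds * (1 - tolerance):
        return "read phase"
    return "copy or write phase"


# Hypothetical timings for the same backup tasks.
validate = elapsed_seconds("03/01/2024 20:00:00", "03/01/2024 20:30:00")
backup = elapsed_seconds("03/01/2024 22:00:00", "03/01/2024 23:30:00")
print(validate, backup, likely_bottleneck(validate, backup))
# prints: 1800.0 5400.0 copy or write phase
```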
RMAN may not be able to send data blocks to the output device fast enough to keep it occupied. For example, during an incremental backup, RMAN only backs up blocks changed since a previous data file backup as part of the same strategy. If you do not turn on block change tracking, then RMAN must scan whole data files for changed blocks, and fill output buffers as it finds such blocks. If few blocks changed, and if RMAN is making an SBT backup, then RMAN may not fill output buffers fast enough to keep the tape drive streaming.
You can improve backup performance by adjusting the level of multiplexing, which is the number of input files simultaneously read and then written into the same RMAN backup piece. The level of multiplexing is the minimum of the MAXOPENFILES setting on the channel and the number of input files placed in each backup set. The following table makes recommendations for adjusting the level of multiplexing.
Table 23-3 Adjusting the Level of Multiplexing
Increase the level of multiplexing. Determine which is the minimum,
In this way, you increase the rate at which RMAN fills tape buffers, which makes it more likely that buffers are sent to the media manager fast enough to maintain streaming.
If the read phase is performing well, then the copy or write phases are probably the bottleneck. In particular, if RMAN is sending data blocks to the tape drive fast enough to support streaming, but the tape is not streaming, then the SBT write phase is the bottleneck. Try to improve performance as follows:
If the backup is a full backup, then consider using incremental backups.
Incremental level 1 backups write only the changed blocks from data files to tape, so that any bottleneck on writing to tape has less impact on your overall backup strategy. In particular, if tape drives are not locally attached to the node of the database being backed up, then incremental backups can be faster. See "Making and Updating RMAN Incremental Backups".
If the backup uses the basic compression algorithm, then consider using the Oracle Advanced Compression option.
If the database host uses multiple CPUs, and if the backup uses binary compression, then increase the number of channels.
If the backup is encrypted, then change the encryption algorithm to AES128. The AES128 algorithm is the least CPU-intensive algorithm. See "Configuring the Backup Encryption Algorithm".
If RMAN is backing up to tape, then try the following adjustments:
Adjust the size of the tape I/O buffers.
Use the PARMS and BLKSIZE parameters of the ALLOCATE CHANNEL or CONFIGURE CHANNEL command to set the size. The size of the tape I/O buffers is platform-dependent. The BLKSIZE setting overrides the default.
Adjust settings in the media management software.
Some media manager settings, including the tape block size, may affect backup performance.
If RMAN is backing up files to ASM, then increase the number of channels.
For example, if RMAN is backing up the database to a single disk group with 16 physical disks, then allocate or configure at least 4 disk channels, up to a maximum of 16.