Oracle7 Enterprise Backup Utility Administrator's Guide
This chapter describes the performance-enhancing features available in the Enterprise Backup Utility.
The topics covered in this chapter are:
When more than one tape device is available for EBU usage, the first thing to do to increase performance is to use all available tape devices.
The parallel specifier signals EBU to generate more than one logical tape stream in parallel. Depending on the configuration of the media management software, each logical tape stream can cause I/O to its own physical tape device, or, under some circumstances, the media vendor software may multiplex several logical tape streams onto a single physical tape device. This is called hardware multiplexing.
In general, backup time decreases when more than one logical tape stream is multiplexed onto a physical tape device (up to the number of streams that completely saturates the device), but restore time may increase considerably. Before using the hardware multiplexing feature, be sure you understand the implications for your media management software.
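The parallel specifier is given in the backup command script. The following is a minimal sketch: the database name and the choice of four streams are illustrative assumptions, not recommendations, and the exact specifier layout should be checked against the backup command reference.

```
# Back up the whole database using four logical tape streams.
backup online
  oracle_sid = "PROD"
  database
  parallel = 4
```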
Once all tape devices are being used by EBU, the key to EBU's performance is that these devices are kept streaming. This means that the tapes should not wait for data from the disk.
There are two options available in EBU to keep the tapes streaming:
By default, each target database datafile is backed up in its own Backup File Set (BFS). It may be possible to improve backup performance by interlacing database blocks from datafiles stored on different physical disk drives into a single BFS. This practice is called multiplexing. Up to 32 datafiles (or 128 archivelog files) can be multiplexed into a single BFS. This is not the same as the hardware multiplexing described above.
EBU's multiplexing effectively increases disk access speed by allowing data from multiple disks to be read into a single BFS simultaneously. Data blocks from the multiple datafiles are integrated into one logical tape stream. The Enterprise Backup Utility automatically demultiplexes the BFS upon restore.
To use the multiplexing feature of the Enterprise Backup Utility, use the mux specifier in your command script. By default, files are backed up without multiplexing.
Use multiplexing only if the backup media device is so fast that it does not stream continuously in the default (no multiplexing) case. Multiplexing to an already fully utilized media device not only fails to improve performance, but may even degrade it.
Figure 6-1, "Backing Up a Tablespace without Multiplexing" depicts a no-multiplex backup of a tablespace with four datafiles, stored across four physical disks. Each datafile is backed up to tape in its own BFS, and the BFSs are written to tape sequentially.
Figure 6-2, "Backing Up a Tablespace with Multiplexing" depicts backing up the same tablespace backup as in Figure 6-1, except with multiplexing. The datafiles are multiplexed in pairs, and each pair is written to a single BFS. Disk access speed is effectively doubled, allowing it to more closely match the speed of the backup media device.
# tablespace A contains "?/dbs/a[1-4].dbf"
backup online
  oracle_sid = "PROD"
  control_file
  tablespace = "A"
  mux = ("/home/oracle/dbs/a1.dbf", "/home/oracle/dbs/a2.dbf"),
        ("/home/oracle/dbs/a3.dbf", "/home/oracle/dbs/a4.dbf")
The mux specifier only specifies how files identified by the database, tablespace, or dbfile specifiers are multiplexed; it is not a substitute for those specifiers. For example, if you do not specify tablespace = "A" in the preceding example, an error results because the files ?/dbs/a[1-4].dbf are not in the backup set.
It is not possible to multiplex control files or parameter files; only datafiles and archivelogs can be multiplexed. Archivelog multiplexing is described later in this chapter.
Multiplexing files that reside on the same physical disk normally yields no advantage, because disk throughput is diminished by the head movement required to read several files on the same disk at the same time.
Choosing datafiles that reside on different disks is the most important criterion for multiplexing. The goal is to spin as many disks as possible, so that the overall disk transfer rate keeps the tape streaming. This depends on several factors:
In this case, multiplexing 4 or 5 disks per controller maximizes throughput. With no multiplexing, only 4 MB/s is obtained versus the 10 MB/s tape rate. With 2 disks the rate is 7 MB/s versus 10 MB/s; with 3 disks, 9 MB/s versus 10 MB/s. With 4 or 5 disks, 11.2 and 10.5 MB/s are obtained. With all 6 disks, throughput drops to 9 MB/s due to controller saturation. In the ideal case where no datafiles have empty blocks, 5 disks per multiplex stream yields the best result.
Initially, it would seem that this is the same problem as Scenario 1, but it is not. EBU reads the blocks in the datafiles and, if blocks are empty, discards them, because they do not need to be backed up.
Due to this "block discarding", the best would be to use 4 disks per multiplex stream. In this case, we want to have a margin so that, when chunks of empty blocks are discarded by the disk readers, other files can keep the tape streaming.
When dealing with sparse files, EBU still needs to read and examine all the empty blocks from disk before discarding them. If the number of empty chunks is large, the backup will show a great deal of disk activity and very little tape activity.
When there is only one file and no multiplexing, the tape waits for data while the disk skips the empty blocks. When multiple files are multiplexed, the tape may or may not be idle, depending on the location of the empty blocks. If the empty blocks of all datafiles are at the end, and their sizes differ, the overall perception is that tape throughput decays as the backup progresses: an initial rate of 10 MB/s starts to decrease as the disk reads find less data in the files, until there is no tape activity at all while the empty blocks of all the files are read. The length of that idle phase depends on the percentage of empty blocks.
When compression is added to the mix, it is hard to predict whether the tapes will be kept streaming, as some data is more compressible than other data. Typically, a tape drive quotes a rate for compressed data based on a 2:1 compression ratio. Some data may compress at 10:1 or more, while other data may not reach even 2:1.
Once compression is taken into account, the actual disk transfer rates might not be enough to keep the tapes busy. In general, if the data is expected to compress N times, the disk transfer rate should be N times the tape transfer rate.
From the compression point of view, the ideal mix combines highly compressible files with files of a lower compression ratio, so that the overall mix has a medium compression ratio. Otherwise, some tape streams are starved of data because their data is highly compressible, while others have more disk throughput assigned than the tape can absorb.
When using disk striping, EBU multiplexing is not necessarily needed to utilize all the underlying physical disks, as long as tape_io_size is set so that all disks in the stripe are accessed during one disk read. For example, for a four-disk stripe with a 32K stripe size, tape_io_size should be set to 128K. Using such a tape_io_size on a striped disk volume is similar to specifying four disks in a mux clause: with the appropriate tape_io_size, it is the disk striping mechanism, rather than EBU's asynchronous I/O module, that provides full physical disk utilization.
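The striping arithmetic above reduces to a single multiplication: one read of tape_io_size bytes should span every disk in the stripe set. A small sketch (the function name is illustrative; units are bytes):

```python
def tape_io_size_for_stripe(stripe_unit_bytes, ndisks):
    # One read of this size touches every disk in the stripe set,
    # so the striping layer keeps all spindles busy per disk read.
    return stripe_unit_bytes * ndisks

# Four-disk stripe with a 32K stripe unit, as in the example above:
print(tape_io_size_for_stripe(32 * 1024, 4) // 1024)  # 128 (KB)
```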
For striping to be successful from EBU's point of view, the stripes should encompass the whole physical device. Otherwise, when multiple tape streams read or write two or more stripe sets that share a physical device, the effect is the same as multiplexing files on the same physical disk: the overhead of head movement may negate any advantage of spinning several disks in a single read/write operation.
EBU's multiplexing may or may not increase restore time; this depends on what is being restored. The worst case is restoring only one file out of the N multiplexed files: EBU must still read the entire BFS, which is N times larger (assuming all files are the same length), to retrieve that one file. For example, if 4 files of the same size were multiplexed during backup and only 1 is restored, the restore takes 4 times as long as it would have if the restored file had been backed up without multiplexing. When all the files that were multiplexed together are restored together, the restore speed should match the backup speed.
Furthermore, if multiplexing is used, the restore time may increase when the restore involves files that were backed up to the same tape with different multiplexing combinations, such that the restore requires two or more BFSs from the same tape at the same time. In that case, the other logical tape streams must wait until the first stream finishes before continuing, thus serializing the restore.
Archivelogs are multiplexed serially, not in parallel. As discussed above, parallel multiplexing of files on the same disk yields no advantage. On the other hand, the number of archivelogs is in general large compared to the number of datafiles and control files, and archivelogs tend to be smaller than datafiles. The overhead of creating a BFS for each archivelog is too high; thus, by default, archivelogs are serially multiplexed into a BFS, 32 archivelogs per BFS. The number of archivelogs per BFS is controlled by the arch_per_bfs specifier of the backup command; the maximum number that can be serially multiplexed into a single BFS is 128.
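Because archivelogs are packed arch_per_bfs at a time, the number of BFSs created for a set of archivelogs follows directly. A sketch of that arithmetic (the function name is illustrative, not part of EBU):

```python
import math

def archivelog_bfs_count(n_archivelogs, arch_per_bfs=32):
    # Each BFS holds up to arch_per_bfs serially multiplexed archivelogs;
    # the arch_per_bfs specifier accepts at most 128.
    return math.ceil(n_archivelogs / arch_per_bfs)

# 100 archivelogs with the default of 32 per BFS:
print(archivelog_bfs_count(100))       # 4
# The same 100 archivelogs at the 128-file maximum fit in one BFS:
print(archivelog_bfs_count(100, 128))  # 1
```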
To obtain the sustained disk transfer rates assumed in the previous sections, the disks must be kept at maximum throughput. For that purpose, EBU performs asynchronous I/O to disk, implemented through one of two basic I/O models: Asynchronous I/O and Shared Memory. Depending on the operating system, both or only one of these I/O models will be available on a platform.
Under the Asynchronous I/O (AIO) model, EBU uses the operating system's asynchronous I/O capabilities to issue asynchronous operations to disk. Some operating systems provide AIO operations but do not support them for files residing in a file system; thus, depending on the database configuration and the operations being performed, the AIO model may or may not be available. The AIO model starts only as many brio processes as the parallel setting requires and does not require IPC resources.
On platforms that support both models where AIO is not the default, the use_io_model = AIO specifier can be used to direct EBU to use the AIO model.
The Shared Memory model requires multiple coordinating brio processes. Each brio needs to allocate a shared memory segment (of buffer_size bytes), a semaphore set (with N semaphores, where N is the number of files in the BFS) and M slave processes, where M depends on the calculation detailed below.
On platforms that support both models where the Shared Memory model is not the default, the use_io_model = SHM specifier can be used to direct EBU to use the Shared Memory model.
If the platform supports only one model, using the use_io_model specifier to select a model causes an error.
Both of these models are tuned by controlling the number of outstanding requests to each data file. A single I/O from a datafile might not be enough to keep the tape streaming.
The number of outstanding requests is derived from the EBU IO parameters.
In order to obtain the maximum disk performance, the number of outstanding requests should be such that the disk transfer rate is maximized. All disk I/O is done in tape_io_size Oracle blocks. The number of outstanding requests is derived from the following formulae:
MAX_IO = MAX(disk_io_size, tape_io_size)
NTHREADS = MAX((buffer_size / MAX_IO) / NFILES, 1)
#requests = MAX(1, integer(disk_io_size / tape_io_size)) * NTHREADS
NFILES is the number of files included in each Backup File Set (BFS). By default, each target database datafile is backed up in its own BFS (except archivelogs), so the default value of NFILES is 1.
For example, with the default values:
disk_io_size = 16
tape_io_size = 32
buffer_size = 128
NFILES = 1
The number of disk requests would be:
MAX_IO = MAX(16, 32) = 32
NTHREADS = MAX((128/32)/1, 1) = MAX(4, 1) = 4
#requests = MAX(1, integer(16/32)) * 4 = MAX(1, 0) * 4 = 1 * 4 = 4
For the following values:
disk_io_size = 128
tape_io_size = 32
buffer_size = 1024
NFILES = 4
The number of requests would be:
MAX_IO = MAX(128, 32) = 128
NTHREADS = MAX((1024/128)/4, 1) = MAX(8/4, 1) = MAX(2, 1) = 2
#requests = MAX(1, integer(128/32)) * 2 = MAX(1, 4) * 2 = 4 * 2 = 8
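The two worked examples above can be checked with a short script. This is an illustrative sketch of the formulae only, not EBU code: the function names are hypothetical, and integer division mirrors the integer() truncation used in the examples.

```python
def nthreads(buffer_size, disk_io_size, tape_io_size, nfiles):
    # MAX_IO is the larger of the disk and tape I/O sizes.
    max_io = max(disk_io_size, tape_io_size)
    # NTHREADS = MAX((buffer_size / MAX_IO) / NFILES, 1)
    return max((buffer_size // max_io) // nfiles, 1)

def requests(buffer_size, disk_io_size, tape_io_size, nfiles):
    # #requests = MAX(1, integer(disk_io_size / tape_io_size)) * NTHREADS
    n = nthreads(buffer_size, disk_io_size, tape_io_size, nfiles)
    return max(1, disk_io_size // tape_io_size) * n

# First example: disk_io_size=16, tape_io_size=32, buffer_size=128, NFILES=1
print(requests(128, 16, 32, 1))    # 4
# Second example: disk_io_size=128, tape_io_size=32, buffer_size=1024, NFILES=4
print(requests(1024, 128, 32, 4))  # 8
```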
In case of platforms using the shared memory model, the number of slave processes spawned by each logical tape stream is determined by:
#ofSlaves = NTHREADS*NFILES
In the last example, if using the shared memory model, each parallel stream would have 2*4 = 8 slave disk processes, plus a coordinator, for a total of 9 processes per parallel stream.
Each one will require a shared memory buffer of size:
MAX(1, DISK_IO_SIZE/TAPE_IO_SIZE) * NTHREADS * NFILES
Again, using the above example:
MAX(1, 128/32) * 2 * 4 = MAX(1,4) * 2 * 4 = 4 * 2 * 4 = 32 Oracle blocks
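Continuing the same example, the slave-process count and shared memory buffer size can be sketched in the same way (the helper names are hypothetical; the buffer size is in Oracle blocks, as above):

```python
def shm_slaves(nthreads, nfiles):
    # #ofSlaves = NTHREADS * NFILES (per logical tape stream)
    return nthreads * nfiles

def shm_buffer_blocks(disk_io_size, tape_io_size, nthreads, nfiles):
    # MAX(1, DISK_IO_SIZE/TAPE_IO_SIZE) * NTHREADS * NFILES, in Oracle blocks
    return max(1, disk_io_size // tape_io_size) * nthreads * nfiles

# Last example: NTHREADS = 2, NFILES = 4
print(shm_slaves(2, 4))                  # 8 slaves (plus a coordinator: 9 processes)
print(shm_buffer_blocks(128, 32, 2, 4))  # 32 Oracle blocks
```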
By using the parallel and mux specifiers and the I/O parameters described above, there is normally no need to start multiple jobs for the same database in parallel. The granularity at the database level is the tablespace: no two jobs can operate on the same tablespace of a given database.
Multiple databases can be backed up or restored simultaneously by invoking multiple EBU processes.
Each backup or restore operation has its own EBU control process and brio processes, but all EBU instances should access the same Backup Catalog and media management software. On a given host, concurrent instances of EBU (on Unix, those running under the same userid) share an Instance Manager (the brd process). Instances running under different userids (on Unix) or on separate hosts have their own Instance Managers.
Copyright © 1997 Oracle Corporation.
All Rights Reserved.