The cloud-scale Zero Data Loss Recovery Appliance, commonly known as Recovery Appliance, is an Engineered System designed to dramatically reduce data loss and backup overhead for all Oracle databases in the enterprise. Integrated with Recovery Manager (RMAN), the Recovery Appliance enables a centralized, incremental-forever backup strategy for large numbers of databases, using cloud-scale, fault-tolerant hardware and storage. The Recovery Appliance continuously validates backups for recoverability.
All production Oracle databases require data protection. Oracle provides RMAN as its preferred backup solution. Most enterprises have adopted one or more of the database backup strategies described in this section:
One popular approach, shown in Figure 1-1, is to use RMAN to take a weekly full backup, and then daily incremental backups. To improve incremental backup performance, Oracle recommends enabling block change tracking. These backups occur when activity on the database is lowest.
Figure 1-1 Full and Incremental Backups to Tape
An advantage of this technique is that backup windows, which affect the production server, are relatively brief on the days when incremental backups occur. A disadvantage is that when the database is continuously active, as when serving multiple global time zones, no convenient backup window is available.
One solution is to set up Oracle Data Guard, and then back up the standby database, thereby removing the backup load from the production server. However, protecting all databases with Oracle Data Guard is often impractical.
The RMAN technique shown in Figure 1-2 makes daily incremental backups, and then uses the RECOVER COPY command to merge the incremental changes into the full database copy. In this way, the database copy on disk is "rolled forward" every day.
Figure 1-2 RECOVER COPY on Disk, and Backup to Tape
This technique has the following advantages:
Only one initial full backup is required, which reduces the total weekly backup window time.
An RMAN SWITCH command can point the control file to the database copy, which turns the copy into an actual database file, and thus eliminates the RESTORE step.
Some disadvantages are as follows:
You must have sufficient disk space to keep a copy of the whole database on disk, and the archived redo log files required to recover it.
Only one physical copy of the database exists. You select the point in time at which to keep the copy, so you can recover to subsequent points in time. For example, to restore to any point in time within the past week, your physical copy must be older than SYSDATE-7. Keeping a single lagging copy has the following disadvantages:
You cannot recover to a time earlier than the time at which you maintain the database copy.
The closer your recovery point in time is to the current time, the more incremental backups you must restore and apply to the copy, which lengthens the overall recovery time.
The database copy cannot be compressed or encrypted.
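The trade-off of a single rolled-forward copy can be illustrated with a small model. The following Python sketch is not RMAN; the `ImageCopy` class, its `lag_days` setting, and the day-numbered incrementals are hypothetical names chosen for illustration. It assumes one incremental per day and merges an incremental into the copy only once it is older than the lag.

```python
from dataclasses import dataclass, field


@dataclass
class ImageCopy:
    """Toy model of an image copy rolled forward by daily incrementals."""
    lag_days: int                                  # how far the copy trails behind the newest incremental
    blocks: dict = field(default_factory=dict)     # block number -> contents
    as_of_day: int = 0                             # day the copy is currently rolled forward to
    pending: list = field(default_factory=list)    # (day, changed blocks) not yet merged

    def add_incremental(self, day, changed_blocks):
        """Keep today's incremental, then merge any incremental older than the lag."""
        self.pending.append((day, changed_blocks))
        while self.pending and self.pending[0][0] <= day - self.lag_days:
            merged_day, changes = self.pending.pop(0)
            self.blocks.update(changes)
            self.as_of_day = merged_day

    def restore_as_of(self, target_day):
        """Return (image, incrementals applied), or raise if the target predates the copy."""
        if target_day < self.as_of_day:
            raise ValueError("target is earlier than the image copy")
        image = dict(self.blocks)
        applied = 0
        for day, changes in self.pending:
            if day <= target_day:
                image.update(changes)
                applied += 1
        return image, applied


copy = ImageCopy(lag_days=7, blocks={1: "a0", 2: "b0"})
for day in range(1, 11):
    copy.add_incremental(day, {1: f"a{day}"})
```

After ten days the copy sits at day 3. Restoring to day 10 applies seven incrementals, restoring to day 4 applies one, and a request for day 2 fails because that point is earlier than the copy.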
As an alternative to RMAN incremental backups and tape drives, some customers use third-party deduplicating appliances to process backup streams. Figure 1-3 depicts three databases writing to a centralized third-party appliance.
Figure 1-3 Third-Party Deduplicating Appliance
This technique has the following advantages:
A central backup location serves all databases in the environment.
The third-party software searches for patterns at the byte and sub-byte level to eliminate redundant data from backup to backup. For example, if a full database backup is almost identical to the backup taken a week before, then the software can attempt to prune the redundant bits from the incoming backup stream.
To reduce network load, one optional technique utilizes source-side deduplication so that backup streams are deduplicated on the database host instead of the third-party appliance. Typically, this technique relies on an RMAN SBT plug-in.
Some disadvantages are as follows:
These third-party appliances do not recognize or validate Oracle Database blocks. From the perspective of the appliance, a database backup is the same as a file system backup: a stream of bytes.
Deduplication is only effective for full database backups that have a high degree of redundancy. Strategies that use incremental backups often do not achieve good deduplication ratios.
The third-party appliance dictates which Oracle Database features to use rather than the other way around. Often, adapting to the requirements of the appliance means rewriting existing backup scripts.
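The difference in deduplication ratios between full and incremental streams can be sketched in a few lines. The Python example below is a simplification: it assumes fixed-size 8-byte chunks and SHA-256 fingerprints, whereas real appliances use their own, typically more sophisticated, pattern matching. `DedupStore`, `make_full`, and the fake backup layout are hypothetical.

```python
import hashlib

BLOCK_SIZE = 8  # bytes per chunk in this toy example


def chunks(stream, size=BLOCK_SIZE):
    """Split a byte stream into fixed-size chunks."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


class DedupStore:
    """Stores only chunks whose fingerprint has not been seen before."""

    def __init__(self):
        self.seen = set()

    def ingest(self, stream):
        """Return (chunks received, chunks newly stored)."""
        received = stored = 0
        for chunk in chunks(stream):
            received += 1
            digest = hashlib.sha256(chunk).digest()
            if digest not in self.seen:
                self.seen.add(digest)
                stored += 1
        return received, stored


def make_full(version_of_block):
    """Build a fake full backup: 100 chunks tagged with block number and version."""
    return b"".join(
        f"{n:04d}v{version_of_block.get(n, 0):03d}".encode() for n in range(100)
    )


changed = {3: 1, 40: 1, 41: 1, 77: 1, 99: 1}
week1, week2 = make_full({}), make_full(changed)

store = DedupStore()
first_full = store.ingest(week1)     # every chunk is new
second_full = store.ingest(week2)    # only the changed chunks are new

# An incremental stream already contains only the changed chunks.
incremental = b"".join(chunks(week2)[n] for n in sorted(changed))
other_store = DedupStore()
other_store.ingest(week1)
incremental_result = other_store.ingest(incremental)
```

In this model the second full backup sends 100 chunks and stores 5, while the incremental sends 5 chunks and stores all 5, so there is nothing left for the appliance to prune.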
A third-party storage snapshot is a set of pointers to storage blocks (not Oracle blocks) that existed when the snapshot was created. The virtual copies reside on the same storage array as the original data. Figure 1-4 depicts a copy-on-write snapshot, which is a type of third-party snapshot. After a snapshot is taken, when the first change to a storage block occurs, the array copies the before-image block to a new location on disk (C) and writes the new block (C') to the original location.
Figure 1-4 Third-Party Copy-on-Write Snapshot
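The copy-on-write behavior can be modeled as follows. This Python sketch is a generic illustration, not the implementation of any particular storage array; `CopyOnWriteVolume` and its methods are hypothetical, and each snapshot keeps its own before-images for simplicity.

```python
class CopyOnWriteVolume:
    """Toy storage volume with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)   # live data: block id -> contents
        self.snapshots = []          # one dict of preserved before-images per snapshot

    def take_snapshot(self):
        """Create a snapshot. This is metadata-only: no blocks are copied yet."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block_id, data):
        """Write a block, preserving the before-image on the first change after each snapshot."""
        for preserved in self.snapshots:
            if block_id not in preserved:
                preserved[block_id] = self.blocks.get(block_id)
        self.blocks[block_id] = data

    def read_snapshot(self, snap_id, block_id):
        """Read a block as it was when the snapshot was taken."""
        preserved = self.snapshots[snap_id]
        if block_id in preserved:
            return preserved[block_id]
        return self.blocks.get(block_id)   # unchanged since the snapshot


volume = CopyOnWriteVolume({"A": "a", "B": "b", "C": "c"})
snap = volume.take_snapshot()
volume.write("C", "c'")    # before-image "c" is preserved, "c'" goes to the original location
volume.write("C", "c''")   # later writes to the same block preserve nothing further
```

The snapshot holds exactly one preserved block after both writes, which reflects why an unchanged file costs only metadata. The model also shows the weakness noted in the text: the snapshot still depends on the live volume for every unchanged block.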
This technique has the following advantages:
An initial copy of the database is not necessary because snapshots are not stored as physical copies of blocks. Thus, less storage is consumed than in RMAN strategies.
Snapshots can be extremely fast. You put the database in backup mode (unless the storage meets the requirements for Storage Snapshot Optimization), and then take the snapshot. The snapshot needs to store physical blocks only when the blocks change, so a backup of an unchanged file is a metadata-only operation.
Snapshots use storage efficiently. A backup of a file with a single changed block requires only one additional version of the block to be stored—either the old version or new version of the block, depending on the snapshot technique.
Some disadvantages are as follows:
Snapshots have no knowledge of an Oracle Database block structure, and thus cannot validate Oracle blocks.
Because snapshots reside on the same storage array as the source database, they are vulnerable to storage failures and data corruptions. If the array is inaccessible, or if the storage contains data block corruptions, then the snapshots cannot be used for recovery.
Restoring a snapshot in place voids all snapshots that were taken after it unless the snapshot is fully restored to an alternate location.
See Also:
Oracle Database Backup and Recovery User's Guide to learn more about using Storage Snapshot Optimization to take third-party snapshots of the database
The role of information technology in the modern business is going through a tremendous transformation. The key drivers for this transformation are:
Data growth
Many organizations continue to experience exponential growth, which creates a greater challenge for efficient data management and protection. What works well for dozens of databases may not work well for hundreds or thousands of databases, often running on different platforms and on multiple physical servers.
Real-time analytics
Organizations are increasingly dependent on data analysis for critical real-time decisions. This dependency increases the pressure to maintain data integrity and prevent data loss.
Continuous global availability
Many databases provide 24/7 access across multiple time zones, which means that databases are continuously active.
The protection strategies described in "Traditional Database Backup Techniques" are not designed to solve the challenges created by this transformation. Enterprises find themselves without a consistent backup and recovery strategy. The following shortcomings are common to most or all of the traditional backup techniques:
Data loss exposure
A database is only recoverable to its last valid backup, which may have occurred hours or days ago. In addition, storage snapshots and third-party appliances cannot validate Oracle data blocks, and so cannot detect Oracle block-level corruptions.
Long backup windows
As database sizes increase, the lengths of the backup windows also increase, creating additional load on production systems. Critical databases cannot afford to be deprived of resources used for daily backups and related maintenance activities.
Lack of backup validation
Because most third-party snapshot and backup appliances lack Oracle-integrated validation of data blocks and database backups, restore and recovery operations are more likely to fail. Such failures result in extended downtime and potentially larger data loss.
Lack of end-to-end visibility
As the number of databases grows, manageability declines. Backup scripts proliferate and change. New DBAs may struggle to understand what the legacy scripts do. Questions about the status, backup location, and recovery point objective (RPO) of a particular database become harder to answer.
The traditional techniques fail to provide a comprehensive and efficient Oracle-integrated data protection solution that meets the demands of a large-scale, enterprise Oracle environment. A new approach is required.
Recovery Appliance is a cloud-scale Engineered System designed to protect all Oracle databases across the enterprise. Most database backup and restore processing is performed by the centralized Recovery Appliance, making storage utilization, performance, and manageability of backups more efficient.
The Recovery Appliance stores and manages backups of multiple Oracle databases in a unified disk pool, using an RMAN incremental-forever strategy. The Recovery Appliance continually compresses, deduplicates, and validates backups at the database block level, while creating virtual full backups on demand.
A virtual full backup is a complete database image as of one distinct point in time, maintained efficiently through Recovery Appliance indexing of incremental backups from protected databases. A virtual full backup can correspond to any incremental backup that was received.
Figure 1-5 shows an overview of a sample Recovery Appliance environment.
Figure 1-5 Recovery Appliance Environment
As shown in Figure 1-5, a protected database is a client database that backs up data to a Recovery Appliance. Each protected database uses the Zero Data Loss Recovery Appliance Backup Module (Recovery Appliance Backup Module) for its backups. This module is an Oracle-supplied SBT library that RMAN uses to transfer backup data over the network to the Recovery Appliance.
The Recovery Appliance metadata database, which resides on each Recovery Appliance, manages metadata stored in the RMAN recovery catalog, and backups located in the Recovery Appliance storage location. The catalog is required to be used by all protected databases that send backups to Recovery Appliance.
Note:
Databases may use Recovery Appliance as their recovery catalog without also using it as a backup repository.
Administrators use Oracle Enterprise Manager Cloud Control (Cloud Control) to manage and monitor the environment. Cloud Control provides a "single pane of glass" view of the entire backup lifecycle for each database, whether backups reside on disk, tape, or another Recovery Appliance.
Recovery Appliance provides the following benefits:
The Recovery Appliance uses various mechanisms to protect against different types of data loss, including physical block corruption. This section contains the following topics:
In traditional backup approaches, if the online redo log is lost, then media recovery loses all changes after the most recent available archived redo log file or incremental backup. A recovery point objective (RPO) of a day or more that might result from a traditional approach may be unacceptable.
Recovery Appliance solves the RPO problem through a continuous transfer of redo changes to the appliance from a protected database. This operation is known as real-time redo transport. Using delta push, the Recovery Appliance is a remote destination for asynchronous redo transport services from Oracle Database 11g and Oracle Database 12c databases. See "Delta Push" for more information.
Note:
This technology is based on the real-time redo transport algorithms of Oracle Data Guard. To avoid degrading the performance of the protected database, protected databases transfer redo asynchronously to the Recovery Appliance. If a protected database is lost, zero to subsecond data loss is expected in most cases.
See Also:
"Real-Time Redo Transport" to learn more about real-time redo transport
Oracle Data Guard Concepts and Administration for information about Oracle Data Guard redo transport
To protect against server or site outage, one Recovery Appliance can replicate backups to a different Recovery Appliance. Figure 1-6 shows the simplest form of replication, called one-way Recovery Appliance replication, in which an upstream Recovery Appliance (backup sender) transfers backups to a downstream Recovery Appliance (backup receiver).
In Figure 1-6, a protected database sends an incremental backup to the upstream Recovery Appliance, which then queues it for replication to the downstream Recovery Appliance. When the downstream Recovery Appliance receives the incremental backup, it creates a virtual full backup as normal and creates backup records in its recovery catalog. When the upstream Recovery Appliance requests the records, the downstream Recovery Appliance propagates the records back.
If the local Recovery Appliance cannot satisfy virtual full backup requests, then it automatically forwards them to the downstream Recovery Appliance, which sends virtual full backups to the protected database. DBAs use RMAN as normal, without needing to understand where or how the backup sets are stored.
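The one-way replication and restore-forwarding flow described above can be sketched as a toy model. The Python code below is an assumption-laden illustration: the `Appliance` class is hypothetical, replication happens synchronously instead of through a queue, and the propagation of catalog records back upstream is left out.

```python
class Appliance:
    """Toy Recovery Appliance that can replicate to one downstream appliance."""

    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream
        self.backups = {}        # (database, backup id) -> backup contents
        self.catalog = set()     # backup records known to this appliance

    def receive(self, database, backup_id, data):
        """Ingest a backup, record it, and replicate it downstream if configured."""
        self.backups[(database, backup_id)] = data
        self.catalog.add((database, backup_id))
        if self.downstream is not None:
            self.downstream.receive(database, backup_id, data)

    def restore(self, database, backup_id):
        """Serve a restore locally if possible, otherwise forward the request downstream."""
        key = (database, backup_id)
        if key in self.backups:
            return self.name, self.backups[key]
        if self.downstream is not None:
            return self.downstream.restore(database, backup_id)
        raise LookupError(f"no appliance holds {key}")


downstream = Appliance("downstream")
upstream = Appliance("upstream", downstream=downstream)
upstream.receive("orcl", 1, b"incremental-1")

# Simulate the local appliance no longer holding the backup.
del upstream.backups[("orcl", 1)]
served_by, data = upstream.restore("orcl", 1)
```

The caller always asks the upstream appliance, mirroring how DBAs use RMAN as normal without knowing which appliance holds the backup sets.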
A robust backup strategy protects data against intentional attacks, unintentional user errors (such as file deletions), and software or hardware malfunctions. Tape libraries provide effective protection against these possibilities.
Figure 1-7 shows the traditional technique for tape backups, with a media manager installed on each host.
Figure 1-7 Backups to Tape Without Using Recovery Appliance
Figure 1-8 shows the Recovery Appliance technique for tape backups. The fundamental difference in the two approaches is that the Recovery Appliance backs up to tape, not the protected databases. The Recovery Appliance comes with preinstalled Oracle Secure Backup software, and supports optional Fibre Channel cards. Thus, installation of a media manager is not necessary on the protected database hosts.
Figure 1-8 Backups to Tape Using Recovery Appliance
When Recovery Appliance executes a copy-to-tape job for a virtual full backup, it constructs the physical backup sets, copies them to tape, and then writes the metadata to the recovery catalog. If desired, the Recovery Appliance can also copy successive incremental backups and archived redo log file backups to tape. Whereas the backup on the Recovery Appliance is virtual, the backup on tape is a non-virtual, full physical backup. The Recovery Appliance automatically handles requests to restore backups from tape, with no need for administrator intervention.
The advantages of the Recovery Appliance tape solution are as follows:
The Recovery Appliance performs all tape copy operations automatically, with no performance load on the protected database host.
Tape backups are optimized. Recovery Appliance intelligently gathers the necessary blocks to create a non-virtual, full backup for tape.
Oracle Secure Backup is preinstalled, eliminating the need for costly third-party media managers.
Note:
You may deploy tape backup agents from third-party vendors on the Recovery Appliance for integration with existing tape backup software and processes. In this configuration, the agents must connect to their specialized media servers, which must be deployed externally to the Recovery Appliance.
Tape drives and tape libraries function more efficiently because Recovery Appliance is a single large centralized system with complete control over them. In other tape solutions, hundreds or thousands of databases can contend for tape resources in an uncoordinated manner.
A basic principle of backup and recovery is to ensure that backups can be restored successfully. To ensure that there are no physical corruptions within the backed-up data blocks, backups require regular validation. Validation typically involves running an RMAN RESTORE VALIDATE job regularly, along with running periodic full restore and recovery operations to a separate machine.
Recovery Appliance provides end-to-end block validation, which occurs in the following stages of the workflow:
The Recovery Appliance automatically validates the backup stream during the backup ingest phase, before writing the backups to disk. The Recovery Appliance also validates the backup before sending it back to the original or alternate database server during the restore phase. Therefore, no manual RESTORE VALIDATE step is required.
In addition, a background task running on the Recovery Appliance periodically validates the integrity of the virtual full backups in the delta pools (see "Delta Pools"). The goal of this task is to check each block of each virtual full backup of each protected database and to work behind the scenes when minimal activity is occurring. By default, the validation task runs every 14 days following the last completed validation of a database’s current set of backups on disk.
Just as with data file backups, the Recovery Appliance validates the integrity of redo log blocks during every operation, including receiving redo from the protected database, and storing it in compressed archived log backup sets.
Oracle Automatic Storage Management (ASM)
Oracle ASM stores the backup and redo data for the Recovery Appliance. Oracle ASM mirrored copies provide redundancy (see "Recovery Appliance Storage Locations").
If a corrupted block is read on the primary mirror, the Recovery Appliance automatically repairs the block from the mirrored copy. This mechanism resolves most isolated block corruption cases.
Tape library
Recovery Appliance validates blocks when it copies them to tape, and also when it restores them from tape (see "Tape Archival").
Downstream Recovery Appliance in a replication configuration
If you configure replication, then the downstream Recovery Appliance validates data during the backup ingest and restore phases (see "How a Downstream Recovery Appliance Processes Backups").
None of the preceding backup validation processes occur on the production database hosts, thus freeing production resources for more critical operational workloads.
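The mirror-based repair described for Oracle ASM can be illustrated with a generic checksum-and-mirror model. The Python sketch below is not how Oracle ASM is implemented: `MirroredStore` is hypothetical, and CRC32 from the standard `zlib` module stands in for the real block checksum.

```python
import zlib


def checksum(data):
    """CRC32 stands in for a block-level checksum in this sketch."""
    return zlib.crc32(data)


class MirroredStore:
    """Toy two-way mirrored block store that repairs the primary from the mirror."""

    def __init__(self):
        self.primary = {}    # block id -> (data, checksum)
        self.mirror = {}

    def write(self, block_id, data):
        entry = (data, checksum(data))
        self.primary[block_id] = entry
        self.mirror[block_id] = entry

    def read(self, block_id):
        """Return validated data, repairing the primary copy if it is corrupt."""
        data, expected = self.primary[block_id]
        if checksum(data) == expected:
            return data
        mirror_data, mirror_expected = self.mirror[block_id]
        if checksum(mirror_data) != mirror_expected:
            raise IOError(f"block {block_id} is corrupt on both copies")
        self.primary[block_id] = (mirror_data, mirror_expected)   # repair in place
        return mirror_data


store = MirroredStore()
store.write(7, b"payload")

# Simulate an isolated corruption on the primary copy only.
store.primary[7] = (b"pAyload", store.primary[7][1])
repaired = store.read(7)
```

A read of the damaged block returns the good data from the mirror and rewrites the primary, which covers the isolated-corruption case. Corruption of both copies is reported instead of being silently returned.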
Note:
Oracle Maximum Availability Architecture best practices recommend that you still perform periodic full database recovery tests to verify operational practices and to detect issues that might occur only during media recovery.
See Also:
"CONFIG" for information about the validate_db_days configuration parameter
"RA_DATABASE" for information about the RA_DATABASE.LAST_VALIDATE column
In "Traditional Database Backup Techniques", the Oracle database host performs the brunt of the processing. Agents for disk backup, tape backup, and deduplication may all be running on the host. Furthermore, all backup operations—compression, validation, deletion, merging, and so on—occur on the database host. This overhead can greatly degrade database performance.
Recovery Appliance removes almost the entire load from the protected databases. The only backup operations required on the hosts, which could be primary database or standby database hosts, are sending incremental backups to the Recovery Appliance. The incremental-forever strategy reduces the backup window on the database hosts significantly. Recovery Appliance handles backup processing, tape operations, data integrity checks, and routine maintenance.
Note:
Recovery Appliance only supports backups of Oracle databases, not file system data or non-Oracle databases.
Recovery Appliance optimizes management of database changes using delta push and delta store, shown in Figure 1-9. The net result of delta push and delta store is that the problem of lengthening backup windows is eliminated. The DBA performs only fast incremental backups, and lets the Recovery Appliance manage the backup blocks.
This solution consists of two operations that run on each protected database: the incremental-forever backup strategy, and real-time redo transport (described in "Elimination of Data Loss"). Both operations involve protected databases pushing changes to the Recovery Appliance.
In an incremental-forever strategy, only one incremental level 0 backup to the Recovery Appliance is required in the lifetime of each protected file. The initial level 0 backup does not contain committed undo blocks or currently unused blocks.
Note:
The elimination of committed undo and currently unused blocks is only supported for SBT full backups to the Recovery Appliance or Oracle Secure Backup. It is not available for SBT backups to other backup products.
In normal operation, the Recovery Appliance automatically performs the following steps for each incremental level 1 backup:
Receives a scheduled incremental level 1 backup from each protected database
Validates the incoming backup to protect against physical block corruptions
Compresses the backup using specialized block-level algorithms
Writes the backup to a delta store in a Recovery Appliance storage location
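The four steps above can be sketched as a small ingest function. This Python example is only a conceptual model: the block layout (a CRC32 prefix followed by the payload), the use of `zlib` compression, and the dictionary-based delta store are all assumptions, not the appliance's actual formats or algorithms.

```python
import zlib


def make_block(payload):
    """Prefix a payload with its CRC32, standing in for a block-level checksum."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def validate(block):
    """Check that the stored checksum matches the payload."""
    stored, payload = int.from_bytes(block[:4], "big"), block[4:]
    return zlib.crc32(payload) == stored


def ingest(incremental, delta_store):
    """Validate, compress, and store each block of an incoming level 1 backup.

    incremental: dict of (data file, block number) -> block bytes
    delta_store: dict of (data file, block number) -> list of compressed versions
    """
    bad = [key for key, block in incremental.items() if not validate(block)]
    if bad:
        # Reject the whole backup before anything is written.
        raise ValueError(f"corrupt blocks rejected: {bad}")
    for key, block in incremental.items():
        delta_store.setdefault(key, []).append(zlib.compress(block))
    return len(incremental)


delta_store = {}
good_backup = {(1, 10): make_block(b"row data" * 50)}
stored_count = ingest(good_backup, delta_store)
```

Validation runs over the entire backup before the first write, so a corrupt stream never reaches the store in this model.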
The incremental-forever strategy greatly reduces the backup window and overhead because no full backups are ever required after the initial incremental level 0 backup. If the strategy includes real-time redo transport, then backup windows are further reduced because traditional archived log backups are not necessary. Also, Recovery Appliance takes on the burden of validation, deduplication, and compression.
Note:
Blocks compressed using table or Hybrid Columnar Compression remain compressed in the RMAN backup and during the Recovery Appliance ingest phase.
The delta store is the key processing engine for Recovery Appliance. A protected database sends only one incremental level 0 backup of each data file to the Recovery Appliance. Following the initial full backup, all backups are highly efficient cumulative incremental backups.
As Recovery Appliance receives incremental backups, it indexes them and stores them in delta pools. Each separate data file backed up to the Recovery Appliance has its own separate delta pool (set of backup blocks). Recovery Appliance automatically manages the delta pools so that it can provide many virtual full backups.
To create a virtual full backup, Recovery Appliance converts an incoming incremental level 1 backup into a virtual representation of an incremental level 0 backup. A virtual full backup appears as an incremental level 0 backup in the recovery catalog. From the user's perspective, a virtual full backup is indistinguishable from a non-virtual full backup. Using virtual backups, Recovery Appliance provides the protection of frequent level 0 backups with only the cost of frequent level 1 backups.
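One way to picture how an index over incremental backups yields virtual full backups is the following Python sketch. It is a conceptual model only: the `DeltaPool` class is hypothetical, and identifying each backup by an increasing SCN is an assumption made for the example.

```python
from collections import defaultdict


class DeltaPool:
    """Toy delta pool for one data file: block versions indexed by backup SCN."""

    def __init__(self):
        self.versions = defaultdict(list)   # block number -> [(scn, contents)] in arrival order
        self.backup_scns = []

    def add_backup(self, scn, blocks):
        """Index the blocks of a level 0 or level 1 backup taken at the given SCN."""
        if self.backup_scns and scn <= self.backup_scns[-1]:
            raise ValueError("backups must arrive in increasing SCN order")
        self.backup_scns.append(scn)
        for number, contents in blocks.items():
            self.versions[number].append((scn, contents))

    def virtual_full(self, scn):
        """Assemble the file image for a received backup: newest version of each block at or before scn."""
        if scn not in self.backup_scns:
            raise KeyError(f"no backup was received at SCN {scn}")
        image = {}
        for number, history in self.versions.items():
            candidates = [contents for version_scn, contents in history if version_scn <= scn]
            if candidates:
                image[number] = candidates[-1]
        return image


pool = DeltaPool()
pool.add_backup(100, {1: "a0", 2: "b0", 3: "c0"})   # initial level 0
pool.add_backup(200, {2: "b1"})                      # level 1: one changed block
pool.add_backup(300, {3: "c2"})                      # level 1: one changed block
stored_versions = sum(len(history) for history in pool.versions.values())
```

Three complete images are available, one per received backup, yet only five block versions are stored instead of the nine that three physical full backups would need.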
Note:
Recovery Appliance provides storage services, but not virtual full backups, for RMAN-encrypted backups (see "Archival and Encrypted Backups"). These backups are stored in their original encrypted format. Recovery Appliance can store, archive, and retrieve them just as it can for unencrypted RMAN backup sets.
Recovery Appliance uses virtual full backups to provide rapid recovery to any point in time, regardless of the amount of data being recovered. The on-disk recovery strategy of Recovery Appliance has the advantage that RMAN can recover virtual full backups to any point in time without applying incremental backups.
When a database is protected by the Recovery Appliance, RMAN needs to restore only a single level 0 backup for the day of the target recovery point, and then recover up to the last second using redo log files sent through the real-time redo transport feature. For example, if the recovery window is 7 days and the target recovery point is 5 days ago, then RMAN can restore a single virtual full (level 0) backup that is current to 5 days ago, and then recover it using redo rather than level 1 incremental backups.
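The recovery plan described above reduces to picking one virtual full backup and one span of redo. The Python sketch below illustrates that selection under simplifying assumptions: time is expressed as plain day numbers, one virtual full exists per day, and `plan_recovery` is a hypothetical helper, not an RMAN or Recovery Appliance interface.

```python
from bisect import bisect_right


def plan_recovery(virtual_full_days, target_day):
    """Pick the newest virtual full at or before the target, then the redo span to apply.

    virtual_full_days: sorted list of day numbers for which a virtual full exists
    target_day: desired recovery point, possibly fractional
    """
    index = bisect_right(virtual_full_days, target_day)
    if index == 0:
        raise ValueError("target predates the oldest virtual full backup")
    restore_from = virtual_full_days[index - 1]
    return {
        "restore_virtual_full": restore_from,
        "apply_redo": (restore_from, target_day),
        "incrementals_to_apply": 0,   # virtual fulls remove the incremental-apply step
    }


# Seven daily virtual fulls (days 1 through 7); recover to three quarters of the way through day 2.
plan = plan_recovery(list(range(1, 8)), 2.75)
```

Here the plan restores the day 2 virtual full and applies redo from day 2 up to the target, with no level 1 backups involved.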
See Also:
Zero Data Loss Recovery Appliance Protected Database Configuration Guide to learn more about the incremental-forever backup strategy
Zero Data Loss Recovery Appliance Protected Database Configuration Guide to learn more about recovery strategies
Oracle Database Backup and Recovery User's Guide to learn more about incremental backups
In "Traditional Database Backup Techniques", management of the database, media server, and tape drives is often separated. For example, a DBA group may manage the databases, while a separate backup administrator group manages the backups, and a storage group manages the disk and tape devices. The overall process lacks visibility, which makes it difficult to manage backups for thousands of databases, each with different recovery requirements.
Cloud Control provides a complete, end-to-end view into the backup lifecycle managed by the Recovery Appliance, from the time the RMAN backup is initiated on the database, to when it is stored on disk, tape, or replicated to a downstream Recovery Appliance. Recovery Appliance monitoring and administration are enabled through installation of the Enterprise Manager for Zero Data Loss Recovery Appliance plug-in (Recovery Appliance plug-in).
Using Cloud Control to manage a Recovery Appliance provides the following benefits:
Standard metrics such as overall backup performance, and aggregate or per-database space consumption
Immediate alerts about any backup or Recovery Appliance issues
For example, Cloud Control may alert the administrator if no backup is available to meet the defined RPO, or if corrupt backups are discovered.
Status reports, enabled by BI Publisher, are useful for capacity planning and to identify protected databases that are not meeting recovery window goals
For example, Recovery Appliance administrators can receive reports on historical space and network usage to identify backup volume and throughput trends. These trends may necessitate adding storage servers to an existing rack or connecting additional racks.
Although Cloud Control is the recommended user interface for Recovery Appliance administration, Oracle supplies the DBMS_RA PL/SQL package as a command-line alternative (see DBMS_RA Package Reference). Most tasks in this manual provide both Cloud Control and DBMS_RA techniques. For command-line monitoring and reporting, you can query the Recovery Appliance catalog views (see Recovery Appliance View Reference).
Recovery Appliance scales at a cloud level, supporting tens to hundreds to thousands of databases across a data center. Essentially, Recovery Appliance enables you to create a private data protection cloud within the enterprise. The following technology components within Recovery Appliance make this possible:
Recovery Appliance simplifies management through the protection policy. Benefits include the following:
A protection policy defines recovery window goals that are enforced for each database for backups to the Recovery Appliance or a tape device.
Using protection policies, you can group databases by recovery service tier. For example, databases protected by the Platinum policy require backups to be kept for 45 days on the Recovery Appliance and 90 days on tape, which means that backups aged 45 days or less exist on disk and tape, but backups older than 45 days are only on tape. Databases protected by the Gold policy require 35 days on the local Recovery Appliance and 90 days on tape. Optionally, you can define a maximum retention time within each policy to limit the space consumed, and to comply with service level agreements dictating that backups cannot be maintained for longer than a specified period.
Protection policies are a means of grouping databases, which improves manageability.
For example, you can configure Recovery Appliance replication or copy-to-tape for a specific protection policy, which means that the configuration applies to all databases associated with this policy. If you add a database to the policy, then the database automatically inherits the configurations and scheduling of the policy.
Using protection policies, the Recovery Appliance manages backup storage space according to the recovery window goal for each protected database. This granular, database-oriented space management approach eliminates the need to manage space at the storage-volume level, as third-party appliances do.
If space is available, then the Recovery Appliance may retain backups older than the recovery window goal, effectively extending the point-in-time recovery period. When space pressure exists, the Recovery Appliance uses predefined thresholds to purge backups. The Recovery Appliance automatically provisions space so that the recovery window goal for each database is met.
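The Platinum and Gold tiers from the example above can be expressed as data, which makes the disk and tape boundaries easy to check. The Python sketch below is illustrative only: `ProtectionPolicy` and `backup_locations` are hypothetical names, not the DBMS_RA interface, and the model ignores the fact that the appliance may keep older backups on disk when space allows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionPolicy:
    name: str
    disk_days: int   # recovery window goal on the Recovery Appliance
    tape_days: int   # retention on tape


def backup_locations(policy, age_days):
    """Return where a backup of the given age is expected to exist under the policy."""
    locations = []
    if age_days <= policy.disk_days:
        locations.append("disk")
    if age_days <= policy.tape_days:
        locations.append("tape")
    return locations


PLATINUM = ProtectionPolicy("Platinum", disk_days=45, tape_days=90)
GOLD = ProtectionPolicy("Gold", disk_days=35, tape_days=90)

platinum_40 = backup_locations(PLATINUM, 40)   # within both windows
gold_40 = backup_locations(GOLD, 40)           # past the Gold disk window
```

A 40-day-old backup is on both disk and tape under Platinum but only on tape under Gold. Because a database inherits these settings from its policy, moving it between tiers changes its retention without any per-database configuration.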
The approaches in "Traditional Database Backup Techniques" are prone to performance bottlenecks and multiplying points of failure. As the number of databases increases, so does the number of media servers, disk arrays, tape devices, and third-party appliances, and thus so does the overall complexity. The "add more devices" approach is not scalable. In contrast, Recovery Appliance can scale to accommodate increases in backup traffic, storage usage, and the number of databases by adding compute and storage resources in a simple, modular fashion.
See Also:
Zero Data Loss Recovery Appliance Owner's Guide for information about adding storage servers
Oracle Data Guard is a component of a high availability (HA) and disaster recovery solution that can be integrated with Recovery Appliance to provide maximum data protection. Oracle Data Guard minimizes service interruption and resulting data loss by maintaining a synchronized standby database for the protected database. When the primary system is unavailable, the standby immediately assumes the normal operations of the primary after a Data Guard failover operation, including backups to the local Recovery Appliance. Figure 1-10 shows an example of an environment with Recovery Appliance and Oracle Data Guard.
Figure 1-10 Recovery Appliance with Oracle Data Guard
In Figure 1-10, the primary and standby databases each send incremental backups to their local Recovery Appliance. The primary database sends real-time redo changes to both the local Recovery Appliance and the physical standby, and the standby cascades the redo changes to the remote Recovery Appliance. Each Recovery Appliance has backups and redo information for the same database, so either appliance can be used for RMAN restore and recovery operations.
See Also:
http://www.oracle.com/technetwork/database/availability/disaster-recovery-2526839.pdf to learn more about Recovery Appliance with Oracle Data Guard
Oracle Data Guard Concepts and Administration for information about Oracle Data Guard
To begin using Recovery Appliance, refer to the following topics:
Optionally, read Recovery Appliance Architecture to obtain a more in-depth understanding of the principal components of the Recovery Appliance environment.
Read Recovery Appliance Workflow to learn about basic tools and tasks. Before you can use Recovery Appliance for data protection, you must perform the tasks described in the following topics: