11 Backing up and Recovering a Data Warehouse

This chapter describes some considerations for data warehouse backup and recovery and contains the following topics:

How Should I Handle Backup and Recovery for a Data Warehouse?

Backup and recovery are among the most important tasks for an administrator, and data warehouses are no different. However, because of the sheer size of the database, data warehouses introduce new challenges for an administrator in the backup and recovery area.

Data warehouses are unique in that the data can come from myriad resources and it is transformed before being inserted into the database, but mainly because a data warehouse can be very large. Managing the recovery of a large data warehouse can be a daunting task, and traditional OLTP backup and recovery strategies may not meet the needs of a data warehouse.

Data warehouses differ from OLTP systems in the following ways:

  • Data warehouses are typically much larger.

  • A data warehouse may have different availability requirements than an operational system. Even though business decisions do rely on information from the data warehouse, a situation in which, for example, a service desk cannot operate is much worse. Also, due to the size of data warehouses, there is a much higher cost involved in guaranteeing the same level of availability for a data warehouse.

  • Data warehouses are typically populated through more controlled processes, usually referred to as Extraction, Transformation, and Loading (ETL). As a result, updates in a data warehouse are better known and may be reproducible from data sources.

  • A data warehouse typically stores a lot of historical data that is often not subject to change. Data that does not change must be backed up only once.

You must plan a backup strategy as part of your system design and consider what to back up and how frequently to back up. The most important variables in your backup design are the amount of available resources to perform a backup or recovery and the recovery time objective (the amount of time you can afford the system or part of the system to be unavailable).

NOLOGGING operations must be taken into account when planning a backup and recovery strategy. Traditional recovery, restoring a backup and applying the changes from the archive log, does not apply to NOLOGGING operations. Operations that rely on the data that was manipulated by the NOLOGGING operation fail. The NOLOGGING operations must be taken into account when designing a backup and recovery strategy.

Never make a backup when a NOLOGGING operation is taking place.

Plan for one of the following or a combination of the following strategies:

  • ETL strategy

    Recover a backup that does not contain non-recoverable transactions and replay the ETL that has taken place between the backup and the failure.

  • Incremental backup strategy

    Perform a backup immediately after an otherwise non-recoverable transaction has taken place. Oracle provides a tracking file feature that enables incremental backups based on changed data blocks. RMAN leverages the tracking file feature.

Strategies and Best Practices for Backup and Recovery

Devising a backup and recovery strategy can be a daunting task. When you have hundreds of gigabytes of data that must be protected and recovered in the case of a failure, the strategy can be very complex.

The following best practices can help you implement your warehouse's backup and recovery strategy:

Best Practice A: Use ARCHIVELOG Mode

Archived redo logs are crucial for recovery when no data can be lost because the redo logs are a record of changes to the database. Oracle Database can be run in either of two modes:

  • ARCHIVELOG -- Oracle Database archives the filled online redo log files before reusing them in the cycle.

  • NOARCHIVELOG -- Oracle Database does not archive the filled online redo log files before reusing them in the cycle.

Running the database in ARCHIVELOG mode has the following benefits:

  • The database can be completely recovered from both instance and media failure.

  • The user can perform backups while the database is open and available for use.

  • Oracle Database supports multiplexed archive logs to avoid any possible single point of failure on the archive logs

  • The user has more recovery options, such as the ability to perform tablespace point-in-time recovery (TSPITR).

  • Archived redo logs can be transmitted and applied to the physical standby database, which is a replica of the primary database.

  • The database can be completely recovered from both instance and media failure.

Running the database in NOARCHIVELOG mode has the following consequences:

  • The user can only back up the database while it is completely closed after a clean shutdown.

  • Typically, the only media recovery option is to restore the whole database, which causes the loss of all transactions since the last backup.

Is Downtime Acceptable?

Oracle Database backups can be made while the database is open or closed. Planned downtime of the database can be disruptive to operations, especially in global enterprises that support users in multiple time zones, up to 24 hours per day. In these cases, it is important to design a backup plan to minimize database interruptions.

Depending on your business, some enterprises can afford downtime. If your overall business strategy requires little or no downtime, then your backup strategy must implement an online backup. The database does not need to be taken down for a backup. An online backup requires the database to be in ARCHIVELOG mode.

There is no reason not to use ARCHIVELOG mode. All data warehouses (and all mission-critical databases) should use ARCHIVELOG mode. Given the size of a data warehouse (and the amount of time it takes to back up a data warehouse), it is not viable to make an offline backup of a data warehouse, which would be necessary if NOARCHIVELOG mode was used.

Large scale data warehouses may have large amounts of data-modification that generate large volumes of log files. To accommodate the management of many archived log files, RMAN provides the option to compress log files as they are archived. Archiving enables you to keep more archive logs on disk for faster accessibility for recovery.

In summary, a best practice is to put the database in ARCHIVELOG mode to provide online backups and point-in-time recovery options.

Best Practice B: Use RMAN

There are many reasons to adopt Recovery Manager (RMAN). Some of the reasons to integrate RMAN into your backup and recovery strategy are that it offers:

  • Extensive reporting

  • Incremental backups

  • Downtime free backups

  • Backup and restore validation

  • Backup and restore optimization

  • Easily integrates with media managers

  • Block media recovery

  • Archive log validation and management

  • Corrupt block detection

Best Practice C: Use Read-Only Tablespaces

One of the biggest issues facing a data warehouse is the size of a typical data warehouse. Even with powerful backup hardware, backups may still take several hours. Thus, one important consideration in improving backup performance is minimizing the amount of data to be backed up. Read-only tablespaces are the simplest mechanism to reduce the amount of data to be backed up in a data warehouse.

The advantage of a read-only tablespace is that the data is backed up once. If a data warehouse contains 5 years of historical data, then the first 4 years of data can be made read-only. Theoretically, the regular backup of the database would only back up 20 percent of the data. This can dramatically reduce the amount of time required to back up the data warehouse.

Most data warehouses store their data in tables that have been range-partitioned by time. In a typical data warehouse, data is active for a period ranging anywhere from 30 days to 1 year. During this period, the historical data can still be updated and changed (for example, a retailer may accept returns up to 30 days beyond the date of purchase, so that sales data records could change during this period). However, when data has reached a certain date, it is considered to be static.

By taking advantage of partitioning, users can make the static portions of their data read-only. RMAN supports read-only tablespaces rather than read-only partitions or tables. To take advantage of the read-only tablespaces and reduce the backup window, a strategy of storing constant data partitions in a read-only tablespace should be devised. Two strategies for implementing a rolling window are as follows:

  • Implement a regularly scheduled process to move partitions from a read/write tablespace to a read-only tablespace when the data ages to the point where it is considered static.

    The best practice in this case is to put the database in ARCHIVELOG mode to provide online backups and point-in-time recovery options.

  • Create a series of tablespaces, each containing a small number of partitions and regularly modify one tablespace from read/write to read-only as the data in that tablespace ages.

    One consideration is that backing up data is only half of the recovery process. If you configure a tape system so that it can backup the read/write portions of a data warehouse in 4 hours, the corollary is that a tape system might take 20 hours to recover the database if a complete recovery is necessary when 80 percent of the database is read-only.

In summary, a best practice is to place static tables and partitions into read-only tablespaces. A read-only tablespace is backed up once.

Best Practice D: Plan for NOLOGGING Operations

In general, one of the highest priorities for a data warehouse is performance. Not only must the data warehouse provide good query performance for online users, but the data warehouse must also be efficient during the ETL process so that large amount of data can be loaded in the shortest amount of time.

One common optimization leveraged by data warehouses is to execute bulk-data operations using the NOLOGGING mode. The database operations that support NOLOGGING modes are direct-path loads and inserts, index creation, and table creation. When an operation runs in NOLOGGING mode, data is not written to the redo log (or more precisely, only a small set of metadata is written to the redo log). This mode is widely used within data warehouses and can improve the performance of bulk data operations by up to 50 percent.

However, the trade-off is that a NOLOGGING operation cannot be recovered using conventional recovery mechanisms, because the necessary data to support the recovery was never written to the log file. Moreover, subsequent operations to the data upon which a NOLOGGING operation has occurred also cannot be recovered even if those operations were not using NOLOGGING mode. Because of the performance gains provided by NOLOGGING operations, it is generally recommended that data warehouses utilize NOLOGGING mode in their ETL process.

The presence of NOLOGGING operations must be taken into account when devising the backup and recovery strategy. When a database relies on NOLOGGING operations, the conventional recovery strategy (of recovering from the latest tape backup and applying the archived log files) is no longer applicable because the log files cannot recover the NOLOGGING operation.

Never make a backup during a NOLOGGING operation. Oracle Database does not currently enforce this rule: you must schedule the backup jobs and the ETL jobs so that the NOLOGGING operations do not overlap with backup operations.

There are two approaches to backup and recovery in the presence of NOLOGGING operations: ETL or incremental backups. If you are not using NOLOGGING operations in your data warehouse, then you do not have to choose either of the following options: You can recover your data warehouse using archived logs. However, the following options may offer some performance benefits over an archive log-based approach in the event of recovery.

Extraction, Transformation, and Loading

The ETL process uses several Oracle Database features or tools and a combination of methods to load (reload) data into a data warehouse. These features or tools may consist of:

  • Transportable tablespaces. The Oracle transportable tablespace feature enables users to quickly move a tablespace across Oracle databases. It is the most efficient way to move bulk data between databases. Oracle Database provides the ability to transport tablespaces across platforms. If the source platform and the target platform are of different endianness, then RMAN will convert the tablespace being transported to the target format.

  • SQL*Loader. SQL*Loader loads data from external flat files into tables of an Oracle database. It has a powerful data parsing engine that puts little limitation on the format of the data in the data file.

  • Data Pump (export/import). Oracle Database offers the Oracle Data Pump technology, which enables high-speed movement of data and metadata from one database to another. This technology is the basis for Oracle's data movement utilities, Data Pump Export and Data Pump Import.

  • External tables. The external tables feature is a complement to existing SQL*Loader functionality. It enables you to access data in external sources as if it were in a table in the database.

The ETL Strategy and NOLOGGING Operations

One approach is to take regular database backups and store the necessary data files to re-create the ETL process for an entire week. If a recovery is necessary, the data warehouse could be recovered from the most recent backup. Then, instead of rolling forward by applying the archived redo logs (as would be done in a conventional recovery scenario), the data warehouse could be rolled forward by re-running the ETL processes. If you this paradigm, you assume that the ETL processes can be easily replayed, which would typically involve storing a set of extraction files for each ETL process (many data warehouses do this already as a best practice, in order to be able to identify repair a bad data feed for example).

A sample implementation of this approach is to make a backup of the data warehouse every weekend, and then store the necessary files to support the ETL process for each night. Thus, at most, 7 days of ETL processing would be re-applied in order to recover a database. You can project the length of time to recover the data warehouse, based upon the recovery speeds from tape and performance data from previous ETL runs.

Essentially, the data warehouse administrator is gaining better performance in the ETL process through nologging operations, at a price of slight more complex and less-automated recovery process. Many data warehouse administrators have found that this is a desirable trade-off.

One downside to this approach is that the burden is upon the data warehouse administrator to track all of the relevant changes that have occurred in the data warehouse. This approach will not capture changes that fall outside of the ETL process. For example, in some data warehouses, end-users may create their own tables and data structures. Those changes will be lost in the event of a recovery. This restriction must be conveyed to the end-users. Alternatively, you could also mandate that users create all of private database objects in a separate tablespace, and during recovery, the DBA could recover this tablespace using conventional recovery while recovering the rest of the database by replaying the ETL process.

In summary, a best practice is to restore a backup that does not contain nonrecoverable (NOLOGGING) transactions. Then replay the ETL process to reload the data.

Sizing the Block Change Tracking File

The size of the block change tracking file is proportional to:

  • The database size in bytes. The block change tracking file contains data that represents data file blocks in the database. The data is approximately 1/250000 of the total size of the database.

  • The number of enabled threads. All Oracle Real Application Cluster (Oracle RAC) instances have access to the same block change tracking file. However, the instances update different areas of the tracking file without any locking or internode block swapping. You enable block change tracking for the entire database and not for individual instances.

  • The changed block metadata. The block change tracking file keeps a record of all changes between previous backups, in addition to the modifications since the last backup. The tracking file retains the change history for a maximum of eight backups. If the tracking file contains the change history for eight backups, then the Oracle database overwrites the oldest change history information.

Let us take an example of a 500 GB database, with only one thread, and having eight backups kept in the RMAN repository will require a block change tracking file of 20 MB.

((Threads * 2) + number of old backups) * (database size in bytes)
------------------------------------------------------------------ = 20MB

Incremental Backup

A more automated backup and recovery strategy in the presence of NOLOGGING operations leverages RMAN's incremental backup capability Incremental backups have been part of RMAN since it was first released. Incremental backups provide the capability to backup only the changed blocks since the previous backup. Incremental backups of data files capture data changes on a block-by-block basis, rather than requiring the backup of all used blocks in a data file. The resulting backup sets are generally smaller and more efficient than full datafile backups, unless every block in the data file changed.

Oracle Database delivers the ability for faster incrementals with the implementation of the change tracking file feature. When you enable block change tracking, Oracle Database tracks the physical location of all database changes. RMAN automatically uses the change tracking file to determine which blocks must be read during an incremental backup and directly accesses that block to back it up.

The Incremental Approach

A typical backup and recovery strategy using this approach is to back up the data warehouse every weekend, and then take incremental backups of the data warehouse every night following the completion of the ETL process. Note that incremental backups, like conventional backups, must not be run concurrently with nologging operations. To recover the data warehouse, the database backup is restored, and then each night's incremental backups would be reapplied. Although the NOLOGGING operations were not captured in the archive logs, the data from the NOLOGGING operations is in the incremental backups. Moreover, unlike the previous approach, this backup and recovery strategy can be completely managed using RMAN.

The replay ETL approach and the incremental backup approach are both recommended solutions to efficiently and safely back up and recover a database which is a workload consisting of many NOLOGGING operations. The most important consideration is that your backup and recovery strategy must take these NOLOGGING operations into account.

In summary, a best practice is to implement the block change tracking feature and make an incremental backup after a direct load that leaves objects unrecoverable due to NOLOGGING operations.

Best Practice E: Not All Tablespaces Are Equally Important

While the simplest backup and recovery scenario is to treat every tablespace in the database the same, Oracle Database provides the flexibility for you to devise a backup and recovery scenario for each tablespace as needed.

Not all of the tablespaces in a data warehouse are equally significant from a backup and recovery perspective. You can use this information to devise more efficient backup and recovery strategies when necessary. The basic granularity of backup and recovery is a tablespace, so different tablespaces can have different backup and recovery strategies. On the most basic level, temporary tablespaces never need to be backed up (a rule that RMAN enforces).

Moreover, in some data warehouses, there may be tablespaces that are not temporary tablespaces but they are functioning as temporary tablespaces because they are dedicated to scratch space for end-users to store temporary tables and incremental results. Depending upon the business requirements, these tablespaces may not need to backed up and restored. Instead, in the case of a loss of these tablespaces, the users would re-create their own data objects.

In many data warehouses, some data is more important than other data. For example, the sales data in a data warehouse may be crucial, and, in a recovery situation, this data must be online as soon as possible. In the same data warehouse, a table that stores clickstream data from the corporate Web site may be much less critical. The business may tolerate this data being offline for a few days or may be able to accommodate the loss of several days of clickstream data in the event of a loss of database files. In this scenario, the tablespaces that contains sales data must be backed up often, while the tablespaces that contains clickstream data need only to be backed up once every week or two.