About Backup and Disaster Recovery

A backup and disaster recovery strategy is a key component in a comprehensive operational maintenance plan. A disaster recovery strategy aims to enable the continuation or recovery of systems, infrastructure, and data following a destructive disaster event.

Disaster events can be natural events, physical hardware failures, software bugs, or human-induced data destruction through error or malice. In all cases, devise a plan that recovers critical systems and data with as little downtime and data loss as possible.

The following are different disaster events that might require different procedures and policies:

Data center failure

A disaster recovery strategy that caters to data center failure can be costly, but it can effectively mirror data and systems in several different geographical locations. This approach can help recover from an event that affects an entire data center at a particular geographical location. To handle these types of events, physical systems must be available in multiple locations, and data must be replicated to each location so that systems can be restored quickly.

The cost of a disaster recovery strategy that caters to this type of failure can be reduced by using cloud-type services, such as Oracle Cloud Infrastructure, that provide services across geographical locations.

System failure

A system failure strategy must provide for physical hardware to replace components or whole systems in case hardware fails or a destructive action removes full system functionality. In certain cases, physical hardware might address component failure by providing redundant components. However, the strategy should consider the possibility of total system failure. Plan around how quickly you can physically replace a complete system for each business-critical resource that you need to restore. Ideally, this strategy should also include a plan to deploy software configuration information and the required data.

Oracle Cloud Infrastructure can help reduce the total cost of ownership when planning for system failure, as it provides the facilities to create custom system images and configuration entries based on existing infrastructure for quick deployment of a new system or configuration, as needed.

You can achieve further system resilience through virtualization or container solutions, which help abstract system processes and functionality from the physical hardware. Use of virtualization or container services also provides the opportunity to create images or deployment plans to rapidly recover services, as needed.

For information about virtualization in Oracle Linux, visit one of the following links:

For more information about container services, visit the following link:
- Oracle Linux: Podman User's Guide

Disk or volume failure

Typically, every disaster recovery strategy revolves around events that involve disk or volume failure. Although disk failures still occur, their frequency has been largely reduced as hardware evolves. Even so, backup and mirroring software is low-cost and easy to implement on Oracle Linux, and it can help mitigate these issues.

Disk and volume failure are typically handled by performing some kind of data mirroring or replication. Data resilience is often achieved through disk redundancy by using RAID-1 mirroring, volume snapshotting, and traditional backup methods. See Working With Data Mirroring for more information.
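To make the idea of mirroring concrete, the following is a toy sketch in Python. It isn't how RAID-1 is implemented on Oracle Linux; the `MirroredVolume` class and its methods are hypothetical names invented for this illustration. It only models the principle that every write goes to all members, so a read can still succeed after one member fails.

```python
class MirroredVolume:
    """Toy model of RAID-1 mirroring (illustrative only, not a real driver)."""

    def __init__(self, members: int = 2) -> None:
        # Each "disk" is modeled as a dict that maps a block number to data.
        self.disks = [dict() for _ in range(members)]
        self.failed = set()

    def write(self, block: int, data: bytes) -> None:
        # A mirrored write stores the same data on every healthy member.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block] = data

    def fail(self, index: int) -> None:
        # Simulate losing one member and everything stored on it.
        self.failed.add(index)
        self.disks[index].clear()

    def read(self, block: int) -> bytes:
        # Any surviving member can serve the read.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block]
        raise IOError("all mirror members have failed")
```

In this model, a volume with two members keeps serving reads after `fail(0)`, and raises an error only when both members are gone. A mirror also copies erroneous writes to every member, which is one reason to combine mirroring with snapshots and backups.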

Volume-level snapshotting replicates data across volumes and is discussed in more depth in Working With File System Snapshots.

Full data backup, which is described in Managing Backups With ReaR, also provides some level of platform recovery in the case of system-level failure.

In Oracle Cloud Infrastructure, block devices that function as disks for created instances have built-in data replication capability across multiple servers to guarantee availability and uptime. Thus, for Oracle Cloud Infrastructure instances, disk or volume failure is mitigated automatically.

User and software events

User and software events can include malicious attacks on systems or inadvertent errors that result in the destruction or corruption of data on a file system. Software bugs or updates might also result in unintended configuration changes and other data corruption. Therefore, in any disaster recovery strategy, rapid rollback to a known, working environment is critical.

Traditionally, this domain has largely been handled by full and regular data backups. This approach is still useful, but recovery can be slow and typically requires some downtime. To maximize protection, combine this approach with other, quicker solutions. See Managing Backups With ReaR for more information about managing backups.
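As a rough illustration of what a full, regular backup involves, the following Python sketch archives a directory into a timestamped file and restores it elsewhere. It uses only the Python standard library and isn't ReaR; the function names and the archive naming scheme are assumptions made for this example.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def create_backup(source_dir: Path, backup_dir: Path) -> Path:
    """Write a timestamped, gzip-compressed tar archive of source_dir."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    # A UTC timestamp in the file name keeps successive backups distinct.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = backup_dir / f"{source_dir.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores paths relative to the directory name, not absolute.
        tar.add(source_dir, arcname=source_dir.name)
    return archive


def restore_backup(archive: Path, target_dir: Path) -> None:
    """Unpack a backup archive created by create_backup into target_dir."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target_dir)
```

A scheduled job could call `create_backup` at regular intervals. Restoring means unpacking the whole archive, which is why this kind of recovery tends to be slower than rolling back a snapshot.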

The file system snapshotting feature provided by Btrfs can reduce the amount of time that's required to return a system to a known working state. For more information and instructions, see Working With File System Snapshots.
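The following Python sketch shows one way a script might take a read-only Btrfs snapshot before a risky change, such as a software update. It assumes the `btrfs` command-line tool is installed and that the source is a Btrfs subvolume. The function names, the `pre-update-` label, and any paths passed in are hypothetical. By default the function only returns the command so that it can be reviewed before anything runs.

```python
import subprocess
from datetime import datetime, timezone


def snapshot_command(subvolume: str, snapshot_dir: str, label: str) -> list[str]:
    """Build the argument list for a read-only Btrfs snapshot."""
    dest = f"{snapshot_dir.rstrip('/')}/{label}"
    # -r makes the snapshot read-only, so it can't be modified by accident.
    return ["btrfs", "subvolume", "snapshot", "-r", subvolume, dest]


def take_snapshot(subvolume: str, snapshot_dir: str, dry_run: bool = True) -> list[str]:
    """Create a timestamped snapshot, or only return the command if dry_run."""
    label = datetime.now(timezone.utc).strftime("pre-update-%Y%m%dT%H%M%SZ")
    cmd = snapshot_command(subvolume, snapshot_dir, label)
    if not dry_run:
        # Requires root privileges and a Btrfs file system.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `take_snapshot("/", "/.snapshots")` returns the command without running it, where `/.snapshots` is an assumed location for snapshots. Rolling back to a snapshot is a separate procedure that this sketch doesn't cover.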

A comprehensive disaster recovery plan should use a combination of the tools that are suited to specific platform, environment, and hosting needs.

Cloud-based services typically provide the tools and built-in redundancy to mitigate data loss and speed up recovery. However, even in these environments, you can use other tools and facilities in combination to cover all potential disaster scenarios. For example, by using file system snapshotting on a cloud instance, you can fine-tune system rollback even after a basic software update.

For physical systems in a specific data center, a wider range of tools and services is available to ensure resilience and durability against hardware and software disasters.

This document provides pointers to the different tools that are available in Oracle Linux for achieving a more comprehensive disaster recovery strategy by using software that's native to the OS. It also covers, in more depth, the Relax-and-Recover (ReaR) and data backup tools that are provided with Oracle Linux.