Oracle® Hierarchical Storage Manager and StorageTek QFS Software File System Recovery Guide
Release 6.0
E42065-03

1 Introduction

This document outlines the steps that you should take to recover Oracle Hierarchical Storage Manager and StorageTek QFS software, files, and file systems that have been lost or corrupted due to hardware failure, misconfiguration, human error, or the physical destruction of facilities and equipment. Properly configured Oracle HSM file systems are extremely robust, but the steps that you need to take during recovery, and your probability of success, depend on your degree of preparedness. This introduction therefore begins with an overview of the recovery process, then reviews the data- and file system-protection measures that Oracle recommends, and finally outlines the recovery steps that are open to you, given the preparations that you have made and the resources that you currently have available.

Failure and Recovery Scenarios

The scope of a file-system failure and the nature of the required recovery actions depend on the nature of the underlying problem. For example:

  • If the server host fails, the Oracle HSM software and file-system configurations may be lost, leaving file system data and metadata intact but inaccessible until the configuration information is restored.

    Once the underlying hardware problem has been addressed and the operating system has been restored, you reinstall the software and restore the configuration files from backup copies. In this situation, follow the procedures outlined in Chapter 3, "Restoring the Oracle HSM Configuration".

  • If an administrator inadvertently deletes or corrupts one or more configuration files, library catalogs, scripts, or crontab entries, access to one or more file systems may be lost along with some or all software functionality.

    You restore the configuration files from backup copies. Follow the procedures outlined in Chapter 3, "Restoring the Oracle HSM Configuration".

  • If a disk or RAID group that provides the disk cache for the data in a standalone (non-archiving) QFS file system fails, all files in the disk cache are lost.

    Once the hardware problem has been addressed, you restore lost files from QFS backup copies. See "Recovering Files Using a Recovery Point File".

  • If a disk or RAID group that provides the disk cache for the data in an archiving file system fails, all files in the disk cache are lost.

    Once the hardware problem has been addressed, you restore files from archived copies or from Oracle HSM backup files. See "Recovering Files Using a Recovery Point File" and "Recovering Files Using Archiver Log Entries".

  • If the disks that store file system metadata fail, the file system is lost and the data is no longer readily accessible.

    Once the hardware problem has been addressed, you restore metadata from backup files. If the metadata for an archiving file system was not backed up, it can be reconstructed from backup copies of the archiver log file. See Chapter 5, "Recovering Lost and Damaged Files".

  • If an administrator inadvertently formats the disk partitions that host an Oracle HSM file system or issues the sammkfs command against existing Oracle HSM partitions, all files and metadata are lost.

    You restore metadata from backup files or reconstruct it from the archiver log of an archiving file system. Data can be restored from archival media or from a backup file. See Chapter 5, "Recovering Lost and Damaged Files".

Recommended Preparations

In the Oracle Hierarchical Storage Manager and StorageTek QFS Installation and Configuration Guide, Oracle recommends that you take the following configuration, file-system, and data backup steps during your initial configuration:

  • Store critical data in Oracle HSM archiving file systems.

    Archive at least two copies of the file data. Archive at least one copy on removable media, such as magnetic tape.

    If possible, configure disk archives on independent file systems that do not share physical devices with the disk cache of the archiving file system.

  • Store file-system metadata on highly redundant, mirrored storage.

  • Regularly back up Oracle HSM file systems with recovery point files.

    A recovery point file stores file-system metadata and, optionally, data, so that files or entire file systems can be restored.

    If you have the Oracle Hierarchical Storage Manager software installed, you create recovery point files by running the samfsdump command. If you have only the QFS file system software, you use the qfsdump command. You can run the dump commands from the command line or from the Oracle HSM Manager graphical user interface.

    Using either command on its own backs up the metadata. Using either command with the -U option backs up data as well as metadata. The -U option is mainly useful for protecting file systems that are not archived to removable media.

  • Configure the host to automatically save Oracle HSM metadata recovery point files. Create entries in the Solaris crontab file, or use the scheduling feature of the Oracle HSM Manager.

  • Configure the host to automatically save Oracle HSM archiver log information. Create entries in the Solaris crontab file.

    For each file that is archived with the Oracle Hierarchical Storage Manager software, the archiver log file records the file's name and location (path) within the file system, the name of the archive (tar) file that holds copies, the removable media volumes that hold the archive file, and the position of the archive file on the media.

  • Save backup copies of configuration files, crontab entries, and custom file-system management scripts (if any).

  • Select a secure storage location for the Oracle HSM recovery information.

    Select an independent file system that you can mount on the Oracle HSM file system host.

    Make sure that the selected file system does not share any physical devices, logical volumes, partitions, or LUNs with the archiving file system. Do not store disaster recovery resources in the file system that they are meant to protect.
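
The scheduling and storage-location recommendations above can be sketched as a small shell helper. This is a minimal illustration under stated assumptions, not Oracle's documented procedure: the mount point /hsmfs1, the recovery directory /zfs1/hsmfs1_recovery, the schedule, and both function names are hypothetical, and /opt/SUNWsamfs/sbin/samfsdump is assumed to be the installed location of the command. The helper only prints a candidate crontab line for a metadata-only recovery point; it does not run samfsdump. The location check compares paths only, so it cannot detect shared physical devices, logical volumes, or LUNs, which still have to be verified by hand.

```shell
#!/bin/sh
# Hypothetical site values: adjust before use.
HSM_MOUNT="/hsmfs1"                         # file system to protect (assumed name)
RECOVERY_DIR="/zfs1/hsmfs1_recovery"        # independent file system (assumed name)
SAMFSDUMP="/opt/SUNWsamfs/sbin/samfsdump"   # assumed install path

# Return 0 only if directory $2 lies outside mount point $1.
# Path comparison only: this does not inspect the underlying devices.
is_outside_mount() {
    case "$2/" in
        "$1"/*) return 1 ;;
        *)      return 0 ;;
    esac
}

# Print a crontab line that changes to mount point $1 and writes a dated
# recovery point file into directory $2 using the dump command $3.
# Percent signs are backslash-escaped because cron treats a bare % specially.
recovery_point_cron_line() {
    printf '10 2 * * * ( cd %s && %s -f %s/`date +\\%%y\\%%m\\%%d` )\n' "$1" "$3" "$2"
}

# Example use: print the line only when the location passes the path check.
if is_outside_mount "$HSM_MOUNT" "$RECOVERY_DIR"; then
    recovery_point_cron_line "$HSM_MOUNT" "$RECOVERY_DIR" "$SAMFSDUMP"
else
    echo "Recovery directory must not be inside $HSM_MOUNT" >&2
fi
```

Review the printed line before adding it to the crontab file with crontab -e. The same pattern could be adapted to copy the archiver log to the recovery directory on a schedule, using whatever log path is configured at your site.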