Implementing Disaster Recovery
The 5800 system implements a distributed data model that includes extensive self-healing functionality to protect against localized hardware failure.
This chapter describes how to protect your 5800 system against catastrophic system loss.
About the 5800 System Implementation of NDMP
The 5800 system does not require back-ups in the conventional sense, because the system heals automatically from localized hardware failures.
To enable you to recover from a catastrophic system loss, however, the 5800 system implements a subset of the Network Data Management Protocol (NDMP). NDMP allows you to back up the data stored on the system to tape and restore that data in the event of catastrophic system loss.
The 5800 system NDMP implementation allows only for complete restoration of data to an empty cell, not partial restorations. Before restoring data to a cell, you must use the CLI or GUI to delete all data from the cell. See wipe for information about the CLI wipe command. See To Delete All Data Using the GUI for information about deleting data using the GUI.
During a restoration of data to the 5800 system, you must restore the most recent back-up first, and then restore back-ups covering the entire span of the system’s operation. After the most recent back-up is restored, you can restore the other back-ups in any order.
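The ordering rule above can be sketched in a few lines of Python. This is only an illustration for planning a restore, not part of the 5800 system or NetVault: the `BackupJob` record and both helper functions are hypothetical names, and the sketch assumes you track the time range each back-up job covers.

```python
from datetime import datetime
from typing import List, NamedTuple, Tuple


class BackupJob(NamedTuple):
    """One back-up job on tape (hypothetical record, not a NetVault type)."""
    name: str
    start: datetime  # beginning of the time range the job covers
    end: datetime    # end of the time range the job covers


def restore_order(jobs: List[BackupJob]) -> List[BackupJob]:
    """Return the jobs with the most recent back-up first.

    Only the first position matters; the remaining jobs may be restored
    in any order, so they are simply kept newest to oldest.
    """
    if not jobs:
        raise ValueError("no back-up jobs to restore")
    return sorted(jobs, key=lambda job: job.end, reverse=True)


def coverage_gaps(jobs: List[BackupJob]) -> List[Tuple[datetime, datetime]]:
    """Return the (gap_start, gap_end) ranges that no job covers."""
    if not jobs:
        return []
    ordered = sorted(jobs, key=lambda job: job.start)
    gaps = []
    covered_until = ordered[0].end
    for job in ordered[1:]:
        if job.start > covered_until:
            gaps.append((covered_until, job.start))
        covered_until = max(covered_until, job.end)
    return gaps
```

Running `coverage_gaps` before starting a restore is one way to confirm that the tapes on hand cover the entire span of the system's operation.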
See the remaining sections in this chapter for information about using NetVault as the back-up product and for more guidelines about backing up and restoring data to the 5800 system.
Note - The 5800 system acts as an NDMP data server (filer). It does not implement the optional Direct Access Recovery (DAR) portion of NDMP, which assumes a directory structure that the 5800 system does not have. The 5800 system does not require the DAR feature to recover individual files, because the system automatically recovers data lost through hardware failures.
Using NetVault to Implement Disaster Recovery
Sun has tested 5800 system disaster recovery using NetVault, Version 7.4.5, with the NDMP plug-in from BakBone Software on a SPARC®-based system running Solaris 10. NetVault supports a wide range of tape devices. After you configure NetVault with the BakBone patch for 5800 system support, you can use the NetVault GUI or CLI to control all back-up and restore operations.
For detailed information about using NetVault with the 5800 system, contact your service representative to obtain a copy of Protecting the Sun StorageTek 5800 System with Bakbone NetVault using NDMP and also a pointer to BakBone Software’s documentation about using NetVault with the 5800 system.
Note - If you are using the authorized subnetworks feature on the 5800 system, the system on which NetVault is running must be on an authorized subnetwork. If you have left the authorized subnetwork setting at its default of all (thereby allowing any client on the network to access the data stored on the 5800 system), this is not an issue. See Authorized Subnetworks for more information.
Checking NDMP Status
Use the sysstat command to determine the status of back-up and restore on the 5800 system.
ST5800 $ sysstat
Cell 0: Online. Estimated Free Space: 7.49 TB
8 nodes online, 32 disks online.
Data VIP 10.8.60.104, Admin VIP 10.8.60.103
Data services Online, Query Engine Status: HAFaultTolerant
Data Integrity check not completed since boot
Data Reliability check last completed at Wed Sep 05 07:12:43 UTC 2007
Query Integrity established as of Wed Sep 05 01:31:20 UTC 2007
NDMP status: Backup ready.
The NDMP status line reports one of the following states:
- Backup unavailable - A very rare state indicating either that an error occurred during restore or that the system database that drives back-up is not ready. If an error occurred during restore, restart the restore operation from the beginning. If the problem is with the database, the system generally recovers on its own; if the problem persists, contact Sun support.
- Backup ready - System is ready to be backed up.
- Backup writing to tape: number_of_objects, number_of_bytes processed. - Backup is in process.
- Restore reading tape: number_of_objects, number_of_bytes processed. - Restore is in process.
- Restore in progress. Ready for next tape. - Full disaster recovery has been initiated, but the back-ups restored so far do not yet cover the entire range of dates to be restored. Proceed with the next back-up job.
- Safe to backup to date - The back-up process might miss some data stored after the date indicated because the system database has fallen behind the data ingest rate. This state occurs only during periods when the 5800 system is ingesting large amounts of data and is corrected automatically when the database catches up. You can still perform a back-up during this time, and then later, when the database has caught up, perform another consolidating back-up that covers the same time period.
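If you capture sysstat output in a script, the states listed above can be classified with a small parser. The following Python sketch is an illustration only: the function name and the short labels are hypothetical, and it assumes the status text begins with the wording shown in this section, which might differ between software releases.

```python
import re
from typing import Optional

# Prefixes taken from the status list above. The exact wording printed by
# a given software release is an assumption.
_STATES = [
    ("Backup unavailable", "unavailable"),
    ("Backup ready", "ready"),
    ("Backup writing to tape", "backup_running"),
    ("Restore reading tape", "restore_running"),
    ("Restore in progress", "awaiting_next_tape"),
    ("Safe to backup to", "database_behind"),
]


def ndmp_state(sysstat_output: str) -> Optional[str]:
    """Classify the 'NDMP status:' line found in captured sysstat output.

    Returns a short label, "unknown" for unrecognized wording, or None
    when the output contains no NDMP status line at all.
    """
    match = re.search(r"^\s*NDMP status:\s*(.+)$", sysstat_output, re.MULTILINE)
    if match is None:
        return None
    status = match.group(1).strip()
    for prefix, label in _STATES:
        if status.startswith(prefix):
            return label
    return "unknown"
```

A back-up script could, for example, start a job only when the label is `ready`, and log a warning when it is `database_behind`.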
General Guidelines for Backing Up Data
To back up data to tape, follow these general guidelines:
- Back-ups occur over a single connection. The longer the back-up session, the greater the chance that something might cause the back-up process to end. Therefore, try to limit the amount of data backed up in a single job. As a general best practice, estimate the rate at which data is being stored on the 5800 system and specify a back-up time range during which 1 TB or less was stored.
- As described in the previous bullet, a best practice is to back up data in multiple incremental sessions that each back up 1 TB or less of data. You can also issue one or more consolidating back-ups, which span multiple sessions. The longer the range of time you specify for these consolidating back-ups, the more data is backed up per session and the fewer sessions remain to manage. These consolidating sessions might take a long time to complete and should be performed only after the data being consolidated is already accounted for on tape.
- If something causes a back-up session to abort, you must run the entire back-up again from the beginning.
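The 1 TB guideline above amounts to simple arithmetic on the estimated ingest rate. The following Python sketch shows one way to split a span of time into back-up time ranges; the function name and parameters are hypothetical, a constant ingest rate is assumed, and 1 TB is taken as 10^12 bytes.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

TB = 10 ** 12  # bytes; a decimal terabyte is assumed here


def backup_windows(
    start: datetime,
    end: datetime,
    ingest_bytes_per_day: float,
    max_bytes_per_job: float = TB,
) -> List[Tuple[datetime, datetime]]:
    """Split [start, end) into time ranges that each hold at most
    max_bytes_per_job, assuming data arrives at a constant rate."""
    if ingest_bytes_per_day <= 0:
        raise ValueError("ingest rate must be positive")
    step = timedelta(days=max_bytes_per_job / ingest_bytes_per_day)
    if step <= timedelta(0):
        raise ValueError("ingest rate too high for the chosen job size")
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows
```

For example, at an estimated 250 GB per day, each time range spans four days, so a ten-day period is covered by three back-up jobs. Real ingest rates vary, so treat the result as a starting point and shorten the ranges during heavy ingest.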
General Guidelines for Restoring Data
To restore data from tape, follow these general guidelines:
- The NDMP implementation on the 5800 system allows only for complete restoration of data to an empty cell, not partial restorations.
- You can only restore data to the same size cell as the cell from which the data was backed up. In other words, you cannot restore data from a half-cell system to a full-cell system.
- The system software version running on the 5800 system to which you are restoring data must be identical to the system software version that was running on the 5800 system when the most recent back-up was made.
- Before restoring data to a cell, you must use the CLI or GUI to delete all data from the cell. See wipe for information about the CLI wipe command. See To Delete All Data Using the GUI for information about deleting data using the GUI.
- All nodes and disks in the system must be online before you start the restore operation. If any nodes or disks are missing, contact Sun service to replace them and bring them online before attempting the restoration.
- If any nodes or disks fail during the restore operation, you must contact Sun service to replace the failed disks or nodes, and then you must use the CLI or GUI to delete all data from the cell and begin the restoration again. See wipe for information about the CLI wipe command. See To Delete All Data Using the GUI for information about deleting data using the GUI.
- You must restore the most recent back-up first, and then restore back-ups covering the entire span of the system’s operation. After the most recent back-up is restored, you can restore the other back-ups in any order.
- A complete restoration of the data can require a significant amount of time. For best results, after you have restored the most recent back-up, restore the highest priority data first.
- During restoration of the most recent back-up, all client services, such as WebDAV and API access, are unavailable. To minimize this downtime, and to protect the most current data, a best practice is to perform a relatively small back-up every day so that a small, recent back-up is on hand to restore first.
- During restoration of the most recent back-up, any administrative changes made to the system’s configuration will be lost. Therefore, while the first restore is in progress, do not make any changes to NTP settings, DNS settings, or other configuration settings.
- You might not have access to the data on the 5800 system through WebDAV for up to 12 hours after the first restore has completed.
- After the first restore session, you must reboot the 5800 system to ensure that the query engine and WebDAV function correctly after restoration is complete.
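The preconditions in the guidelines above can be collected into a simple pre-restore checklist. The following Python sketch is an illustration only: the `CellState` record and its fields are hypothetical values that an operator would fill in by hand (for example, from sysstat output), not data returned by any 5800 system interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CellState:
    """Facts about the target cell, gathered by the operator (hypothetical)."""
    nodes_total: int
    nodes_online: int
    disks_total: int
    disks_online: int
    software_version: str
    is_empty: bool  # True once all data has been deleted from the cell


def restore_blockers(
    target: CellState, source_nodes: int, source_version: str
) -> List[str]:
    """Return the reasons a restore should not start yet (empty if none).

    source_nodes and source_version describe the cell as it was when the
    most recent back-up was made.
    """
    problems = []
    if target.nodes_total != source_nodes:
        problems.append("cell size differs from the backed-up cell")
    if target.software_version != source_version:
        problems.append("software version differs from the most recent back-up")
    if (target.nodes_online < target.nodes_total
            or target.disks_online < target.disks_total):
        problems.append("not all nodes and disks are online")
    if not target.is_empty:
        problems.append("cell still holds data; delete all data first")
    return problems
```

An empty list means that none of the listed preconditions is violated. It does not replace the remaining guidelines, such as restoring the most recent back-up first and rebooting after the first restore session.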