This chapter describes several common media failure scenarios. It shows how to recover from each failure when using a user-managed backup and recovery strategy, that is, a strategy that does not depend upon Recovery Manager. This chapter includes the following topics:
The archiving mode of the database: ARCHIVELOG or NOARCHIVELOG
The type of media failure
The files affected by the media failure
If the media failure is temporary, correct the underlying problem and restart the database. Usually, crash recovery will recover all committed transactions from the online redo log. If the media failure is permanent, then restore the database as described in "Recovering a Database in NOARCHIVELOG Mode".
|Damaged Datafiles||Database Status||Solution|
|Datafiles in the SYSTEM tablespace or datafiles with active undo segments||Database shuts down.||If the hardware problem is temporary, then fix it and restart the database. Usually, crash recovery recovers lost transactions. If the hardware problem is permanent, then recover the database as described in "Performing Closed Database Recovery".|
|Datafiles not in the SYSTEM tablespace and without active undo segments||Affected datafiles are taken offline, but the database stays open.||If the unaffected portions of the database must remain available, then do not shut down the database. Take tablespaces containing problem datafiles offline using the temporary option, then recover them as described in "Performing Datafile Recovery in an Open Database".|
If database recovery with a backup control file rolls forward through a CREATE TABLESPACE or an ALTER TABLESPACE ADD DATAFILE operation, then the database stops recovery when applying the redo record for the added files and lets you confirm the filenames.
For example, suppose the following sequence of events occurs:
You may see the following error when applying the CREATE TABLESPACE redo data:
ORA-00283: recovery session canceled due to errors
ORA-01244: unnamed datafile(s) added to control file by media recovery
ORA-01110: data file 11: '/oracle/oradata/trgt/test02.dbf'
ORA-01110: data file 10: '/oracle/oradata/trgt/test01.dbf'
To recover through an ADD DATAFILE operation, use the following procedure:
View the files added by selecting from V$DATAFILE. For example:
SELECT FILE#,NAME FROM V$DATAFILE;

FILE#           NAME
--------------- ----------------------
1               /oracle/oradata/trgt/system01.dbf
.
.
.
10              /oracle/oradata/trgt/UNNAMED00001
11              /oracle/oradata/trgt/UNNAMED00002
If multiple unnamed files exist, then determine which unnamed file corresponds to which datafile by using one of these methods:
Open the alert log (alert_SID.log), which contains messages about the original file location for each unnamed file.
Derive the original file location of each unnamed file from the error message and V$DATAFILE: each unnamed file corresponds to the file in the error message with the same file number.
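Rename each unnamed file to its original filename, using the unnamed file's name exactly as V$DATAFILE reports it. For example:
ALTER DATABASE RENAME FILE '/oracle/oradata/trgt/UNNAMED00001' TO '/oracle/oradata/trgt/test01.dbf';
ALTER DATABASE RENAME FILE '/oracle/oradata/trgt/UNNAMED00002' TO '/oracle/oradata/trgt/test02.dbf';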
Continue recovery by issuing the previous recovery statement. For example:
RECOVER AUTOMATIC DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL
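The file-number matching used in this procedure can also be scripted. The following Python sketch is illustrative only: the function name build_rename_statements and the dictionary of unnamed files are assumptions made for this example, not part of any Oracle tool. It pairs each ORA-01110 message with the V$DATAFILE entry that has the same file number and produces the corresponding ALTER DATABASE RENAME FILE statements, which you would still review before running them in SQL*Plus.

```python
import re

# Matches messages such as: ORA-01110: data file 10: '/path/to/file.dbf'
ORA_01110 = re.compile(r"ORA-01110: data file (\d+): '([^']+)'")


def build_rename_statements(error_text, unnamed_files):
    """Pair unnamed files with their original names by file number.

    error_text    -- the recovery error output containing ORA-01110 lines
    unnamed_files -- dict of FILE# -> NAME as reported by V$DATAFILE
    """
    originals = {int(num): path for num, path in ORA_01110.findall(error_text)}
    statements = []
    for file_no in sorted(unnamed_files):
        if file_no not in originals:
            raise KeyError(f"no ORA-01110 message for file {file_no}")
        statements.append(
            "ALTER DATABASE RENAME FILE "
            f"'{unnamed_files[file_no]}' TO '{originals[file_no]}';"
        )
    return statements


# Sample input copied from the error message and query output in this scenario.
error_text = (
    "ORA-01110: data file 11: '/oracle/oradata/trgt/test02.dbf'\n"
    "ORA-01110: data file 10: '/oracle/oradata/trgt/test01.dbf'\n"
)
unnamed = {
    10: "/oracle/oradata/trgt/UNNAMED00001",
    11: "/oracle/oradata/trgt/UNNAMED00002",
}

for statement in build_rename_statements(error_text, unnamed):
    print(statement)
```

Matching on the file number rather than on the order of the messages matters here, because the error output lists file 11 before file 10.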
All archived log files written after the creation of the original datafile are available
The control file contains the name of the damaged file (that is, the control file is current, or is a backup taken after the damaged datafile was added to the database)
Note: You cannot re-create any of the datafiles for the SYSTEM tablespace by using the CREATE DATAFILE clause of the ALTER DATABASE statement because the necessary redo is not available.
To re-create a datafile for recovery:
Create a new, empty datafile to replace a damaged datafile that has no corresponding backup. For example, assume that the datafile ?/oradata/trgt/users01.dbf has been damaged, and no backup is available. The following statement re-creates the original datafile (same size) on disk2:
ALTER DATABASE CREATE DATAFILE '?/oradata/trgt/users01.dbf' AS '/disk2/users01.dbf';
This statement creates an empty file that is the same size as the lost file. The database looks at information in the control file and the data dictionary to obtain size information. The old datafile is renamed as the new datafile.
Perform media recovery on the empty datafile. For example, enter:
RECOVER DATAFILE '/disk2/users01.dbf'
All archived logs written after the original datafile was created must be applied to the new, empty version of the lost datafile during recovery.
You can recover backups through an OPEN RESETLOGS operation so long as:
You have a current, backup, or created control file that knows about the prior incarnations
You have all available archived redo logs
If you need to re-create the control file, the trace file generated by ALTER DATABASE BACKUP CONTROLFILE TO TRACE will contain the necessary commands to reconstruct the complete incarnation history. The V$DATABASE_INCARNATION view displays the RESETLOGS history known to the control file, while the V$LOG_HISTORY view displays the archived log history.
It is possible for the incarnation history to be incomplete in the re-created control file. For example, archived logs necessary for recovery may be missing. In this case, it is possible to create incarnation records explicitly with the ALTER DATABASE REGISTER LOGFILE statement.
In the following example, you register four logs that are necessary for recovery but are not recorded in the re-created control file, and then recover the database:
ALTER DATABASE REGISTER LOGFILE '?/oradata/trgt/arch/arcr_1_1_42343523.arc';
ALTER DATABASE REGISTER LOGFILE '?/oradata/trgt/arch/arcr_1_1_34546466.arc';
ALTER DATABASE REGISTER LOGFILE '?/oradata/trgt/arch/arcr_1_1_23435466.arc';
ALTER DATABASE REGISTER LOGFILE '?/oradata/trgt/arch/arcr_1_1_12343533.arc';
RECOVER AUTOMATIC DATABASE;
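When many archived logs are missing from a re-created control file, typing one REGISTER LOGFILE statement per file is error-prone. The following Python sketch shows one way to generate the statements from a list of filenames. The function name register_logfile_statements is an assumption made for this example, and the script only prints SQL text; it does not connect to the database.

```python
def register_logfile_statements(paths):
    """Return one ALTER DATABASE REGISTER LOGFILE statement per archived log."""
    statements = []
    for path in paths:
        # A single quote would terminate the SQL string literal early.
        if "'" in path:
            raise ValueError(f"filename contains a single quote: {path}")
        statements.append(f"ALTER DATABASE REGISTER LOGFILE '{path}';")
    return statements


# Filenames taken from the example in this section.
archived_logs = [
    "?/oradata/trgt/arch/arcr_1_1_42343523.arc",
    "?/oradata/trgt/arch/arcr_1_1_34546466.arc",
]

for statement in register_logfile_statements(archived_logs):
    print(statement)
```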
You can create tables and indexes with the CREATE TABLE AS SELECT statement. You can also specify that the database create them with the NOLOGGING option. When you create a table or index as NOLOGGING, the database does not generate redo log records for the operation. Thus, you cannot recover objects created with NOLOGGING, even if you are running in ARCHIVELOG mode.
Note: If you cannot afford to lose tables or indexes created with NOLOGGING, then make a backup after the unrecoverable table or index is created.
Be aware that when you perform media recovery on a database in which some tables or indexes were created normally and others were created with the NOLOGGING option, the NOLOGGING objects are marked logically corrupt by the RECOVER operation. Any attempt to access the unrecoverable objects returns an ORA-01578 error message. Drop the NOLOGGING objects and re-create them if needed.
Because it is possible to create a table with the NOLOGGING option and then create an index with the LOGGING option on that table, the index is not marked as logically corrupt after you perform media recovery. The table was unrecoverable (and thus marked as corrupt after recovery), however, so the index points to corrupt blocks. The index must be dropped, and the table and index must be re-created if necessary.
See Also: Oracle Data Guard Concepts and Administration for information about the impact of NOLOGGING on a standby database
If you have a read-only tablespace on read-only or slow media, then you may encounter errors or poor performance when recovering with the USING BACKUP CONTROLFILE option. This situation occurs when the backup control file indicates that a tablespace was read/write when the control file was backed up. In this case, media recovery may attempt to write to the files. For read-only media, the database issues an error saying that it cannot write to the files. For slow media, such as a hierarchical storage system backed up by tapes, performance may suffer.
To avoid these recovery problems, use current control files rather than backups to recover the database. If you need to use a backup control file, then you can also avoid this problem if the read-only tablespace has not suffered a media failure.
You have these alternatives for recovering read-only and slow media when using a backup control file:
Take datafiles from read-only tablespaces offline before doing recovery with a backup control file, and then bring the files online at the end of media recovery.
Use the correct version of the control file for the recovery. If the tablespace will be read-only when recovery completes, then the control file backup must be from a time when the tablespace was read-only. Similarly, if the tablespace will be read/write at the end of recovery, then the control file must be from a time when the tablespace was read/write.
If a current or backup control file is unavailable for recovery, then you can execute a CREATE CONTROLFILE statement as described in "Create New Control File After Losing All Current and Backup Control Files". Read-only files should not be listed in the CREATE CONTROLFILE statement so that recovery can skip these files. No recovery is required for read-only datafiles unless you restored backups of these files from a time when the datafiles were read/write.
After you create a new control file and attempt to mount and open the database, the database performs a data dictionary check against the files listed in the control file. For each file that is not listed in the CREATE CONTROLFILE statement but is present in the data dictionary, an entry is created in the control file. These files are named MISSINGnnnnn, where nnnnn is a five-digit number starting with 0.
After the database is open, rename the read-only files to their correct filenames by executing the ALTER DATABASE RENAME FILE statement for all the files whose name is prefixed with MISSING.
To prepare for a scenario in which you might have to re-create the control file, run the following statement when the database is mounted or open to obtain the CREATE CONTROLFILE syntax:
ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
This SQL statement produces a trace file that you can edit and use as a script to re-create the control file. You can specify either the RESETLOGS or NORESETLOGS (default) keywords to generate CREATE CONTROLFILE ... RESETLOGS or CREATE CONTROLFILE ... NORESETLOGS versions of the script.
All the restrictions related to read-only files in CREATE CONTROLFILE statements also apply to offline normal tablespaces, except that you need to bring the tablespace online after the database is open. You should leave out tempfiles from the CREATE CONTROLFILE statement and add them after database open.
See Also: Oracle Database Backup and Recovery Basics to learn how to make trace backups of the control file
The transportable tablespace feature of Oracle allows a user to transport a set of tablespaces from one database to another. Transporting a tablespace into a database is like creating a tablespace with preloaded data. Using this feature is often an advantage because:
It is faster than using the Export or SQL*Loader utilities because it involves only copying datafiles and integrating metadata
You can use it to move index data, hence avoiding the necessity of rebuilding indexes
See Also: Oracle Database Administrator's Guide for detailed information about using the transportable tablespace feature
Like normal tablespaces, transportable tablespaces are recoverable. However, while you can recover normal tablespaces without a backup, you must have a version of the transported datafiles in order to recover a transported tablespace.
To recover a transportable tablespace, use the following procedure:
If the database is open, then take the transported tablespace offline. For example, if you want to recover the users tablespace, then issue:
ALTER TABLESPACE users OFFLINE IMMEDIATE;
Restore a backup of the transported datafiles with an operating system utility. The backup can be the initial version of the transported datafiles or any backup taken after the tablespace is transported. For example, enter:
% cp /backup/users.dbf $ORACLE_HOME/oradata/trgt/users01.dbf
Recover the tablespace as normal. For example, enter:
RECOVER TABLESPACE users
You may see the error ORA-01244 when recovering through a transportable tablespace operation just as when recovering through a CREATE TABLESPACE operation. In this case, rename the unnamed files to the correct locations using the procedure in "Recovering Through an Added Datafile with a Backup Control File: Scenario".
The configuration of the online redo log: mirrored or non-mirrored
The type of media failure: temporary or permanent
The types of online redo log files affected by the media failure: current, active, unarchived, or inactive
Table 19-1 displays V$LOG status information that can be crucial in a recovery situation involving online redo logs.
|Status||Description|
|UNUSED||The online redo log has never been written to.|
|CURRENT||The online redo log is active, that is, needed for instance recovery, and it is the log to which the database is currently writing. The redo log can be open or closed.|
|ACTIVE||The online redo log is active, that is, needed for instance recovery, but is not the log to which the database is currently writing. It may be in use for block recovery, and may or may not be archived.|
|CLEARING||The log is being re-created as an empty log after an ALTER DATABASE CLEAR LOGFILE statement.|
|CLEARING_CURRENT||The current log is being cleared of a closed thread. The log can stay in this status if there is some failure in the switch such as an I/O error writing the new log header.|
|INACTIVE||The log is no longer needed for instance recovery. It may be in use for media recovery, and may or may not be archived.|
If the online redo log of a database is multiplexed, and if at least one member of each online redo log group is not affected by the media failure, then the database continues functioning as normal, but error messages are written to the log writer trace file and the alert_SID.log of the database.
Solve the problem by taking one of the following actions:
If the hardware problem is temporary, then correct it. The log writer process accesses the previously unavailable online redo log files as if the problem never existed.
Note: The newly added member provides no redundancy until the log group is reused.
SELECT GROUP#, STATUS, MEMBER FROM V$LOGFILE WHERE STATUS='INVALID';

GROUP#  STATUS      MEMBER
------- ----------- ---------------------
0002    INVALID     /oracle/oradata/trgt/redo02.log
Drop the damaged member. For example, to drop member redo02.log from group 2, issue:
ALTER DATABASE DROP LOGFILE MEMBER '/oracle/oradata/trgt/redo02.log';
Add a new member to the group. For example, to add redo02b.log to group 2, issue:
ALTER DATABASE ADD LOGFILE MEMBER '/oracle/oradata/trgt/redo02b.log' TO GROUP 2;
If the file you want to add already exists, then it must be the same size as the other group members, and you must specify REUSE. For example:
ALTER DATABASE ADD LOGFILE MEMBER '/oracle/oradata/trgt/redo02b.log' REUSE TO GROUP 2;
If a media failure damages all members of an online redo log group, then different scenarios can occur depending on the type of online redo log group affected by the failure and the archiving mode of the database.
If the damaged log group is active, then it is needed for crash recovery; otherwise, it is not.
|If the group is . . .||Then . . .||And you should . . .|
|Inactive||It is not needed for crash recovery||Clear the archived or unarchived group.|
|Active||It is needed for crash recovery||Attempt to issue a checkpoint and clear the log; if impossible, then you must restore a backup and perform incomplete recovery up to the most recent available redo log.|
|Current||It is the log that the database is currently writing to||Attempt to clear the log; if impossible, then you must restore a backup and perform incomplete recovery up to the most recent available redo log.|
SELECT GROUP#, STATUS, MEMBER FROM V$LOGFILE;

GROUP#  STATUS      MEMBER
------- ----------- ---------------------
0001                /oracle/dbs/log1a.f
0001                /oracle/dbs/log1b.f
0002    INVALID     /oracle/dbs/log2a.f
0002    INVALID     /oracle/dbs/log2b.f
0003                /oracle/dbs/log3a.f
0003                /oracle/dbs/log3b.f
Determine which groups are active. For example, enter:
SELECT GROUP#, MEMBERS, STATUS, ARCHIVED FROM V$LOG;

GROUP#  MEMBERS  STATUS     ARCHIVED
------  -------  ---------  -----------
0001    2        INACTIVE   YES
0002    2        ACTIVE     NO
0003    2        CURRENT    NO
If the affected group is inactive, follow the procedure in "Losing an Inactive Online Redo Log Group". If the affected group is active (as in the preceding example), then follow the procedure in "Losing an Active Online Redo Log Group".
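The two lookups in this procedure (which groups have lost every member, and what status those groups have) can be combined in a short script. The following Python sketch is a hedged illustration: the function names and the tuple layout of the input rows are assumptions made for this example, and the action text is paraphrased from the table earlier in this section. It works on rows you have already fetched from V$LOGFILE and V$LOG; it does not query the database.

```python
# Recommended responses, paraphrased from the table in this section,
# keyed by the STATUS column of V$LOG.
ACTIONS = {
    "INACTIVE": "Clear the archived or unarchived group.",
    "ACTIVE": "Attempt a checkpoint and clear the log; if impossible, "
              "restore a backup and perform incomplete recovery.",
    "CURRENT": "Attempt to clear the log; if impossible, "
               "restore a backup and perform incomplete recovery.",
}


def fully_damaged_groups(logfile_rows):
    """Return the groups in which every member is marked INVALID.

    logfile_rows -- iterable of (group, status, member) from V$LOGFILE
    """
    statuses_by_group = {}
    for group, status, _member in logfile_rows:
        statuses_by_group.setdefault(group, []).append(status)
    return {
        group
        for group, statuses in statuses_by_group.items()
        if all(status == "INVALID" for status in statuses)
    }


def recommend_actions(logfile_rows, log_rows):
    """Map each fully damaged group to the action for its V$LOG status.

    log_rows -- iterable of (group, status) from V$LOG
    """
    damaged = fully_damaged_groups(logfile_rows)
    return {
        group: ACTIONS.get(status, "No action listed for this status.")
        for group, status in log_rows
        if group in damaged
    }


# Sample rows copied from the query output shown in this procedure.
logfile_rows = [
    (1, "", "/oracle/dbs/log1a.f"), (1, "", "/oracle/dbs/log1b.f"),
    (2, "INVALID", "/oracle/dbs/log2a.f"), (2, "INVALID", "/oracle/dbs/log2b.f"),
    (3, "", "/oracle/dbs/log3a.f"), (3, "", "/oracle/dbs/log3b.f"),
]
log_rows = [(1, "INACTIVE"), (2, "ACTIVE"), (3, "CURRENT")]

for group, action in recommend_actions(logfile_rows, log_rows).items():
    print(f"group {group}: {action}")
```

A group with only some members INVALID is deliberately not reported, because that case is handled by dropping and re-adding the damaged member rather than by clearing the group.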
|If the failure is . . .||Then . . .|
|Temporary||Fix the problem. LGWR can reuse the redo log group when required.|
|Permanent||The damaged inactive online redo log group eventually halts normal database operation. Reinitialize the damaged group manually by issuing the ALTER DATABASE CLEAR LOGFILE statement.|
You can clear an inactive redo log group when the database is open or closed. The procedure depends on whether the damaged group has been archived.
If the database is shut down, then start a new instance and mount the database:
Reinitialize the damaged log group. For example, to clear redo log group 2, issue the following statement:
ALTER DATABASE CLEAR LOGFILE GROUP 2;
Clearing a not-yet-archived redo log allows it to be reused without archiving it. This action makes backups unusable if they were started before the last change in the log, unless the file was taken offline prior to the first change in the log. Hence, if you need the cleared log file for recovery of a backup, then you cannot recover that backup. Also, it prevents complete recovery from backups due to the missing log.
If the database is shut down, then start a new instance and mount the database:
Clear the log using the UNARCHIVED keyword. For example, to clear log group 2, issue:
ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 2;
If there is an offline datafile that requires the cleared log to bring it online, then the keywords UNRECOVERABLE DATAFILE are required. The datafile and its entire tablespace have to be dropped because the redo necessary to bring it online is being cleared, and there is no copy of it. For example, enter:
ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 2 UNRECOVERABLE DATAFILE;
Immediately back up the whole database with an operating system utility, so that you have a backup you can use for complete recovery without relying on the cleared log group. For example, enter:
% cp /disk1/oracle/dbs/*.f /disk2/backup
Back up the database's control file with the ALTER DATABASE statement. For example, enter:
ALTER DATABASE BACKUP CONTROLFILE TO '/oracle/dbs/cf_backup.f';
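The choice between the plain and UNARCHIVED forms of the statement depends only on the ARCHIVED column of V$LOG and on whether an offline datafile still needs the log. The following Python sketch captures that choice. The function name and parameters are assumptions made for this example, and the keyword order follows the Oracle SQL grammar (CLEAR UNARCHIVED LOGFILE). Treat the output as text to review, not as something to run unattended.

```python
def clear_logfile_statement(group, archived, offline_datafile_needs_log=False):
    """Build the statement that reinitializes an inactive redo log group.

    group                      -- GROUP# from V$LOG
    archived                   -- True if V$LOG.ARCHIVED is YES for the group
    offline_datafile_needs_log -- True if an offline datafile requires this
                                  log to be brought online
    """
    if not isinstance(group, int) or group < 1:
        raise ValueError("group must be a positive integer")
    if archived:
        # The group has already been archived, so a plain clear is enough.
        return f"ALTER DATABASE CLEAR LOGFILE GROUP {group};"
    statement = f"ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP {group}"
    if offline_datafile_needs_log:
        # The offline datafile and its tablespace must be dropped afterward.
        statement += " UNRECOVERABLE DATAFILE"
    return statement + ";"


print(clear_logfile_statement(2, archived=True))
print(clear_logfile_statement(2, archived=False))
print(clear_logfile_statement(2, archived=False, offline_datafile_needs_log=True))
```

Whichever unarchived form you use, the whole database backup and control file backup described in the preceding steps are still required afterward.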
Relocate the redo log file onto alternative media by re-creating it under the currently configured redo log filename
Reuse the currently configured log filename to re-create the redo log file because the name itself is invalid or unusable (for example, due to media failure)
In these cases, the ALTER DATABASE CLEAR LOGFILE statement (before receiving the I/O error) would have successfully informed the control file that the log was being cleared and did not require archiving. The I/O error occurred at the step in which the CLEAR LOGFILE statement attempts to create the new redo log file and write zeros to it. This fact is reflected in V$LOG.CLEARING_CURRENT.
If the database is still running and the lost active redo log is not the current log, then issue the ALTER SYSTEM CHECKPOINT statement. If successful, then the active redo log becomes inactive, and you can follow the procedure in "Losing an Inactive Online Redo Log Group". If unsuccessful, or if your database has halted, then perform one of the procedures in this section, depending on the archiving mode.
The current log is the one LGWR is currently writing to. If an LGWR I/O fails, then LGWR terminates and the instance crashes. In this case, you must restore a backup, perform incomplete recovery, and open the database with the RESETLOGS option.
To recover from loss of an active online log group in NOARCHIVELOG mode:
Restore the database from a consistent, whole database backup (datafiles and control files) as described in "Restoring Datafiles Before Performing Incomplete Recovery". For example, enter:
% cp /disk2/backup/*.dbf $ORACLE_HOME/oradata/trgt/
Mount the database:
Because online redo logs are not backed up, you cannot restore them with the datafiles and control files. In order to allow the database to reset the online redo logs, you must first mimic incomplete recovery:
RECOVER DATABASE UNTIL CANCEL
CANCEL
Open the database using the RESETLOGS option:
ALTER DATABASE OPEN RESETLOGS;
Shut down the database consistently. For example, enter:
Make a whole database backup.
To recover from loss of an active online redo log group in ARCHIVELOG mode:
Begin incomplete media recovery, recovering up through the log before the damaged log.
Ensure that the current name of the lost redo log can be used for a newly created file. If not, then rename the members of the damaged online redo log group to a new location. For example, enter:
ALTER DATABASE RENAME FILE '?/oradata/trgt/redo01.log' TO '/tmp/redo01.log';
ALTER DATABASE RENAME FILE '?/oradata/trgt/redo02.log' TO '/tmp/redo02.log';
Open the database using the RESETLOGS option:
ALTER DATABASE OPEN RESETLOGS;
Note: All updates executed from the endpoint of the incomplete recovery to the present must be re-executed.
The current online redo log
An active online redo log
An unarchived online redo log
An inactive online redo log
If the database is operating in ARCHIVELOG mode, and if the only copy of an archived redo log file is damaged, then the damaged file does not affect the present operation of the database. The following situations can arise, however, depending on when the redo log was written and when you backed up the datafile.
|If you backed up . . .||Then . . .|
|All datafiles after the filled online redo log group (which is now archived) was written||The archived version of the filled online redo log group is not required for complete media recovery operation.|
|A specific datafile before the filled online redo log group was written||If the corresponding datafile is damaged by a permanent media failure, use the most recent backup of the damaged datafile and perform incomplete recovery of the tablespace containing the damaged datafile, up to the damaged log.|
If you know that an archived redo log group has been damaged, immediately back up all datafiles so that you will have a whole database backup that does not require the damaged archived redo log.
A fairly common error is the accidental dropping of a table from your database. In general, the fastest and simplest solution is to use the flashback drop feature, described in Oracle Database Backup and Recovery Basics, to reverse the dropping of the table. However, if you cannot use flashback drop (for example, because it is disabled or because the table was dropped with the PURGE option), then you can create a copy of the database, perform point-in-time recovery of that copy to a time before the table was dropped, export the dropped table using an Oracle export utility, and re-import it into your primary database using an Oracle import utility.
In this scenario, assume that you do not have the flashback database functionality enabled, so FLASHBACK DATABASE is not an option, but you do have physical backups of the database.
Note: Grant powerful privileges (such as DROP ANY TABLE) only to selected, appropriate users, to minimize user errors that require database recovery.
If possible, keep the database that experienced the user error online and available for use. Back up all datafiles of the existing database in case an error is made during the remaining steps of this procedure.
Restore a database backup to an alternate location, then perform incomplete recovery of this backup using a restored backup control file, to the point just before the table was dropped.
Export the lost data from the temporary, restored version of the database using an Oracle export utility. In this case, export the accidentally dropped table.
Note: System audit options are exported.
Use an Oracle import utility to import the data back into the production database.
Delete the files of the temporary copy of the database to conserve space.
See Also: Oracle Database Utilities for more information about the Oracle export and import utilities
How you perform media recovery depends on whether your database participates in a distributed database system. The Oracle distributed database architecture is autonomous. Therefore, depending on the type of recovery operation selected for a single damaged database, you may have to coordinate recovery operations globally among all databases in the distributed system.
Table 19-2, "Recovery Operations in a Distributed Database Environment" summarizes different types of recovery operations and whether coordination among nodes of a distributed database system is required.
|If you are . . .||Then . . .|
|Restoring a whole backup for a database that was never accessed from a remote node||Use non-coordinated, autonomous database recovery.|
|Restoring a whole backup for a database that was accessed by a remote node for a database in||Shut down all databases and restore them using the same coordinated full backup.|
|Performing complete media recovery of one or more databases in a distributed database||Use non-coordinated, autonomous database recovery.|
|Performing incomplete media recovery of a database that was never accessed by a remote node||Use non-coordinated, autonomous database recovery.|
|Performing incomplete media recovery of a database that was accessed by a remote node||Use coordinated, incomplete recovery to the same global point in time for all databases in the distributed system.|
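The rows of this table reduce to a simple rule: complete recovery never needs coordination, and the other operations need it only when a remote node has accessed the database. The following Python sketch restates that rule. The function name and the operation labels are assumptions made for this example.

```python
OPERATIONS = {"restore_whole_backup", "complete_recovery", "incomplete_recovery"}


def recovery_strategy(operation, accessed_by_remote_node):
    """Return the recovery approach for one database in a distributed system.

    operation               -- one of the labels in OPERATIONS
    accessed_by_remote_node -- True if a remote node accessed the database
    """
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    # Complete recovery loses no changes, so other nodes are unaffected.
    if operation == "complete_recovery" or not accessed_by_remote_node:
        return "non-coordinated, autonomous database recovery"
    if operation == "restore_whole_backup":
        return "shut down all databases and restore the same coordinated full backup"
    return "coordinated incomplete recovery to the same global point in time"


print(recovery_strategy("incomplete_recovery", accessed_by_remote_node=True))
```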
If one node in a distributed database requires recovery to a past time, it is often necessary to recover all other nodes in the system to the same point in time to preserve global data consistency. This operation is called coordinated, time-based, distributed database recovery. The following tasks should be performed with the standard procedures of time-based and change-based recovery described in this chapter.
Recover the database that requires the recovery operation using time-based recovery. For example, if a database needs to be recovered because of a media failure, then recover this database first using time-based recovery. Do not recover the other databases at this point.
If the message is, "RESETLOGS after complete recovery through change xxx", then you have applied all the changes in the database and performed complete recovery. Do not recover any of the other databases in the distributed system, or you will unnecessarily remove changes in them. Recovery is complete.
If the message is, "RESETLOGS after incomplete recovery UNTIL CHANGE xxx", then you have successfully performed an incomplete recovery. Record the change number from the message and proceed to the next step.
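Recording the change number by hand invites transcription errors, so you may prefer to extract it with a script. The following Python sketch assumes that the message text matches the two forms quoted above, with a decimal number in place of xxx. The function name and the sample change number are hypothetical.

```python
import re

RESETLOGS_MESSAGE = re.compile(
    r"RESETLOGS after "
    r"(complete recovery through change|incomplete recovery UNTIL CHANGE) "
    r"(\d+)"
)


def parse_resetlogs_message(line):
    """Return (kind, change_number), or None if the line is not a match.

    kind is "complete" or "incomplete".
    """
    match = RESETLOGS_MESSAGE.search(line)
    if match is None:
        return None
    kind = "complete" if match.group(1).startswith("complete") else "incomplete"
    return kind, int(match.group(2))


# Hypothetical message line; the change number is invented for illustration.
result = parse_resetlogs_message("RESETLOGS after incomplete recovery UNTIL CHANGE 309121")
if result is not None and result[0] == "incomplete":
    print(f"Recover the other databases to change {result[1]}")
```

Only the incomplete case yields a change number that the other databases need; for the complete case the script prints nothing, matching the instruction not to recover the other databases.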
You may need to remove a database, that is, the database files that form the database, from the operating system. For example, this scenario can occur when you create a test database and then no longer have a use for it. The SQL*Plus command DROP DATABASE can perform this function.
See Also: Oracle Database Backup and Recovery Basics to learn how to use the equivalent RMAN command
Start SQL*Plus and connect to the target database with administrator privileges, then ensure that the database is either mounted or open with no users connected. For example:
SQL> STARTUP FORCE MOUNT
Remove the datafiles and control files listed in the control file from the operating system. For example:
SQL> DROP DATABASE; # deletes all database files, both ASM and non-ASM
If the database is on raw disk, the command does not delete the actual raw disk special files.
Use an operating system utility to delete all backups and archived logs associated with the database because these are not automatically deleted by the SQL*Plus command. For example:
% rm /backup/* ?/oradata/trgt/arch/*