20 Recovery Strategies and Procedures

This chapter describes Oracle Application Server recovery strategies and procedures for different types of failures and outages.

It contains the following topics:

Recovery Strategies
Recovery Procedures

20.1 Recovery Strategies

This section describes Oracle Application Server recovery strategies for different types of failures and outages. It contains the following topics:

Recovery Strategies for Data Loss, Host Failure, or Media Failure (Critical)
Recovery Strategies for Process Failures and System Outages (Non-Critical)

20.1.1 Recovery Strategies for Data Loss, Host Failure, or Media Failure (Critical)

This section describes recovery strategies for outages that involve actual data loss or corruption, host failure, or media failure, where the host or disk cannot be restarted and is permanently lost. This type of failure requires data to be restored before the Oracle Application Server environment (middle tier, Infrastructure, or both) can be restarted and resume normal processing.

The strategies in this section use point-in-time recovery of the middle tier and Infrastructure. This means that, no matter where the loss occurred, the Infrastructure and the middle tier are always restored together so they are in sync as they were at the time of the last backup. Notice that in an Oracle Application Server environment recovery, the Infrastructure is always restored before the middle tier.

Assumptions

The following assumptions apply to the recovery strategies in this section:

ARCHIVELOG mode was enabled for all Metadata Repository backups.
Complete recovery of the database can be performed, that is, no redo log files have been lost.
No administrative changes were made since the last backup. If administrative changes were made since the last backup, they will need to be reapplied after recovery is complete.

See Also:
Appendix G, "Examples of Administrative Changes" to learn more about administrative changes

Determining Which Strategy to Use

Recovery strategies are listed in Table 20-1.

Use this table if you experience data loss, host failure, or media failure in an Infrastructure installation. Find the type of loss and follow the recommended procedure. The procedures apply to Infrastructures that are installed into a single Oracle home, as well as Infrastructures with Identity Management in one Oracle home and a Metadata Repository in another Oracle home or host.

If the loss occurred in both the Infrastructure and middle tier, follow the Infrastructure recovery strategy first, then the middle tier.

Table 20-1 Recovery Strategies for Data Loss, Host Failure, and Media Failure in Infrastructures

Type of Loss	Recovery Strategies
Loss of host	You can restore to a new host that has the same hostname. Follow the procedure in Section 20.2.3, "Restoring an Infrastructure to a New Host".
Oracle software/binary loss or corruption	If any Oracle binaries have been lost or corrupted, you must recover the entire Infrastructure. Follow the procedure in Section 20.2.2, "Restoring an Infrastructure to the Same Host".
Database or data failure of the Metadata Repository (datafile loss, control file loss, media failure, disk corruption)	If the Metadata Repository is corrupted due to data loss or media failure, you can restore and recover it. Follow the procedure in Section 20.2.5, "Restoring and Recovering the Metadata Repository".
Deletion or corruption of configuration files	If you lose any configuration files in the Infrastructure Oracle home, you can restore them. Follow the procedure in Section 20.2.6, "Restoring Infrastructure Configuration Files".
Deletion or corruption of configuration files and data failure of the Metadata Repository	If you lose configuration files and the Metadata Repository is corrupted, you can restore and recover both. Follow these procedures: Section 20.2.6, "Restoring Infrastructure Configuration Files", and Section 20.2.5, "Restoring and Recovering the Metadata Repository".

20.1.2 Recovery Strategies for Process Failures and System Outages (Non-Critical)

This section describes recovery strategies for process failures and system outages. These types of outages do not involve any data loss, and therefore do not require any files to be recovered. In some cases, the failure is transparent and no manual intervention is required to recover the failed component. In other cases, manual intervention is required to restart a process or component. While these strategies do not strictly fit into the category of backup and recovery, they are included in this book for completeness.

Determining Which Strategy to Use

Recovery strategies for process failures and system outages are listed in Table 20-2.

Use this table if you experience a failure or outage in an Infrastructure. Find the type of outage and follow the recommended procedure. The procedures apply to Infrastructures that are installed into a single Oracle home, as well as Infrastructures with Identity Management in one Oracle home and a Metadata Repository in another Oracle home or host.

Table 20-2 Recovery Strategies for Process Failures and System Outages in Infrastructures

Type of Outage	How to Check Status and Restart
Host failure (no data loss)	To restart: Restart the host, then start the Infrastructure. Refer to Section 3.2.1.
Metadata Repository instance failure (loss of the contents of a buffer cache or data residing in memory)	To check status: Try connecting to the database using SQL*Plus. Check the state as follows: SQL> select status from v$instance; To restart: sqlplus /nolog SQL> connect sys/password as sysdba SQL> startup SQL> quit
Metadata Repository listener failure	To check status: lsnrctl status To restart: lsnrctl start
Oracle Internet Directory server process (`oidldapd`) failure	To check status: ldapcheck To restart: opmnctl startproc ias-component=OID
Oracle Internet Directory monitor process (`oidmon`) failure	To check status: ldapcheck To restart: opmnctl startproc ias-component=OID
Application Server Control Console failure	To check status: emctl status iasconsole To restart: emctl start iasconsole
Oracle HTTP Server process failure	To check status: opmnctl status To restart: opmnctl startproc ias-component=HTTP_Server
OC4J instance failure	To check status: opmnctl status To restart: opmnctl startproc process-type=OC4J_instance_name
Delegated Administration Service instance failure	To check status: opmnctl status To restart: opmnctl startproc ias-component=OC4J process-type=OC4J_SECURITY
OPMN daemon failure	To check status: opmnctl status To restart: opmnctl start
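
The check and restart commands in Table 20-2 can be kept in a small lookup so that an operator script prints the right pair for a given outage. The following Python sketch is illustrative only: the dictionary keys and the `restart_plan` helper are hypothetical names, while the command strings are copied from the table. It returns the commands and does not run them.

```python
# Illustrative lookup built from Table 20-2.
# The keys and function name are hypothetical; the command strings are
# taken verbatim from the table rows.
OUTAGE_COMMANDS = {
    "listener": ("lsnrctl status", "lsnrctl start"),
    "oid": ("ldapcheck", "opmnctl startproc ias-component=OID"),
    "console": ("emctl status iasconsole", "emctl start iasconsole"),
    "http_server": (
        "opmnctl status",
        "opmnctl startproc ias-component=HTTP_Server",
    ),
    "das": (
        "opmnctl status",
        "opmnctl startproc ias-component=OC4J process-type=OC4J_SECURITY",
    ),
    "opmn": ("opmnctl status", "opmnctl start"),
}


def restart_plan(outage):
    """Return [check_command, restart_command] for a known outage key."""
    try:
        check, restart = OUTAGE_COMMANDS[outage]
    except KeyError:
        raise ValueError(f"no entry in Table 20-2 for {outage!r}") from None
    return [check, restart]


if __name__ == "__main__":
    for step in restart_plan("listener"):
        print(step)
```

Returning strings rather than executing them keeps the sketch safe to run on a host where the Oracle utilities are not on the path.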

20.2 Recovery Procedures

This section contains the procedures for performing different types of recovery.

It contains the following topics:

Using Application Server Control Console to Recover an Oracle Application Server Instance
Restoring an Infrastructure to the Same Host
Restoring an Infrastructure to a New Host
Restoring an Identity Management Instance to a New Host
Restoring and Recovering the Metadata Repository
Restoring Infrastructure Configuration Files
Restoring a File-Based Repository to a New Host
Restoring an Oracle Application Server Instance

20.2.1 Using Application Server Control Console to Recover an Oracle Application Server Instance

You can use the Oracle Enterprise Manager 10g Application Server Control Console to manage backup and recovery of an Oracle Application Server instance. Use the following procedure to recover an Oracle Application Server instance:

Before performing a restore operation (restore_instance or restore_config) on an instance in a cluster, all OC4J processes across the cluster must be stopped. Use the following command to stop the processes:

ORACLE_HOME/opmn/bin/opmnctl @cluster stopproc ias-component=OC4J

Some OC4J components (such as Wireless) do not have ias-component=OC4J. For these components use the uniqueid value to stop the OC4J process. To determine which components have a uniqueid, use the following command:

ORACLE_HOME\opmn\bin\opmnctl @cluster status -fmt %typ%uid%prt -noheaders

The following is an example of the output from the command:

CUSTOM         | N/A        | DSA
LOGLDR         | N/A        | logloaderd
DCMDaemon      | 1444413512 | dcm-daemon
WebCache       | 1500577871 | WebCache
WebCache-admin | 1500577872 | WebCacheAdmin
OHS            | 1500577870 | HTTP_Server
performance    | 1500577873 | performance_server
messaging      | 1500577874 | messaging_server
OC4J           | 1500577865 | OC4J_Wireless

Stop all the OC4J processes, for which the second column (uid) value is not "N/A", with the following command:

ORACLE_HOME\opmn\bin\opmnctl @cluster stopproc uniqueid=1500577865

opmnctl: stopping opmn managed processes...

From the Home page for an application server instance, click Backup/Recovery to display the Backup/Recovery page.
Click Perform Recovery. The Infrastructure recovery screen displays:

[Illustration asadm049.gif: Infrastructure recovery screen]
On the Infrastructure recovery screen, you can select the Recover Control Files check box to recover the control files for the instance. Click OK to perform the restore.

After the restore operation is complete, use the following command to restart the OC4J processes across the cluster:

ORACLE_HOME/opmn/bin/opmnctl @cluster startproc ias-component=OC4J

For components that use uniqueid, you can restart their process by using the appropriate ias-component value or by using the following command:

opmnctl startall
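
The uniqueid lookup described in this section can be scripted. The following Python sketch parses status output in the three-column format shown in the example and builds one stopproc command per OC4J process that has a uniqueid. The function names are hypothetical, and treating a first-column value of `OC4J` as the marker for an OC4J process is an assumption based on the sample output. The sketch only builds command strings and does not call opmnctl.

```python
# Sample text copied from the example output in this section.
SAMPLE_STATUS = """\
CUSTOM         | N/A        | DSA
LOGLDR         | N/A        | logloaderd
DCMDaemon      | 1444413512 | dcm-daemon
WebCache       | 1500577871 | WebCache
WebCache-admin | 1500577872 | WebCacheAdmin
OHS            | 1500577870 | HTTP_Server
performance    | 1500577873 | performance_server
messaging      | 1500577874 | messaging_server
OC4J           | 1500577865 | OC4J_Wireless
"""


def oc4j_unique_ids(status_output):
    """Return the uid column for rows of type OC4J whose uid is not N/A."""
    ids = []
    for line in status_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 3:
            continue  # skip blank lines and anything that is not a 3-column row
        proc_type, uid, _name = fields
        # Assumption: the first column identifies OC4J processes.
        if proc_type == "OC4J" and uid != "N/A":
            ids.append(uid)
    return ids


def stop_commands(ids, opmnctl="ORACLE_HOME/opmn/bin/opmnctl"):
    """Build one stopproc command string per uniqueid."""
    return [f"{opmnctl} @cluster stopproc uniqueid={uid}" for uid in ids]


if __name__ == "__main__":
    for command in stop_commands(oc4j_unique_ids(SAMPLE_STATUS)):
        print(command)
```

With the sample output, the only row selected is OC4J_Wireless, which matches the uniqueid used in the stopproc example.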

20.2.2 Restoring an Infrastructure to the Same Host

This section describes how to restore an Infrastructure to the same host. You can use this procedure when you have lost some or all of your Oracle binaries.

Refer to Section 19.3.5, "Recovering an Instance on the Same Host" to restore the image backup of the Infrastructure Oracle home from your complete Oracle Application Server environment backup.

Note:

If your Infrastructure is split and has Identity Management in one Oracle home, and the Metadata Repository in another Oracle home, perform this step on both Oracle homes.

Note:

If you receive a WWC-41439 error while trying to log in to the Portal Home page, do one or all of the following:

Remove aliases from your Apache configuration.
Include the domain in the ServerName parameter.
Fix the Host in the IASInstance element and ListenPort in the WebCacheComponent element in iasconfig.xml and run ptlconfig -dad portal-site. The ptlconfig script and the iasconfig.xml file are normally located in the directory portal/conf under the OracleAS Portal and OracleAS Wireless middle-tier home.

20.2.3 Restoring an Infrastructure to a New Host

Refer to Section 19.3.3, "Restoring a Node on a New Host" to perform the following types of restores:

Restore an Infrastructure to the same host after the operating system has been reinstalled. The hostname must remain the same on the host.
Restore an Infrastructure to a new host that has the same hostname as the original host.

Note:

If your Infrastructure is split and has Identity Management in one Oracle home, and the Metadata Repository in another Oracle home, perform the procedures on both Oracle homes as described in Section 20.2.4, "Restoring an Identity Management Instance to a New Host" and Section 20.2.5.2, "Restoring and Recovering the Metadata Repository to a New Host".

20.2.4 Restoring an Identity Management Instance to a New Host

Refer to Section 19.3, "Recovering a Loss of Host Automatically" to perform the following types of restores:

Restore Identity Management to the same host after the operating system has been reinstalled. The hostname must remain the same on the host.
Restore Identity Management to a new host that has either the same hostname as the original host or a different one.

20.2.5 Restoring and Recovering the Metadata Repository

This section describes how to restore and recover the Metadata Repository. Use this procedure when only the Metadata Repository is corrupted and no other files in the Oracle home are affected.

Restore and recover the Metadata Repository from your latest backup using your own procedure or the OracleAS Backup and Recovery Tool. Restart all Infrastructure processes after restoring a Metadata Repository.

The following sections describe Oracle-recommended procedures for using the OracleAS Backup and Recovery Tool to restore and recover the Metadata Repository:

Restoring and Recovering the Metadata Repository to the Same Host
Restoring and Recovering the Metadata Repository to a New Host

20.2.5.1 Restoring and Recovering the Metadata Repository to the Same Host

This section covers several circumstances under which you may need to restore and recover the Metadata Repository to the same host:

Corrupted or Lost Datafile
Corrupted or Lost Control File
Point-in-Time Recovery and Flashback Recovery

Corrupted or Lost Datafile

If a datafile is corrupted or lost, you can use the following command to restore from the latest backup and perform a full recovery:

For UNIX:

bkp_restore.sh -m restore_repos

For Windows:

bkp_restore.bat -m restore_repos

Corrupted or Lost Control File

If a control file is corrupted or lost, you can use the following command to restore a control file backup, restore the datafiles, and perform a full recovery:

For UNIX:

bkp_restore.sh -m restore_repos -c

For Windows:

bkp_restore.bat -m restore_repos -c

The -c option restores the control file, which removes the entries for tempfiles in locally managed temporary tablespaces. You must then add a new tempfile to the TEMP tablespace; otherwise, Oracle displays error ORA-25153: Temporary Tablespace is Empty.

To add a tempfile to the TEMP tablespace:

SQL> alter tablespace "TEMP" add tempfile 'ORACLE_HOME/oradata/GDB/temp01.dbf'
  size 5120K autoextend on next 8k maxsize unlimited;

GDB is the first part of the global database name.

Note that when you restore a control file, the tool performs an "alter database open resetlogs". This invalidates all backups and archivelogs. You should immediately perform a complete cold backup of the Metadata Repository, which will serve as the new baseline for your subsequent partial online backups.

Point-in-Time Recovery and Flashback Recovery

If you lost configuration files in your middle-tier or Infrastructure installation and restored them, you may want to restore or flash back the database to the same point in time as the configuration file backup. You can do this using one of the following commands:

For UNIX:

bkp_restore.sh -m restore_repos -u timestamp

bkp_restore.sh -m flashback_repos -u timestamp

For Windows:

bkp_restore.bat -m restore_repos -u timestamp

bkp_restore.bat -m flashback_repos -u timestamp

Flashback recovery to a point in time can undo any logical data corruption or user error. Flashback cannot undo physical data corruption due to media failure. Using the restore_repos command, you can restore and recover the database to a point in time for both logical and physical data corruption. However, Flashback is faster at recovering from logical data corruption because it does not require restoring backups.

You can specify any time between the time of your first backup and the current time, as long as none of the online redo logs were compromised. If any online redo logs are missing or corrupted, the latest time that can be specified is the time at which the last backup was made.
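
The rule for choosing a recovery time can be expressed as a small check. The following Python sketch is a hypothetical helper and not part of the Backup and Recovery Tool: it returns the window of times that may be requested, given the first and last backup times and whether the online redo logs are intact. Treating both ends of the window as inclusive is an assumption.

```python
from datetime import datetime


def recovery_window(first_backup, last_backup, now, redo_logs_intact):
    """Return (earliest, latest) times that may be requested for recovery.

    With intact online redo logs, the window runs from the first backup to
    the current time. Otherwise it ends at the time of the last backup.
    """
    latest = now if redo_logs_intact else last_backup
    return first_backup, latest


def is_valid_target(target, first_backup, last_backup, now, redo_logs_intact):
    """Check a requested recovery time against the window (inclusive bounds)."""
    earliest, latest = recovery_window(
        first_backup, last_backup, now, redo_logs_intact
    )
    return earliest <= target <= latest


if __name__ == "__main__":
    first = datetime(2004, 9, 1)
    last = datetime(2004, 9, 20)
    now = datetime(2004, 9, 21, 6, 12, 45)
    target = datetime(2004, 9, 21)
    print(is_valid_target(target, first, last, now, redo_logs_intact=True))
    print(is_valid_target(target, first, last, now, redo_logs_intact=False))
```

In the usage example, the same target time is accepted when the redo logs are intact and rejected when they are not, because it falls after the last backup.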

Note that when you do point-in-time recovery, the tool performs an "alter database open resetlogs". This invalidates all backups and archivelogs. You should immediately perform a complete cold backup of the Metadata Repository, which will serve as the new baseline for your subsequent partial online backups.

The Backup and Recovery Tool supports point-in-time recovery through resetlogs in all Oracle databases: Infrastructure with Identity Management and Metadata Repository, RepCA, and generic Oracle databases (for example, OCS Infostore). The following is an example of a point-in-time recovery through resetlogs:

At time T1, a backup of the database is taken. Changes are made to the database.
At time T2, a new backup is taken. More changes are made to the database.
At time T3, another backup is taken. More changes are made.
At time T4, the user restores and recovers the database to T3. Since this is a point-in-time recovery, the Backup and Recovery Tool opens the database with resetlogs to start a new log sequence after the recovery.
At time T5, the user restores and recovers the database to T2 through the resetlogs created at T4.

Multiple backward point-in-time recoveries are supported for backups taken using backup_instance_cold, backup_instance_online, and backup_instance_incr. To perform multiple backward point-in-time recoveries using backup_cold, backup_online, and backup_incr, you must follow the backup operation immediately with backup_config.

20.2.5.2 Restoring and Recovering the Metadata Repository to a New Host

When you restore the Metadata Repository to a new host (with the same hostname), the new host will not have the online redo logs that existed on the original host. Therefore, you cannot perform a full recovery; RMAN would report an error stating that it cannot find a certain log file (the online redo log file). Instead, perform a point-in-time recovery to a time between the first and the most recent backup. You can do this by specifying the proper timestamp for the Loss of Host Automation (LOHA) reconfigure operation. Use the procedure in Section 19.3.3, "Restoring a Node on a New Host" to restore the Metadata Repository.

During the LOHA reconfigure process, if the RMAN command returns an error and the log shows that the datafiles were restored and recovered, then LOHA will issue an "alter database open resetlogs" and the database will be opened in a consistent state. If no datafiles were restored and recovered, the specified timestamp was most likely too early. Retry the command with a later timestamp.

LOHA uses the -c option during the restore process which means that the control file is restored from backup. This causes entries for tempfiles in locally managed temporary tablespaces to be removed and a new TEMP tablespace to be added automatically. Restoring the control file means that an "alter database open resetlogs" is always performed, which invalidates all backups and archivelogs. You should immediately perform a complete cold backup of the Metadata Repository, which will serve as the new baseline for your subsequent partial online backups.

20.2.6 Restoring Infrastructure Configuration Files

This section describes how to restore the configuration files in an Infrastructure Oracle home. You can use this procedure when configuration files have been lost or corrupted.

It contains the following tasks:

Task 1: Stop the Infrastructure
Task 2: Restore Infrastructure Configuration Files
Task 3: Apply Recent Administrative Changes
Task 4: Start the Infrastructure

Task 1: Stop the Infrastructure

Refer to Section 3.2.2 for instructions.

Task 2: Restore Infrastructure Configuration Files

Note:

If your Infrastructure is split and has Identity Management in one Oracle home, and the Metadata Repository in another Oracle home, perform this task on both Oracle homes.

Restore all configuration files from your most recent backup. You can perform this task using your own procedure or the OracleAS Backup and Recovery Tool. For example, to do this using the tool:

On UNIX systems:

bkp_restore.sh -m restore_config -t timestamp

On Windows systems:

bkp_restore.bat -m restore_config -t timestamp
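
When scripting this task, the command line can be assembled from the timestamp of the backup to restore. The following Python sketch assumes that the timestamp follows the pattern of the example value 2004-09-21_06-12-45, which appears with restore_instance later in this chapter. The helper names are hypothetical, and the value passed must correspond to an existing backup. The sketch only builds the argument list and does not run the tool.

```python
from datetime import datetime


def backup_timestamp(moment):
    """Format a datetime like the example value 2004-09-21_06-12-45."""
    return moment.strftime("%Y-%m-%d_%H-%M-%S")


def restore_config_command(moment, windows=False):
    """Build the restore_config argument list for UNIX or Windows.

    Assumption: the -t value uses the same pattern as the example timestamp
    shown for restore_instance.
    """
    script = "bkp_restore.bat" if windows else "bkp_restore.sh"
    return [script, "-m", "restore_config", "-t", backup_timestamp(moment)]


if __name__ == "__main__":
    taken_at = datetime(2004, 9, 21, 6, 12, 45)
    print(" ".join(restore_config_command(taken_at)))
    print(" ".join(restore_config_command(taken_at, windows=True)))
```

Building the arguments as a list rather than one string makes it straightforward to pass them to a process launcher later without additional quoting.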

20.2.7 Restoring a File-Based Repository to a New Host

This section describes how to restore a DCM file-based repository to a new host. This section contains the following tasks:

Task 1: Restore Image Backup, System Files, and Instance Reconfiguration
Task 2: Inform the Original Host That It Is No Longer a Repository Host (If Required)

Task 1: Restore Image Backup, System Files, and Instance Reconfiguration

If the DCM repository is a database, start the OPMN and Oracle Internet Directory processes on the corresponding infrastructure instance.

Use the following command to start the OPMN process:
```
opmnctl start
```
Use the following command to start the Oracle Internet Directory process:
```
opmnctl startproc ias-component=OID
```
Use the following command to check if the DCM repository is a database or a file-based repository:
```
ORACLE_HOME/dcm/bin/dcmctl whichfarm
```
The preceding command returns one of the following messages:
```
Repository Type: Database => uses a database repository

Repository Type: Distributed File Based => uses a file based repository
```

Perform the steps in Section 19.3.3, "Restoring a Node on a New Host" to restore the image backup, system files, and instance reconfiguration.
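
The repository-type check in this task can be automated by reading the output of the whichfarm command. The following Python sketch is a hypothetical helper that looks for the "Repository Type:" line in the two forms listed above. It assumes the output has already been captured as text, and it returns None when neither form is found.

```python
def repository_type(whichfarm_output):
    """Classify captured whichfarm output as 'database', 'file', or None."""
    for line in whichfarm_output.splitlines():
        key, separator, value = line.partition(":")
        if not separator or key.strip() != "Repository Type":
            continue  # not the line this helper is looking for
        value = value.strip()
        if value.startswith("Database"):
            return "database"
        if value.startswith("Distributed File Based"):
            return "file"
    return None


if __name__ == "__main__":
    print(repository_type("Repository Type: Database"))
    print(repository_type("Repository Type: Distributed File Based"))
```

A result of `database` indicates that the OPMN and Oracle Internet Directory processes on the Infrastructure instance need to be started first, as described at the beginning of this task.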

Task 2: Inform the Original Host That It Is No Longer a Repository Host (If Required)

Now that the file-based repository is restored to the new host, the original host may need to be informed that it is no longer a repository host. If the new host was already a part of the farm and is not a replacement for the original host, and the original host is still part of the farm, execute the following command on the original host:

dcmctl repositoryrelocated
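
The condition for running this command combines three facts about the farm. The following Python sketch restates it as a predicate. The function and parameter names are hypothetical, and the caller is assumed to know the three inputs already.

```python
def should_run_repositoryrelocated(
    new_host_already_in_farm,
    new_host_replaces_original,
    original_still_in_farm,
):
    """Return True when the original host must be told it is no longer
    the repository host, following the condition stated in Task 2."""
    return (
        new_host_already_in_farm
        and not new_host_replaces_original
        and original_still_in_farm
    )


if __name__ == "__main__":
    # New host was already a farm member, original host stays in the farm.
    print(should_run_repositoryrelocated(True, False, True))
    # New host is a straight replacement for the original host.
    print(should_run_repositoryrelocated(False, True, False))
```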

20.2.8 Restoring an Oracle Application Server Instance

Use the following command to restore an Oracle Application Server instance to a particular point in time:

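For UNIX:

bkp_restore.sh -m restore_instance -t 2004-09-21_06-12-45 -c

For Windows:

bkp_restore.bat -m restore_instance -t 2004-09-21_06-12-45 -c

Before performing a restore operation (restore_instance or restore_config) on an instance in a cluster, all OC4J processes across the cluster must be stopped. Use the following command to stop the processes:

ORACLE_HOME/opmn/bin/opmnctl @cluster stopproc ias-component=OC4J

Some OC4J components (such as Wireless) do not have ias-component=OC4J. For these components use the uniqueid value to stop the OC4J process. To determine which components have a uniqueid, use the following command:

ORACLE_HOME\opmn\bin\opmnctl @cluster status -fmt %typ%uid%prt -noheaders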

The following is an example of the output from the command:

CUSTOM         | N/A        | DSA
LOGLDR         | N/A        | logloaderd
DCMDaemon      | 1444413512 | dcm-daemon
WebCache       | 1500577871 | WebCache
WebCache-admin | 1500577872 | WebCacheAdmin
OHS            | 1500577870 | HTTP_Server
performance    | 1500577873 | performance_server
messaging      | 1500577874 | messaging_server
OC4J           | 1500577865 | OC4J_Wireless

Stop all the OC4J processes, for which the second column (uid) value is not "N/A", with the following command:

ORACLE_HOME\opmn\bin\opmnctl @cluster stopproc uniqueid=1500577865

opmnctl: stopping opmn managed processes...

After the restore operation is complete, use the following command to restart the OC4J processes across the cluster:

ORACLE_HOME/opmn/bin/opmnctl @cluster startproc ias-component=OC4J

For components that use uniqueid, you can restart their process by using the appropriate ias-component value or by using the following command:

opmnctl startall