Oracle® Enterprise Manager Cloud Control Administrator's Guide
12c Release 1 (12.1.0.1)

Part Number E24473-09

23 Enterprise Manager Outages

Outages can be planned, as might be the case when performing upgrades or periodic maintenance, or unplanned, as can happen in the event of hardware or software failure or an environmental catastrophe. Regardless of the type of outage, you want to ensure that your IT infrastructure can be restored and running as soon as possible.

This chapter covers the following:

Recovery of Failed Enterprise Manager Components

Switching or Failing over to Standby Enterprise Manager Configurations

Recovery of Failed Enterprise Manager Components

Recovering Enterprise Manager means restoring any of the three fundamental components of the Enterprise Manager architecture: the Management Repository, the Oracle Management Service (OMS), and the Management Agents.

Repository Recovery

Recovery of the Repository database must be performed using RMAN since Cloud Control will not be available when the repository database is down. There are two recovery cases to consider:

  • Full Recovery: No special consideration is required for Enterprise Manager.

  • Point-in-Time/Incomplete Recovery: Recovered repository may be out of sync with Agents because of lost transactions. In this situation, some metrics may show up incorrectly in the Cloud Control console unless the repository is synchronized with the latest state available on the Agents.

A repository resync feature allows you to automate the process of synchronizing the Enterprise Manager repository with the latest state available on the Management Agents.

To resynchronize the repository with the Management Agents, use the resync repos command of the Enterprise Manager command-line utility (emctl):

emctl resync repos -full -name "<descriptive name for the operation>"

You must run this command from the OMS Oracle Home AFTER restoring the Management Repository, but BEFORE starting the OMS. After submitting the command, start up all OMS instances and monitor the progress of repository resynchronization from the Enterprise Manager console's Repository Resynchronization page, as shown in the following figure.

Figure 23-1 Repository Synchronization Page


Management Repository recovery is complete when the resynchronization jobs complete on all Management Agents.

Oracle strongly recommends that the Management Repository database be run in archivelog mode so that in case of failure, the database can be recovered to the latest transaction. If the database cannot be recovered to the last transaction, Repository Synchronization can be used to restore monitoring capabilities for targets that existed when the last backup was taken. Actions taken after the backup will not be recovered automatically. Some examples of actions that will not be recovered automatically by Repository Synchronization are:

  • Incident Rules

  • Preferred Credentials

  • Groups, Services, Systems

  • Jobs/Deployment Procedures

  • Custom Reports

  • New Agents

Recovery Scenarios

A prerequisite for repository (or any database) recovery is to have a valid, consistent backup of the repository. Using Enterprise Manager to automate the backup process ensures regular, up-to-date backups are always available if repository recovery is ever required. Recovery Manager (RMAN) is a utility that backs up, restores, and recovers Oracle Databases. The RMAN recovery job syntax should be saved to a safe location. This allows you to perform a complete recovery of the Enterprise Manager repository database. In its simplest form, the syntax appears as follows:

run {
  restore database;
  recover database;
}

Actual syntax will vary in length and complexity depending on your environment. For more information on extracting syntax from an RMAN backup and recovery job, or using RMAN in general, see the Oracle Database Backup and Recovery Advanced User's Guide.
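One way to keep the recovery syntax in a safe location is to store it as an RMAN command file. The following is a minimal sketch, not a complete recovery procedure: the file path is a hypothetical example, the run block is the simplest form shown above, and the script only prints the RMAN invocation (using operating system authentication, which is an assumption about your environment) so that it can be reviewed before use.

```shell
#!/bin/sh
# CMDFILE is a hypothetical location; choose a path that does not live on
# the disks you expect to lose when the repository database fails.
CMDFILE="${CMDFILE:-/tmp/emrep_recover.rman}"

write_rman_cmdfile() {
    # $1: path of the command file to create.
    # The content is the simplest restore/recover block from the text.
    cat > "$1" <<'EOF'
run {
  restore database;
  recover database;
}
EOF
}

write_rman_cmdfile "$CMDFILE"

# Print, rather than run, the RMAN invocation so it can be checked first.
echo "rman target / cmdfile=$CMDFILE log=${CMDFILE%.rman}.log"
```

A real command file is usually longer, for example because channels or an UNTIL clause are needed, so treat the saved file as a starting point.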

The following scenarios illustrate various repository recovery situations along with the recovery steps.

Full Recovery on the Same Host

Repository database is running in archivelog mode. Recent backup, archive log files and redo logs are available. The repository database disk crashes. All datafiles and control files are lost.

Resolution:

  1. Stop all OMS instances using emctl stop oms.

  2. Recover the database using RMAN.

  3. Bring the site up using the command emctl start oms on all OMS instances.

  4. Verify that the site is fully operational.

Incomplete Recovery on the Same Host

Repository database is running in noarchivelog mode. Full offline backup is available. The repository database disk crashes. All datafiles and control files are lost.

Resolution:

  1. Stop the OMS instances using emctl stop oms.

  2. Recover the database using RMAN.

  3. Initiate Repository Resync using emctl resync repos -full -name "<resync name>" from one of the OMS Oracle Homes.

  4. Start the OMS instances using emctl start oms.

  5. Log into Cloud Control. Navigate to the Management Services and Repository Overview page. Click Repository Synchronization under Related Links. Monitor the status of resync jobs. Resubmit failed jobs, if any, after fixing the error.

  6. Verify that the site is fully operational.
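The ordering in the steps above matters: the resync must be submitted after the restore and before the OMS is started. The sketch below illustrates that ordering for a single OMS host. It is a dry run by default and only prints the commands; the EXECUTE switch and the RESYNC_NAME value are hypothetical names introduced for this example, and the RMAN recovery itself is assumed to happen outside the script.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set EXECUTE=1 to run them from an OMS Oracle Home environment.
EXECUTE="${EXECUTE:-0}"
# RESYNC_NAME is a hypothetical descriptive label for the resync operation.
RESYNC_NAME="${RESYNC_NAME:-resync after restore}"

step() {
    # Print the command; run it only when EXECUTE=1 and stop on failure.
    echo "+ $*"
    if [ "$EXECUTE" = "1" ]; then
        "$@" || { echo "step failed: $*" >&2; exit 1; }
    fi
}

incomplete_recovery_same_host() {
    step emctl stop oms
    # The RMAN restore and recover of the repository happens here.
    echo "# (recover the repository database with RMAN before continuing)"
    step emctl resync repos -full -name "$RESYNC_NAME"
    step emctl start oms
}

incomplete_recovery_same_host
```

In a multi-OMS site, the stop and start commands would be repeated on every OMS host, while the resync is submitted once from one OMS Oracle Home.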

Full Recovery on a Different Host

The Management Repository database is running on host "A" in archivelog mode. Recent backup, archive log files and redo logs are available. The repository database crashes. All datafiles and control files are lost.

Resolution:

  1. Stop the OMS instances using the command emctl stop oms.

  2. Recover the database using RMAN on a different host (host "B").

  3. Correct the connect descriptor for the repository by running the following command on each OMS.

    $emctl config oms -store_repos_details -repos_conndesc <connect descriptor> -repos_user sysman
    
  4. Start the OMS instances using the command emctl start oms.

  5. Relocate the Management Repository database target to the Agent running on host "B" by running the following command from the OMS:

    $emctl config repos -host <hostB> -oh <OH of repository on hostB> -conn_desc "<TNS connect descriptor>"
    

    Note:

    This command can only be used to relocate the repository database under the following conditions:
    • An Agent is already running on this machine.

    • No database on host "B" has been discovered.

  6. Change the monitoring configuration for the OMS and Repository target by running the following command from the OMS:

    $emctl config emrep -conn_desc "<TNS connect descriptor>"
    
  7. Verify that the site is fully operational.
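The same connect descriptor is passed to three different commands in the steps above, so it can help to build it once. The following sketch assumes a single-address TCP descriptor that identifies the database by SID; the host name, port, SID and Oracle Home path are hypothetical example values. The script only prints the commands for review.

```shell
#!/bin/sh
# Hypothetical values for the repository restored on host "B".
REPOS_HOST="${REPOS_HOST:-hostB.example.com}"
REPOS_PORT="${REPOS_PORT:-1521}"
REPOS_SID="${REPOS_SID:-emrep}"
REPOS_OH="${REPOS_OH:-/u01/app/oracle/product/db_home}"

conn_desc() {
    # $1 host, $2 port, $3 SID -> connect descriptor on a single line.
    printf '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=%s)(PORT=%s))(CONNECT_DATA=(SID=%s)))\n' "$1" "$2" "$3"
}

DESC=$(conn_desc "$REPOS_HOST" "$REPOS_PORT" "$REPOS_SID")

# Print the commands from steps 3 to 6 with the descriptor filled in.
# The first command must be repeated on each OMS.
cat <<EOF
emctl config oms -store_repos_details -repos_conndesc "$DESC" -repos_user sysman
emctl start oms
emctl config repos -host $REPOS_HOST -oh $REPOS_OH -conn_desc "$DESC"
emctl config emrep -conn_desc "$DESC"
EOF
```

If your repository is addressed by service name or through multiple listeners, the descriptor will look different; only the idea of defining it once carries over.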

Incomplete Recovery on a Different Host

The Management Repository database is running on host "A" in noarchivelog mode. Full offline backup is available. Host "A" is lost due to hardware failure. All datafiles and control files are lost.

Resolution:

  1. Stop the OMS instances using emctl stop oms.

  2. Recover the database using RMAN on a different host (host "B").

  3. Correct the connect descriptor for the repository in the credential store:

    $emctl config oms -store_repos_details -repos_conndesc <connect descriptor> -repos_user sysman
    
  4. Initiate Repository Resync from one of the OMS Oracle Homes:

    $emctl resync repos -full -name "<resync name>"

  5. Start the OMS using the command emctl start oms.

  6. Run the command to relocate the repository database target to the Management Agent running on host "B":

    $emctl config repos -agent <agent on host B> -host <hostB> -oh <OH of repository on hostB> -conn_desc "<TNS connect descriptor>"

  7. Run the command to change monitoring configuration for the OMS and Repository target:

    emctl config emrep -conn_desc "<TNS connect descriptor>"

  8. Log in to Cloud Control. Navigate to Management Services and Repository Overview page.

  9. Click Repository Synchronization under Related Links. Monitor the status of resync jobs. Resubmit failed jobs, if any, after fixing the error mentioned.

  10. Verify that the site is fully operational.

Recovering the OMS

If an Oracle Management Service instance is lost, recovering it essentially consists of two steps: Recovering the Software Homes, then configuring the Instance Home.

When restoring on the same host, the software homes can be restored from filesystem backup. If a backup does not exist, or if installing to a different host, the Software Homes can be reconstructed using the "Install Software Only" option from the Cloud Control software distribution. Care should be taken to select and install all Management Plug-ins that existed in your environment prior to the crash.

The following SQL command can be run against the repository database as the “sysman” user to obtain the list of plug-ins already deployed:

SELECT epv.display_name, epv.plugin_id, epv.version FROM em_plugin_version epv, em_current_deployed_plugin ecp WHERE epv.plugin_type NOT IN ( 'BUILT_IN_TARGET_TYPE' , 'INSTALL_HOME') AND ecp.dest_type='2' AND epv.plugin_version_id = ecp.plugin_version_id;

Note that some plug-ins might not have shipped with Cloud Control and might not be present in the install media. Such plug-ins should be downloaded from OTN and their location passed to the Oracle Installer. Choose all plug-ins returned by the SQL query above in the plug-ins page of the Installer. Recovery will fail unless all the required plug-ins are selected.
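Because the plug-in list is needed again in every software-only reinstall, the query can be kept as a script file. The sketch below saves the query given above (reflowed over several lines, otherwise unchanged) and prints a SQL*Plus invocation for it. The file path is a hypothetical example and `<connect identifier>` is a placeholder you must supply; SQL*Plus prompts for the sysman password.

```shell
#!/bin/sh
# SQL_FILE is a hypothetical path for the saved query.
SQL_FILE="${SQL_FILE:-/tmp/deployed_plugins.sql}"

write_plugin_query() {
    # $1: file to write. The query text is the one from the guide.
    cat > "$1" <<'EOF'
SELECT epv.display_name, epv.plugin_id, epv.version
  FROM em_plugin_version epv, em_current_deployed_plugin ecp
 WHERE epv.plugin_type NOT IN ('BUILT_IN_TARGET_TYPE', 'INSTALL_HOME')
   AND ecp.dest_type = '2'
   AND epv.plugin_version_id = ecp.plugin_version_id;
EXIT
EOF
}

write_plugin_query "$SQL_FILE"

# Print the invocation instead of running it; -S suppresses the banner.
echo "sqlplus -S sysman@<connect identifier> @$SQL_FILE"
```

Running the query while the repository is healthy and keeping its output with your backups means the list is still available if the repository is unreachable at recovery time.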

After running the installer in software only mode, all patches that were installed prior to the crash must be re-applied. Assuming the Management Repository is intact, the post scripts that run SQLs against the repository can be skipped as the repository already has those patches applied.

As stated earlier, the location of the OMS Oracle Home is fixed and cannot be changed. Hence, ensure that the OMS Oracle Home is restored in the same location that was used previously.

Once the Software Homes are recovered, the instance home can be reconstructed using the omsca command in recovery mode:

omsca recover -as -ms -nostart -backup_file <exportconfig file>

Use the export file generated by the emctl exportconfig command shown in the previous section.
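Since the recovery should use the most recent export, a small helper can pick the newest file from a backup directory and refuse to continue if none is found. This is an illustrative sketch only: the backup directory and the `*.bka` file name pattern are assumptions about how your export files are stored and named, and the omsca command is printed with plain ASCII hyphens rather than executed.

```shell
#!/bin/sh
latest_export() {
    # $1: directory, $2: file name pattern. Prints the newest matching file.
    # Parsing ls output assumes file names without embedded newlines.
    ls -t "$1"/$2 2>/dev/null | head -n 1
}

recover_cmd() {
    # $1: export file. Prints the omsca command; fails if the file is missing.
    if [ ! -f "$1" ]; then
        echo "export file not found: $1" >&2
        return 1
    fi
    echo "omsca recover -as -ms -nostart -backup_file $1"
}

# BACKUP_DIR is a hypothetical directory holding exportconfig output.
BACKUP_DIR="${BACKUP_DIR:-/backup/oms}"
FILE=$(latest_export "$BACKUP_DIR" '*.bka')
recover_cmd "$FILE" || echo "nothing to recover from in $BACKUP_DIR"
```

Selecting by modification time is a design shortcut; if files are copied between hosts without preserving timestamps, choose the file by the date in its name instead.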

OMS Recovery Scenarios

The following scenarios illustrate various OMS recovery situations along with the recovery steps.

Important:

A prerequisite for OMS recovery is to have recent, valid OMS configuration backups available. Oracle recommends that you back up the OMS using the emctl exportconfig oms command whenever an OMS configuration change is made. This command must be run on the primary OMS running the WebLogic AdminServer.

Alternatively, you can run this command on a regular basis using the Enterprise Manager Job system.

Each of the following scenarios covers the recovery of the Software homes using either a filesystem backup (when available and only when recovering to the same host) or using the Software only option from the installer. In either case, the best practice is to recover the instance home (gc_inst) using the omsca recover command, rather than from a filesystem backup. This guarantees that the instance home is valid and up to date.

Single OMS, No Server Load Balancer (SLB), OMS Restored on the Same Host

Site hosts a single OMS. No SLB is present. The OMS configuration was backed up using the emctl exportconfig oms command on the primary OMS running the AdminServer. The OMS Oracle Home is lost.

Resolution:

  1. Perform cleanup on failed OMS host.

    Make sure there are no processes still running from the Middleware home using a command similar to the following:

    ps -ef | grep -i -P "(Middleware|gc_inst)" | grep -v grep | awk '{print $2}' | xargs kill -9
    

    Note:

    Change Middleware|gc_inst to strings that match your own middleware and instance homes.

    If recovering the software homes using the software only install method, first de-install the existing Oracle Homes using the Cloud Control software distribution installer. This is required even if the software homes are no longer available as it is necessary to remove any record of the lost Oracle Homes from the Oracle inventory.

    If they exist, remove the 'Middleware' and 'gc_inst' directories.

  2. Ensure that software library locations are still accessible.

  3. Restore the Software Homes.

    If restoring from a filesystem backup, delete the file <OMS_HOME>/sysman/config/emInstanceMapping.properties and any gc_inst directory that may have been restored, if they exist.

    Alternatively, if a backup does not exist, use the software only install method to reconstruct the software homes:

    1. Select the 'Install Software Only' option from the 'Install Types' step page within the Cloud Control software installer.

    2. Ensure all previously deployed plug-ins are selected on the 'Select Plug-ins' step page.

      It is possible to determine which plug-ins were deployed previously by running the following SQL against the repository database:

      SELECT epv.display_name, epv.plugin_id, epv.version FROM em_plugin_version epv, em_current_deployed_plugin ecp WHERE epv.plugin_type NOT IN ( 'BUILT_IN_TARGET_TYPE' , 'INSTALL_HOME') AND ecp.dest_type='2' AND epv.plugin_version_id = ecp.plugin_version_id;
      

      Note:

      At the end of the Software only installation, do NOT run ConfigureGC.pl when told to do so by the installer. This step should only be performed as part of a fresh install, not as part of a recovery operation.
    3. Apply any patches that were previously applied to the OMS software homes.

  4. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    <OMS_HOME>/bin/omsca recover -as -ms -nostart -backup_file <exportconfig file>
    

    Note:

    The -backup_file to be passed must be the latest file generated from emctl exportconfig oms command.
  5. Start the OMS.

    <OMS_HOME>/bin/emctl start oms
    
  6. Recover the Agent (if necessary).

    If the Management Agent Software Home was recovered along with the OMS Software Homes (as is likely in a single OMS install recovery where the Management Agent and agent_inst directories are commonly under the Middleware home), the Management Agent instance directory should be recreated to ensure consistency between the Management Agent and OMS.

    1. Remove the agent_inst directory if it was restored from backup.

    2. Use agentDeploy.sh to configure the agent:

      <AGENT_HOME>/core/12.1.0.0.0/sysman/install/agentDeploy.sh AGENT_BASE_DIR=<AGENT_BASE_DIR> AGENT_INSTANCE_HOME=<AGENT_INSTANCE_HOME> ORACLE_HOSTNAME=<AGENT_HOSTNAME> AGENT_PORT=<AGENT_PORT> -configOnly OMS_HOST=<oms host> EM_UPLOAD_PORT=<OMS_UPLOAD_PORT> AGENT_REGISTRATION_PASSWORD=<REG_PASSWORD>
      
    3. The OMS automatically blocks the Management Agent. Resync the Management Agent from the Management Agent homepage.

    If the Management Agent software home was not recovered along with the OMS but the Agent still needs to be recovered, follow the instructions in section Agent Reinstall Using the Same Port.

    Note:

    This is only likely to be needed in the case where a filesystem recovery has been performed that did not include a backup of the Agent software homes. If the OMS software homes were recovered using the Software only install method, this step will not be required because a Software only install installs an Agent software home under the Middleware home.
  7. Verify that the site is fully operational.
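The cleanup command in step 1 kills every matching process immediately, so a pattern that is too broad can terminate unrelated processes. A cautious variant is to list the matching process IDs first and kill only after reviewing them. The sketch below assumes `ps -ef` output with the PID in the second column and uses extended regular expressions (`grep -E`) instead of `-P`; the PATTERN variable and the function name are introduced for this example.

```shell
#!/bin/sh
# PATTERN should match your own middleware and instance home paths.
PATTERN="${PATTERN:-(Middleware|gc_inst)}"

matching_pids() {
    # Reads "ps -ef" style lines on stdin and prints the PID column
    # (field 2) of lines matching the pattern, excluding grep itself.
    grep -i -E "$1" | grep -v grep | awk '{print $2}'
}

# Show what would be killed; nothing is terminated here.
ps -ef | matching_pids "$PATTERN"

# After reviewing the list, the kill itself would be, for example:
#   ps -ef | matching_pids "$PATTERN" | xargs kill -9
```

Piping the listing through `ps -fp` or checking the full command lines before the kill step gives a further safeguard on hosts that run other Oracle software.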

Single OMS, No SLB, OMS Restored on a Different Host

Site hosts a single OMS. The OMS is running on host "A." No SLB is present. The OMS configuration was backed up using the emctl exportconfig oms command. Host "A" is lost.

Resolution:

  1. Ensure that software library locations are accessible from “Host B”.

  2. Restore the software homes on “Host B”.

    Oracle does not support restoring OMS Oracle Homes from filesystem backup across different hosts. Use the software-only install method to reconstruct the software homes:

    1. Select the 'Install Software Only' option from the 'Install Types' step page within the Cloud Control software installer.

    2. Ensure all previously deployed plug-ins are selected on the 'Select Plug-ins' step page.

      It is possible to determine which plug-ins were deployed previously by running the following SQL against the repository database:

      SELECT epv.display_name, epv.plugin_id, epv.version FROM em_plugin_version epv, em_current_deployed_plugin ecp WHERE epv.plugin_type NOT IN ( 'BUILT_IN_TARGET_TYPE' , 'INSTALL_HOME') AND ecp.dest_type='2' AND epv.plugin_version_id = ecp.plugin_version_id;
      

      Note:

      At the end of the Software only installation, do NOT run ConfigureGC.pl when told to do so by the installer. This step should only be performed as part of a fresh install, not as part of a recovery operation.
    3. Apply any patches that were previously applied to the OMS software homes.

  3. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    <OMS_HOME>/bin/omsca recover -as -ms -nostart -backup_file <exportconfig file>
    

    Note:

    The -backup_file to be passed must be the latest file generated from emctl exportconfig oms command.
  4. Start the OMS.

    <OMS_HOME>/bin/emctl start oms
    

  5. Configure the Agent.

    An agent is installed as part of the Software only install and needs to be configured using the agentDeploy.sh command:

    <AGENT_HOME>/core/12.1.0.0.0/sysman/install/agentDeploy.sh AGENT_BASE_DIR=<AGENT_BASE_DIR> AGENT_INSTANCE_HOME=<AGENT_INSTANCE_HOME> ORACLE_HOSTNAME=<AGENT_HOSTNAME> AGENT_PORT=<AGENT_PORT> -configOnly OMS_HOST=<oms host> EM_UPLOAD_PORT=<OMS_UPLOAD_PORT> AGENT_REGISTRATION_PASSWORD=<REG_PASSWORD>
    

    The OMS automatically blocks the Management Agent. Resync the Management Agent from the Management Agent homepage.

  6. Relocate the oracle_emrep target to the Management Agent of the new OMS host using the following commands:

    <OMS_HOME>/bin/emcli login -username=sysman
    <OMS_HOME>/bin/emcli sync
    <OMS_HOME>/bin/emctl config emrep -agent <agent on host "B", e.g. myNewOMSHost.example.com:3872>
    
  7. In the Cloud Control console, locate the 'WebLogic Domain' target for the Cloud Control Domain. Go to 'Monitoring Credentials' and update the adminserver host to host B. Then do a Refresh WebLogic Domain to reconfigure the domain with new hosts.

  8. Locate duplicate targets from the Management Services and Repository Overview page of the Enterprise Manager console. Click the Duplicate Targets link to access the Duplicate Targets page. To resolve duplicate target errors, the duplicate target must be renamed on the conflicting Agent. Relocate duplicate targets from Agent "A" to Agent "B".

  9. Change the OMS to which all Management Agents point and then resecure all Agents.

    Because the new machine is using a different hostname from the one originally hosting the OMS, all Agents in your monitored environment must be told where to find the new OMS. On each Management Agent, run the following command:

    <AGENT_INST_DIR>/bin/emctl secure agent -emdWalletSrcUrl "http://hostB:<http_port>/em"
    
  10. Assuming the original OMS host is no longer in use, remove the Host target (including all remaining monitored targets) from Cloud Control by selecting the host on the Targets > Hosts page and clicking 'Remove'. You will be presented with an error that informs you to remove all monitored targets first. Remove those targets then repeat the step to remove the Host target successfully.

  11. Verify that the site is fully operational.
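Step 9 has to be repeated on every Management Agent, so it is convenient to generate the exact command once and distribute it. The sketch below only formats the command from the step above; the host name, port and agent instance directory are hypothetical example values that you must replace with your own.

```shell
#!/bin/sh
# Hypothetical example values; substitute your own.
NEW_OMS_HOST="${NEW_OMS_HOST:-hostB.example.com}"
OMS_HTTP_PORT="${OMS_HTTP_PORT:-4889}"
AGENT_INST_DIR="${AGENT_INST_DIR:-/u01/app/oracle/agent/agent_inst}"

secure_agent_cmd() {
    # $1 agent instance dir, $2 new OMS host, $3 OMS HTTP port.
    # Prints the re-secure command pointing the agent at the new OMS.
    printf '%s/bin/emctl secure agent -emdWalletSrcUrl "http://%s:%s/em"\n' "$1" "$2" "$3"
}

secure_agent_cmd "$AGENT_INST_DIR" "$NEW_OMS_HOST" "$OMS_HTTP_PORT"
```

The printed line can be pasted into whatever remote execution mechanism you already use for your agent hosts.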

Single OMS, No SLB, OMS Restored on a Different Host using the Original Hostname

Site hosts a single OMS. The OMS is running on host "A." No SLB is present. The OMS configuration was backed up using the emctl exportconfig oms command. Host "A" is lost. Recovery is to be performed on “Host B” but retaining the use of “Hostname A”.

Resolution:

  1. Ensure that loader receive directory and software library locations are accessible from Host "B".

    Oracle does not support restoring OMS Oracle Homes from filesystem backup across different hosts. Use the software-only install method to reconstruct the software homes:

    1. Select the 'Install Software Only' option from the 'Install Types' step page within the Cloud Control software installer.

    2. Ensure all previously deployed plug-ins are selected on the 'Select Plug-ins' step page.

      It is possible to determine which plug-ins were deployed previously by running the following SQL against the Management Repository database:

      SELECT epv.display_name, epv.plugin_id, epv.version FROM em_plugin_version epv, em_current_deployed_plugin ecp WHERE epv.plugin_type NOT IN ( 'BUILT_IN_TARGET_TYPE' , 'INSTALL_HOME') AND ecp.dest_type='2' AND epv.plugin_version_id = ecp.plugin_version_id; 
      

      Note:

      At the end of the Software only installation, do NOT run ConfigureGC.pl when told to do so by the installer. This step should only be performed as part of a fresh install, not as part of a recovery operation.
    3. Apply any patches that were previously applied to the OMS software homes.

  2. Modify the network configuration such that “Host B” also responds to the hostname of “Host A”. Specific instructions on how to configure this are beyond the scope of this document. However, some general configuration suggestions are:

    • Modify your DNS server such that both “Hostname B” and “Hostname A” network addresses resolve to the physical IP of “Host B”.

    • Multi-home “Host B”. Configure an additional IP on “Host B” for the IP address that “Hostname A” resolves to. For example, on “Host B” run the following commands:

    ifconfig eth0:1 <IP assigned to “Hostname A”> netmask <netmask>
    /sbin/arping -q -U -c 3 -I eth0 <IP of HostA>
    
  3. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    <OMS_HOME>/bin/omsca recover -as -ms -nostart -backup_file <exportconfig file> -AS_HOST <hostA> -EM_INSTANCE_HOST <hostA>
    

    Note:

    The -backup_file to be passed must be the latest file generated from emctl exportconfig oms command.
  4. Start the OMS.

    <OMS_HOME>/bin/emctl start oms
    
  5. Configure the agent.

    An agent is installed as part of the Software only install and needs to be configured using the agentDeploy.sh command:

    <AGENT_HOME>/core/12.1.0.0.0/sysman/install/agentDeploy.sh AGENT_BASE_DIR=<AGENT_BASE_DIR> AGENT_INSTANCE_HOME=<AGENT_INSTANCE_HOME> ORACLE_HOSTNAME=<AGENT_HOSTNAME> AGENT_PORT=<AGENT_PORT> -configOnly OMS_HOST=<oms host> EM_UPLOAD_PORT=<OMS_UPLOAD_PORT> AGENT_REGISTRATION_PASSWORD=<REG_PASSWORD>
    
  6. The OMS automatically blocks the Management Agent. Resync the Management Agent from the Management Agent homepage.

    Run the command to relocate Management Services and Management Repository target to Management Agent "B":

    emctl config emrep -agent <agent on host B>
    
  7. In the Cloud Control console, locate the 'WebLogic Domain' target for the Cloud Control Domain. Go to 'Monitoring Credentials' and update the adminserver host to host B. Then do a Refresh WebLogic Domain to reconfigure the domain with new hosts.

  8. Locate duplicate targets from the Management Services and Repository Overview page of the Enterprise Manager console. Click the Duplicate Targets link to access the Duplicate Targets page. To resolve duplicate target errors, the duplicate target must be renamed on the conflicting Management Agent. Relocate duplicate targets from Management Agent "A" to Management Agent "B".

  9. Verify that the site is fully operational.

Multiple OMS, Server Load Balancer, Primary OMS Recovered on the Same Host

Site hosts multiple OMS instances. All OMS instances are fronted by a Server Load Balancer. OMS configuration backed up using the emctl exportconfig oms command on the primary OMS running the WebLogic AdminServer. The primary OMS is lost.

Resolution:

  1. Perform a cleanup on the failed OMS host.

    Make sure there are no processes still running from the Middleware home using a command similar to the following:

    ps -ef | grep -i -P "(Middleware|gc_inst)" | grep -v grep | awk '{print $2}' | xargs kill -9
    

    Note:

    Change Middleware|gc_inst to strings that match your own middleware and instance homes.

    If recovering the software homes using the software only install method, first de-install the existing Oracle Homes using the Cloud Control software distribution installer. This is required even if the software homes are no longer available as it is necessary to remove any record of the lost Oracle Homes from the Oracle inventory.

    If they exist, remove the 'Middleware' and 'gc_inst' directories.

  2. Ensure that software library locations are still accessible.

  3. Restore the software homes.

    If restoring from a filesystem backup, delete the file <OMS_HOME>/sysman/config/emInstanceMapping.properties and any gc_inst directory that may have been restored, if they exist. Alternatively, if a backup does not exist, use the software only install method to reconstruct the software homes:

    1. Select the 'Install Software Only' option from the 'Install Types' step page within the Cloud Control software installer.

    2. Ensure all previously deployed plug-ins are selected on the 'Select Plug-ins' step page.

      It is possible to determine which plug-ins were deployed previously by running the following SQL against the Management Repository database:

      SELECT epv.display_name, epv.plugin_id, epv.version FROM em_plugin_version epv, em_current_deployed_plugin ecp WHERE epv.plugin_type NOT IN ( 'BUILT_IN_TARGET_TYPE' , 'INSTALL_HOME') AND ecp.dest_type='2' AND epv.plugin_version_id = ecp.plugin_version_id;
      

      Note:

      At the end of the Software only installation, do NOT run ConfigureGC.pl when told to do so by the installer. This step should only be performed as part of a fresh install, not as part of a recovery operation.
    3. Apply any patches that were previously applied to the OMS software homes.

  4. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    <OMS_HOME>/bin/omsca recover -as -ms -nostart -backup_file <exportconfig file>
    

    Note:

    The -backup_file to be passed must be the latest file generated from emctl exportconfig oms command.
  5. Start the OMS.

    <OMS_HOME>/bin/emctl start oms
    
  6. Recover the Management Agent.

    If the Management Agent software home was recovered along with the OMS software homes (as is likely in a Primary OMS install recovery where the agent and agent_inst directories are commonly under the Middleware home), the Management Agent instance directory should be recreated to ensure consistency between the Management Agent and OMS.

    1. Remove the agent_inst directory if it was restored from backup.

    2. Use agentDeploy.sh to configure the Management Agent:

      <AGENT_HOME>/core/12.1.0.0.0/sysman/install/agentDeploy.sh AGENT_BASE_DIR=<AGENT_BASE_DIR> AGENT_INSTANCE_HOME=<AGENT_INSTANCE_HOME> ORACLE_HOSTNAME=<AGENT_HOSTNAME> AGENT_PORT=<AGENT_PORT> -configOnly OMS_HOST=<oms host> EM_UPLOAD_PORT=<OMS_UPLOAD_PORT> AGENT_REGISTRATION_PASSWORD=<REG_PASSWORD>
      
    3. The OMS automatically blocks the Management Agent. Resync the Management Agent from the Management Agent homepage.

    If the Management Agent software home was not recovered along with the OMS but the Management Agent still needs to be recovered, follow the instructions in section Agent Reinstall Using the Same Port.

    Note:

    This is only likely to be needed in the case where a filesystem recovery has been performed that did not include a backup of the Management Agent software homes. If the OMS software homes were recovered using the Software only install method, this step will not be required because a Software only install installs a Management Agent software home under the Middleware home.
  7. Re-enroll the additional OMS, if any, with the recovered Administration Server by running emctl enroll oms on each additional OMS.

  8. Verify that the site is fully operational.

Multiple OMS, Server Load Balancer configured, Primary OMS Recovered on a Different Host

Site hosts multiple OMS instances. OMS instances fronted by a Server Load Balancer. OMS Configuration backed up using emctl exportconfig oms command. Primary OMS on host "A" is lost and needs to be recovered on Host "B".

  1. If necessary, perform cleanup on failed OMS host.

    Make sure there are no processes still running from the Middleware home using a command similar to the following:

    ps -ef | grep -i -P "(Middleware|gc_inst)" | grep -v grep | awk '{print $2}' | xargs kill -9
    
  2. Ensure that software library locations are accessible from “Host B”.

  3. Restore the software homes on “Host B”.

    Oracle does not support restoring OMS Oracle Homes from filesystem backup across different hosts. Use the software-only install method to reconstruct the software homes:

    1. Select the 'Install Software Only' option from the 'Install Types' step page within the Cloud Control software installer.

    2. Ensure all previously deployed plug-ins are selected on the 'Select Plug-ins' step page.

      It is possible to determine which plug-ins were deployed previously by running the following SQL against the Management Repository database:

      SELECT epv.display_name, epv.plugin_id, epv.version FROM em_plugin_version epv, em_current_deployed_plugin ecp WHERE epv.plugin_type NOT IN ( 'BUILT_IN_TARGET_TYPE' , 'INSTALL_HOME') AND ecp.dest_type='2' AND epv.plugin_version_id = ecp.plugin_version_id;
      

      Note:

      At the end of the Software only installation, do NOT run ConfigureGC.pl when told to do so by the installer. This step should only be performed as part of a fresh install, not as part of a recovery operation.
    3. Apply any patches that were previously applied to the OMS software homes.

  4. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    <OMS_HOME>/bin/omsca recover -as -ms -nostart -backup_file <exportconfig file>
    

    Note:

    The -backup_file to be passed must be the latest file generated from emctl exportconfig oms command.
  5. Start the OMS.

    <OMS_HOME>/bin/emctl start oms
    
  6. Configure the Management Agent.

    An agent is installed as part of the Software only install and needs to be configured using the agentDeploy.sh command:

    <AGENT_HOME>/core/12.1.0.0.0/sysman/install/agentDeploy.sh AGENT_BASE_DIR=<AGENT_BASE_DIR> AGENT_INSTANCE_HOME=<AGENT_INSTANCE_HOME> ORACLE_HOSTNAME=<AGENT_HOSTNAME> AGENT_PORT=<AGENT_PORT> -configOnly OMS_HOST=<oms host> EM_UPLOAD_PORT=<OMS_UPLOAD_PORT> AGENT_REGISTRATION_PASSWORD=<REG_PASSWORD>
    

    The OMS automatically blocks the Management Agent. Resync the Management Agent from the Management Agent homepage.

  7. Add the new OMS to the SLB virtual server pools and remove the old OMS.

  8. Relocate the oracle_emrep target to the Management Agent of the new OMS host using the following commands:

    <OMS_HOME>/bin/emcli sync
    <OMS_HOME>/bin/emctl config emrep -agent <agent on host "B", e.g. myNewOMSHost.example.com:3872>
    
  9. In the Cloud Control console, locate the 'WebLogic Domain' target for the Cloud Control Domain. Go to 'Monitoring Credentials' and update the adminserver host to host B. Then do a Refresh WebLogic Domain to reconfigure the domain with new hosts.

  10. Locate duplicate targets from the Management Services and Repository Overview page of the Enterprise Manager console. Click the Duplicate Targets link to access the Duplicate Targets page. To resolve duplicate target errors, the duplicate target must be renamed on the conflicting Management Agent. Relocate duplicate targets from Management Agent "A" to Management Agent "B".

  11. Assuming the original OMS host is no longer in use, remove the Host target (including all remaining monitored targets) from Cloud Control by selecting the host on the Targets > Hosts page and clicking 'Remove'. You will be presented with an error that informs you to remove all monitored targets first. Remove those targets then repeat the step to remove the Host target successfully.

  12. Verify that the site is fully operational.
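The command-line parts of the steps above (steps 4, 5, and 8) can be wrapped in a small script so that the exact commands are reviewed before anything runs. This is a minimal sketch, not part of the product: the names DRY_RUN, run, recover_oms_config, and relocate_emrep are hypothetical, as are the example paths. Only the omsca, emctl, and emcli invocations come from the steps above, and those commands may still prompt for passwords when actually executed.

```shell
#!/bin/sh
# Sketch only: wraps the command-line parts of the recovery steps above.
# DRY_RUN, run, recover_oms_config and relocate_emrep are hypothetical names.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only (default), 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@" || { echo "FAILED: $*" >&2; return 1; }
    fi
}

# Steps 4 and 5: recover the OMS configuration from the export file, then start the OMS.
recover_oms_config() {
    oms_home=$1; backup_file=$2
    run "$oms_home/bin/omsca" recover -as -ms -nostart -backup_file "$backup_file" || return 1
    run "$oms_home/bin/emctl" start oms
}

# Step 8: relocate the oracle_emrep target to the agent on the new OMS host.
relocate_emrep() {
    oms_home=$1; agent=$2
    run "$oms_home/bin/emcli" sync || return 1
    run "$oms_home/bin/emctl" config emrep -agent "$agent"
}

# Example (prints the commands because DRY_RUN defaults to 1):
#   recover_oms_config /u01/app/Middleware/oms /backup/oms_export.bka
#   relocate_emrep /u01/app/Middleware/oms myNewOMSHost.example.com:3872
```

The two functions are kept separate because the Management Agent configuration and the SLB change (steps 6 and 7) have to happen between them.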

Multiple OMS, SLB configured, additional OMS recovered on same or different host

Multiple OMS site where the OMS instances are fronted by an SLB. OMS configuration backed up using the emctl exportconfig oms command on the first OMS. Additional OMS is lost and needs to be recovered on the same or a different host.

  1. If recovering to the same host, ensure cleanup of the failed OMS has been performed:

    Make sure there are no processes still running from the Middleware home using a command similar to the following:

    ps -ef | grep -i -P "(Middleware|gc_inst)" | grep -v grep | awk '{print $2}' | xargs kill -9
    

    First de-install the existing Oracle Homes using the Cloud Control software distribution installer. This is required even if the software homes are no longer available as it is necessary to remove any record of the lost Oracle Homes from the Oracle inventory.

    If they exist, remove the 'Middleware' and 'gc_inst' directories.

  2. Ensure that shared software library locations are accessible.

  3. Install a Management Agent on the required host (same or different as the case may be).

  4. Use the Additional OMS deployment procedure to configure a new additional OMS.

  5. Verify that the site is fully operational.
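Because the cleanup command in step 1 pipes straight into kill -9, it can be worth listing the matching process IDs first. The following is a minimal sketch under that assumption; the function name list_oms_pids is hypothetical. It reads ps -ef style output on standard input and prints the second column (the PID) of every line that mentions Middleware or gc_inst, ignoring case.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of processes whose ps -ef line mentions
# the Middleware home or the gc_inst instance directory (case-insensitive).
# The bracketed first letters keep the awk process from matching its own
# command line when it shows up in the ps listing.
list_oms_pids() {
    awk 'tolower($0) ~ /[m]iddleware|[g]c_inst/ { print $2 }'
}

# Review the list first:
#   ps -ef | list_oms_pids
# Then, once the list looks right:
#   ps -ef | list_oms_pids | xargs kill -9
```

The filter uses only awk, so it does not depend on grep supporting the -P option.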

Recovering Management Agents

If a Management Agent is lost, it should be reinstalled by cloning from a reference install. Cloning from a reference install is often the fastest way to recover a Management Agent install as it is not necessary to track and reapply customizations and patches. Care should be taken to reinstall the Management Agent using the same port. Using Enterprise Manager's Management Agent Resynchronization feature, a reinstalled Management Agent can be reconfigured using target information present in the Management Repository. When the Management Agent is reinstalled using the same port, the OMS detects that it has been re-installed and blocks it temporarily to prevent the auto-discovered targets in the re-installed Management Agent from overwriting previous customizations.

Blocked Management Agents:

This is a condition in which the OMS rejects all heartbeat or upload requests from the blocked Management Agent. Hence, a blocked Agent will not be able to upload any alerts or metric data to the OMS. However, blocked Management Agents continue to collect monitoring data.

The Management Agent can be resynchronized and unblocked from the Management Agent homepage by clicking on the Resynchronize Agent button. Resynchronization pushes all targets from the Management Repository to the Management Agent and then unblocks the Agent.

Management Agent Recovery Scenarios

The following scenarios illustrate various Management Agent recovery situations along with the recovery steps. The Management Agent resynchronization feature requires that a reinstalled Management Agent use the same port and location as the previous Management Agent that crashed.

Management Agent Reinstall Using the Same Port

A Management Agent is monitoring multiple targets. The Agent installation is lost.

  1. De-install the Agent Oracle Home using the Oracle Universal Installer.

    Note:

    This step is necessary in order to clean up the inventory.
  2. Install a new Management Agent or use the Management Agent clone option to reinstall the Management Agent through Enterprise Manager. Specify the same port that was used by the crashed Agent. The location of the install must be the same as the previous install.

    The OMS detects that the Management Agent has been re-installed and blocks the Management Agent.

  3. Initiate Management Agent Resynchronization from the Management Agent homepage.

    All targets in the Management Repository are pushed to the new Management Agent. The Agent is instructed to clear backlogged files and then do a clearstate. The Agent is then unblocked.

  4. Reconfigure User-defined Metrics if the location of the User-defined Metric scripts has changed.

  5. Verify that the Management Agent is operational and all target configurations have been restored using the following emctl commands:

    emctl status agent 
    emctl upload agent 
    

    There should be no errors and no XML files in the backlog.
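The two verification commands in step 5 can be run back to back with a stop at the first failure. The sketch below assumes that emctl returns a non-zero exit status when a command fails, which is an assumption and should be confirmed in your environment. The EMCTL variable and the verify_agent function are hypothetical names introduced so the logic can be tried with a stub.

```shell
#!/bin/sh
# Hypothetical wrapper for the two checks in step 5.
# EMCTL can be overridden (for example EMCTL=echo) to try the logic without an agent.
EMCTL="${EMCTL:-emctl}"

verify_agent() {
    for action in status upload; do
        echo "== emctl $action agent"
        # Assumption: a non-zero exit status from emctl indicates a failure.
        "$EMCTL" "$action" agent || {
            echo "emctl $action agent reported a failure" >&2
            return 1
        }
    done
    echo "Agent checks passed; also confirm that no XML files remain in the backlog."
}

# Usage, on the agent host with emctl on the PATH:
#   verify_agent
```

The script does not inspect the backlog itself; that check is still made by reading the emctl output.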

Management Agent Restore from Filesystem Backup

A single Management Agent is monitoring multiple targets. File system backup for the Agent Oracle Home exists. The Agent install is lost.

  1. Restore the Management Agent from the filesystem backup then start the Management Agent.

    The OMS detects that the Management Agent has been restored from backup and blocks the Management Agent.

  2. Initiate Management Agent Resynchronization from the Management Agent homepage.

    All targets in the Management Repository are pushed to the new Management Agent. The Agent is instructed to clear backlogged files and performs a clearstate. The Management Agent is unblocked.

  3. Verify that the Management Agent is functional and all target configurations have been restored using the following emctl commands:

    emctl status agent
    emctl upload agent
    

    There should be no errors and no XML files in the backlog.

Recovering from a Simultaneous OMS-Management Repository Failure

When both OMS and Management Repository fail simultaneously, the recovery situation becomes more complex depending upon factors such as whether the OMS and Management Repository recovery has to be performed on the same or different host, or whether there are multiple OMS instances fronted by an SLB. In general, the order of recovery for this type of compound failure should be Management Repository first, followed by OMS instances following the steps outlined in the appropriate recovery scenarios discussed earlier. The following scenarios illustrate two OMS-Management Repository failures and the requisite recovery steps.

Collapsed Configuration: Incomplete Management Repository Recovery, Primary OMS on the Same Host

The Management Repository and the primary OMS are installed on the same host (host "A"). The Management Repository database is running in noarchivelog mode. A full cold backup is available. A recent OMS backup file exists (emctl exportconfig oms). The Management Repository, OMS, and the Management Agent crash.

  1. Follow the Management Repository recovery procedure shown in Incomplete Recovery on the Same Host with the following exception:

    Since the OMS Oracle Home is not available and Management Repository resynchronization has to be initiated before starting an OMS against the restored Management Repository, submit the resync via the following PL/SQL block. Log into the Management Repository as SYSMAN using SQL*Plus and run:

    begin emd_maintenance.full_repository_resync('<resync name>'); end;
    
  2. Follow the OMS recovery procedure shown in Single OMS, No Server Load Balancer (SLB), OMS Restored on the same Host.

  3. Verify that the site is fully operational.
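The PL/SQL call in step 1 takes a free-text resync name, so a name containing a single quote would break the string literal. As a minimal sketch, the hypothetical helper resync_plsql below prints the block with the name safely quoted, followed by the "/" line that SQL*Plus uses to execute a PL/SQL block. Only the emd_maintenance.full_repository_resync call comes from the step above.

```shell
#!/bin/sh
# Hypothetical helper: print the resync PL/SQL block for a given resync name.
# Single quotes in the name are doubled so the SQL string literal stays valid.
resync_plsql() {
    name=$(printf '%s' "$1" | sed "s/'/''/g")
    printf "begin emd_maintenance.full_repository_resync('%s'); end;\n/\n" "$name"
}

# Usage: write the block to a file, then run it from a SQL*Plus session
# that is connected to the Management Repository as SYSMAN:
#   resync_plsql "resync after restore" > resync.sql
#   SQL> @resync.sql
```

Writing the block to a file first keeps the password prompt and the PL/SQL text apart, rather than piping both into SQL*Plus.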

Distributed Configuration: Incomplete Management Repository Recovery, Primary OMS and additional OMS on Different Hosts, SLB Configured

The Management Repository, primary OMS, and additional OMS all reside on different hosts. The Management Repository database was running in noarchivelog mode. An OMS backup file from a recent backup exists (emctl exportconfig oms). A full cold backup of the database exists. All three hosts are lost.

  1. Follow the Management Repository recovery procedure shown in Incomplete Recovery on the Same Host with the following exception:

    Since the OMS Oracle Home is not yet available and Management Repository resync has to be initiated before starting an OMS against the restored Management Repository, submit the resync via the following PL/SQL block. Log into the Management Repository as SYSMAN using SQL*Plus and run the following:

    begin emd_maintenance.full_repository_resync('<resync name>'); end;
    
  2. Follow the OMS recovery procedure shown in Multiple OMS, Server Load Balancer configured, Primary OMS Recovered on a Different Host with the following exception:

    Override the Management Repository connect description present in the backup file by passing the additional omsca parameter:

    -REPOS_CONN_STR <restored repos descriptor>
    

    This needs to be added along with other parameters listed in Multiple OMS, Server Load Balancer configured, Primary OMS Recovered on a Different Host.

  3. Follow the OMS recovery procedure shown in Multiple OMS, SLB configured, additional OMS recovered on same or different host.

  4. Verify that the site is fully operational.

Switching Over or Failing Over to Standby Enterprise Manager Configurations

Switchover

Switchover is a planned activity where operations are transferred from the Primary site to a Standby site. This is usually done for testing and validation of Disaster Recovery (DR) scenarios and for planned maintenance activities on the primary infrastructure.

Switchover to the Passive OMS in a Level 2 Active/Passive Configuration

Switchover follows the same steps as failover. See 'Failover to the Passive OMS in a Level 2 Active/Passive Configuration'.

Switchover to the Standby Site in a Level 4 MAA Configuration

This section describes the steps to switch over to the standby site. The same procedure is applied to switch over in either direction.

Enterprise Manager Console cannot be used to perform switchover of the Management Repository database. Use the Data Guard Broker command line tool DGMGRL instead.

  1. Prepare the Standby Database

    Verify that recovery is up-to-date. Using the Enterprise Manager Console, you can view the value of the ApplyLag column for the standby database in the Standby Databases section of the Data Guard Overview Page.

  2. Shut down the Primary Enterprise Manager Application Tier.

    Shut down all the Management Service instances in the primary site by running the following command on each Management Service:

    emctl stop oms -all

  3. Verify Software Library Availability

    Ensure all files from the primary site are available on the standby site.

  4. Switch over to the Standby Database

    Use DGMGRL to perform a switchover to the standby database. The command can be run on the primary site or the standby site. The switchover command verifies the states of the primary database and the standby database, effects the switchover of roles, restarts the old primary database, and sets it up as the new standby database.

    SWITCHOVER TO <standby database name>;

    Verify the post switchover states. To monitor a standby database completely, the user monitoring the database must have SYSDBA privileges. This privilege is required because the standby database is in a mounted-only state. A best practice is to ensure that the users monitoring the primary and standby databases have SYSDBA privileges for both databases.

    SHOW CONFIGURATION;
    SHOW DATABASE <primary database name>;
    SHOW DATABASE <standby database name>;
    
  5. Start the Admin Server if it is not already running.

    emctl start oms -admin_only
    
  6. Make the standby Management Services point to the Standby Database which is now the new Primary by running the following on each standby Management Service.

    emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman

  7. Start up the Enterprise Manager Application Tier

    Start up all the Management Services on the standby site:

    emctl start oms

  8. Relocate Management Services and Management Repository target

    The Management Services and Management Repository target is monitored by a Management Agent on one of the Management Services on the primary site. To ensure that the target is monitored after switchover/failover, relocate the target to a Management Agent on the standby site by running the following command on one of the Management Services on the standby site.

    emctl config emrep -agent <agent name> -conn_desc

  9. Switchover to Standby SLB.

    Make appropriate network changes to fail over your primary SLB to the standby SLB; that is, all requests should now be served by the standby SLB without requiring any changes on the clients (browser and Management Agents).

  10. Establish the old primary Management Services as the new standby Management Services to complete the switchover process.

    Start the Administration Server on the old primary site:

    emctl start oms -admin_only

    Point the old primary Management Services to the new Primary Repository database by running the following command on each Management Service on the old primary site.

    emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman

This completes the switchover operation. Access and test the application to ensure that the site is fully operational and functionally equivalent to the primary site. Repeat the same procedure to switch over in the other direction.
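The DGMGRL commands from step 4 differ only in the database names, so they can be generated from two arguments and reviewed before being entered in a DGMGRL session. This is a minimal sketch; the function name dgmgrl_switchover_script and the example database names are hypothetical, while the SWITCHOVER and SHOW commands are the ones listed in step 4.

```shell
#!/bin/sh
# Hypothetical helper: print the DGMGRL commands for step 4
# (role switchover followed by the post-switchover state checks).
# $1 = current primary database name, $2 = standby database name
dgmgrl_switchover_script() {
    primary=$1; standby=$2
    printf 'SWITCHOVER TO %s;\nSHOW CONFIGURATION;\nSHOW DATABASE %s;\nSHOW DATABASE %s;\n' \
        "$standby" "$primary" "$standby"
}

# Example:
#   dgmgrl_switchover_script emrep_prim emrep_stby
```

Generating the text from the two names reduces the chance of switching over to the wrong database when the procedure is later repeated in the other direction.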

Failover

Failover to the Passive OMS in a Level 2 Active/Passive Configuration

  1. Establish the IP address on failover host.

  2. If the Database and Listener are also part of the same failover group:

    1. Start the TNS listener using the command lsnrctl start.

    2. Start the database using the command dbstart.

  3. Start Cloud Control using the command emctl start oms.

  4. Test the functionality.

Failover of the OMS in a Level 3 Active/Active Configuration

OMS failover is handled transparently in a Level 3, Active/Active OMS, configuration. The SLB monitors determine that the failed OMS is no longer available and route traffic to available OMS servers.

Manual Failover to the Standby Site in a Level 4 MAA Configuration

This section describes the steps to fail over to a standby database, recover the Enterprise Manager application state by resynchronizing the Management Repository database with all Management Agents, and enable the original primary database as a standby using flashback database.

The word manual is used here to contrast this type of failover with a fast-start failover described later in Automatic Failover to the Standby Site in a Level 4 MAA Configuration.

  1. Verify Software Library Availability

    Ensure all files from the primary site are available on the standby site.

  2. Failover to Standby Database.

    Shut down the database on the primary site. Use DGMGRL to connect to the standby database and execute the FAILOVER command:

    FAILOVER TO <standby database name>;

    Verify the post failover states:

    SHOW CONFIGURATION;
    SHOW DATABASE <primary database name>;
    SHOW DATABASE <standby database name>;
    

    Note that after the failover completes, the original primary database cannot be used as a standby database of the new primary database unless it is re-enabled.

  3. Start the Admin Server if it is not already running.

    emctl start oms -admin_only
    
  4. Make the standby Management Services point to the Standby Database which is now the new Primary by running the following on each standby Management Service.

    emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman
    
  5. Resync the New Primary Database with Management Agents.

    Skip this step if you are running in Data Guard Maximum Protection or Maximum Availability level as there is no data loss on failover. However, if there is data loss, synchronize the new primary database with all Management Agents. On any one Management Service on the standby site, run the following command:

    emctl resync repos -full -name "<name for recovery action>"

    This command submits a resync job that would be executed on each Management Agent when the Management Services on the standby site are brought up.

    Repository resynchronization is a resource-intensive operation. A well-tuned Management Repository helps significantly to complete the operation as quickly as possible. Specifically, if you are not routinely coalescing the IOTs/indexes associated with Advanced Queueing tables as described in My Oracle Support note 271855.1, running that procedure before the resync helps the resync operation complete faster.

  6. Start up the Enterprise Manager Application Tier

    Start up all the Management Services on the standby site by running the following command on each Management Service.

    emctl start oms

  7. Relocate Management Services and Management Repository target.

    The Management Services and Management Repository target is monitored by a Management Agent on one of the Management Services on the primary site. To ensure that the target is monitored after switchover/failover, relocate the target to a Management Agent on the standby site by running the following command on one of the Management Services on the standby site.

    emctl config emrep -agent <agent name> -conn_desc

  8. Switchover to the Standby SLB.

    Make appropriate network changes to fail over your primary SLB to the standby SLB; that is, all requests should now be served by the standby SLB without requiring any changes on the clients (browser and Management Agents).

  9. Establish Original Primary Database as Standby Database Using Flashback

    Once access to the failed site is restored and if you had flashback database enabled, you can reinstate the original primary database as a physical standby of the new primary database.

    1. Shut down all the Management Services in the original primary site.

      emctl stop oms -all

    2. Restart the original primary database in mount state:

      shutdown immediate;

      startup mount;

    3. Reinstate the Original Primary Database

      Use DGMGRL to connect to the old primary database and execute the REINSTATE command:

      REINSTATE DATABASE <old primary database name>;

    4. The newly reinstated standby database will begin serving as standby database to the new primary database.

    5. Verify the post reinstate states.

      SHOW CONFIGURATION;

      SHOW DATABASE <primary database name>;

      SHOW DATABASE <standby database name>;

  10. Establish Original Primary Management Service as the standby Management Service.

    Start the Administration Server on the old primary site:

    emctl start oms -admin_only

    Point the old primary Management Service to the new Primary Repository database by running the following command on each Management Service on the old primary site.

    emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman
    
  11. Monitor and complete Repository Resynchronization

    Navigate to the Management Services and Repository Overview page of Cloud Control Console. Under Related Links, click Repository Synchronization. This page shows the progress of the resynchronization operation on a per Management Agent basis. Monitor the progress.

    Operations that fail should be resubmitted manually from this page after fixing the error mentioned. Typically, communication related errors are caused by Management Agents being down and can be fixed by resubmitting the operation from this page after restarting the Management Agent.

    For Management Agents that cannot be started due to some reason, for example, old decommissioned Management Agents, the operation should be stopped manually from this page. Resynchronization is deemed complete when all the jobs have a completed or stopped status.

This completes the failover operation. Access and test the application to ensure that the site is fully operational and functionally equivalent to the primary site. Perform a switchover procedure if the site operations have to be moved back to the original primary site.
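Whether step 5 (repository resync) applies depends on the Data Guard protection mode. The sketch below encodes that decision; it assumes the mode is supplied spelled out in full, for example as read from the PROTECTION_MODE column of v$database. The function name resync_command_for_mode is hypothetical, and the emctl command it prints is the one shown in step 5.

```shell
#!/bin/sh
# Hypothetical helper: decide whether the repository resync in step 5 applies.
# $1 = Data Guard protection mode, $2 = descriptive name for the resync
resync_command_for_mode() {
    mode=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    name=$2
    case "$mode" in
        "MAXIMUM PROTECTION"|"MAXIMUM AVAILABILITY")
            # No data loss on failover in these modes, so the step is skipped.
            echo "skip: no data loss expected in $mode mode"
            ;;
        "MAXIMUM PERFORMANCE")
            # Data loss is possible, so print the resync command to run.
            echo "emctl resync repos -full -name \"$name\""
            ;;
        *)
            echo "unknown protection mode: $1" >&2
            return 1
            ;;
    esac
}

# Example:
#   resync_command_for_mode "Maximum Performance" "failover resync"
```

In Maximum Performance mode the resync is only needed if data was actually lost, so the printed command is a prompt for the operator rather than something to run unconditionally.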

Automatic Failover to the Standby Site in a Level 4 MAA Configuration

This section details the steps to fully automate failure detection and the failover procedure by using Fast-Start Failover and the Observer process. At a high level, the process works as follows:

  • Fast-Start Failover (FSFO) determines that a failover is necessary and initiates a failover to the standby database automatically

  • When the database failover has completed, the DB_ROLE_CHANGE database event is fired

  • The event causes a trigger to be fired which calls a script that configures and starts Enterprise Manager Application Tier

Perform the following steps:

  1. Develop Enterprise Manager Application Tier Configuration and Startup Script

    Develop a script that will automate the Enterprise Manager Application configuration and startup process. See the sample shipped with Cloud Control in the OH/sysman/ha directory. A sample script for the standby site is included here and should be customized as needed. Make sure ssh equivalence is set up so that remote shell scripts can be executed without password prompts. Place the script in a location accessible from the standby database host. Place a similar script on the primary site.

    #!/bin/sh
    # Script: /scratch/EMSBY_start.sh
    # Primary Site Hosts
    # Repos: earth, OMS: jupiter1, jupiter2
    # Standby Site Hosts
    # Repos: mars, OMS: saturn1, saturn2
    LOGFILE="/net/mars/em/failover/em_failover.log"
    OMS_ORACLE_HOME="/scratch/OracleHomes/em/oms11"
    CENTRAL_AGENT="saturn1.example.com:3872"
     
    #log message
    echo "###############################" >> $LOGFILE
    date >> $LOGFILE
    echo $OMS_ORACLE_HOME >> $LOGFILE
    id >>  $LOGFILE 2>&1
    
    #switch all OMS to point to new primary and startup all OMS
    ssh orausr@saturn1 "$OMS_ORACLE_HOME/bin/emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman -repos_pwd <password>" >> $LOGFILE 2>&1
    ssh orausr@saturn1 "$OMS_ORACLE_HOME/bin/emctl start oms" >>  $LOGFILE 2>&1
    
    #Repeat the above two lines for each OMS in a multiple OMS setup. E.g.
    ssh orausr@saturn2 "$OMS_ORACLE_HOME/bin/emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman -repos_pwd <password>" >> $LOGFILE 2>&1
    ssh orausr@saturn2 "$OMS_ORACLE_HOME/bin/emctl start oms" >>  $LOGFILE 2>&1
    
    #relocate Management Services and Repository target
    #to be done only once in a multiple OMS setup
    #allow time for OMS to be fully initialized
    ssh orausr@saturn1 "$OMS_ORACLE_HOME/bin/emctl config emrep -agent $CENTRAL_AGENT -conn_desc -sysman_pwd <password>" >> $LOGFILE 2>&1
    
    #always return 0 so that dbms scheduler job completes successfully
    exit 0
    
  2. Automate Execution of Script by Trigger

    Create a database event "DB_ROLE_CHANGE" trigger, which fires after the database role changes from standby to primary. See the sample shipped with Cloud Control in OH/sysman/ha directory.

    --
    --
    -- Sample database role change trigger
    --
    --
    CREATE OR REPLACE TRIGGER FAILOVER_EM
    AFTER DB_ROLE_CHANGE ON DATABASE
    DECLARE
        v_db_unique_name varchar2(30);
        v_db_role varchar2(30);
    BEGIN
        select upper(VALUE) into v_db_unique_name
        from v$parameter where NAME='db_unique_name';
        select database_role into v_db_role
        from v$database;
     
        if v_db_role = 'PRIMARY' then
     
          -- Submit job to Resync agents with repository
          -- Needed if running in maximum performance mode
          -- and there are chances of data-loss on failover
          -- Uncomment block below if required
          -- begin
          --  SYSMAN.setemusercontext('SYSMAN', SYSMAN.MGMT_USER.OP_SET_IDENTIFIER);
          --  SYSMAN.emd_maintenance.full_repository_resync('AUTO-FAILOVER to '||v_db_unique_name||' - '||systimestamp, true);
          --  SYSMAN.setemusercontext('SYSMAN', SYSMAN.MGMT_USER.OP_CLEAR_IDENTIFIER);
          -- end;
     
          -- Start the EM mid-tier
          dbms_scheduler.create_job(
              job_name=>'START_EM',
              job_type=>'executable',
              job_action=> '<location>' || v_db_unique_name|| '_start_oms.sh',
              enabled=>TRUE
          );
        end if;
    EXCEPTION
    WHEN OTHERS
    THEN
        SYSMAN.mgmt_log.log_error('LOGGING', SYSMAN.MGMT_GLOBAL.UNEXPECTED_ERR,
    SYSMAN.MGMT_GLOBAL.UNEXPECTED_ERR_M || 'EM_FAILOVER: ' ||SQLERRM);
    END;
    /
     
    

    Note:

    Based on your deployment, you might require additional steps to synchronize and automate the failover of SLB and shared storage used for software library. These steps are vendor specific and beyond the scope of this document. One possibility is to invoke these steps from the Enterprise Manager Application Tier startup and configuration script.
  3. Configure Fast-Start Failover and Observer.

    Use the Fast-Start Failover configuration wizard in Enterprise Manager Console to enable FSFO and configure the Observer.

    This completes the setup of automatic failover.
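The sample startup script in step 1 repeats the same two ssh lines for every OMS host. One possible variant, sketched below, loops over a host list instead. The host names, user, and Oracle home are taken from the sample; the SSH override variable, the OMS_HOSTS list, and the function name configure_and_start_all_oms are hypothetical. The connect descriptor and password are single-quoted inside the remote command so that parentheses in a descriptor are not interpreted by the remote shell.

```shell
#!/bin/sh
# Hypothetical loop-based variant of the sample startup script in step 1.
OMS_ORACLE_HOME="/scratch/OracleHomes/em/oms11"
OMS_HOSTS="saturn1 saturn2"
SSH="${SSH:-ssh}"    # override (for example SSH=echo) to preview without connecting

# $1 = connect descriptor of the new primary database, $2 = SYSMAN password
configure_and_start_all_oms() {
    conndesc=$1; pwd_value=$2
    emctl="$OMS_ORACLE_HOME/bin/emctl"
    for host in $OMS_HOSTS; do
        # BatchMode=yes makes ssh fail rather than prompt if key-based login is missing.
        "$SSH" -o BatchMode=yes "orausr@$host" \
            "$emctl config oms -store_repos_details -repos_conndesc '$conndesc' -repos_user sysman -repos_pwd '$pwd_value'" \
        && "$SSH" -o BatchMode=yes "orausr@$host" "$emctl start oms" \
        || echo "WARNING: OMS configuration or startup failed on $host" >&2
    done
    # Always succeed, as in the sample, so the scheduler job completes.
    return 0
}

# Example:
#   configure_and_start_all_oms "<connect descriptor of new primary database>" "<password>" >> "$LOGFILE" 2>&1
```

Adding or removing an OMS then only requires a change to OMS_HOSTS. The emrep relocation from the sample still has to run once, after the loop.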