Oracle® Enterprise Manager Cloud Control Administrator's Guide
12c Release 1 (12.1.0.1)

Part Number E24473-01

36 Enterprise Manager Outages

Outages can be planned, as when performing upgrades or periodic maintenance, or unplanned, as in the event of hardware or software failure or an environmental catastrophe. Regardless of the type of outage, you want to ensure that your IT infrastructure is restored and running as soon as possible.

This chapter covers the following:

Enterprise Manager Recovery

Recovering Enterprise Manager means restoring any of the three fundamental components of the Enterprise Manager architecture: the Management Repository, the Oracle Management Service (OMS), and the Management Agents.

Repository Recovery

Recovery of the repository database must be performed using RMAN because Cloud Control is not available while the repository database is down. There are two recovery cases to consider:

  • Full Recovery: No special consideration is required for Enterprise Manager.

  • Point-in-Time/Incomplete Recovery: Recovered repository may be out of sync with Agents because of lost transactions. In this situation, some metrics may show up incorrectly in the Cloud Control console unless the repository is synchronized with the latest state available on the Agents.

A repository resync feature (Enterprise Manager version 10.2.0.5 and later) allows you to automate the process of synchronizing the Enterprise Manager repository with the latest state available on the Agents.

Note:

Repository resynchronization requires Agents version 10.2.0.5 or later. Older Agents must be synchronized manually. See "Manually Resynchronizing Agents".

To resynchronize the repository with the Agents, use the Enterprise Manager command-line utility (emctl) resync repos command:

emctl resync repos -full -name "<descriptive name for the operation>"

You must run this command from the OMS Oracle Home after restoring the repository but BEFORE starting the OMS. After submitting the command, start all OMSs and monitor the progress of repository resynchronization from the Enterprise Manager console's Repository Resynchronization page, as shown in Figure 36-1.

Figure 36-1 Repository Synchronization Page


Repository recovery is complete when the resynchronization jobs complete on all Agents.
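The required ordering (restore, then resync, then start) can be sketched as a shell sequence. OMS_HOME and the resync name below are placeholder values, and the emctl calls are only echoed here rather than executed:

```shell
# Sketch of the restore-then-resync ordering; not a drop-in script.
# OMS_HOME and RESYNC_NAME are placeholder values for illustration.
OMS_HOME=${OMS_HOME:-/u01/app/oracle/Middleware/oms}
RESYNC_NAME="post_restore_$(date +%Y%m%d)"

# Step 1: repository already restored with RMAN (outside this sketch).
# Step 2: submit the resync BEFORE any OMS is started.
RESYNC_CMD="$OMS_HOME/bin/emctl resync repos -full -name $RESYNC_NAME"
# Step 3: only then start each OMS and watch the resync jobs.
START_CMD="$OMS_HOME/bin/emctl start oms"

echo "$RESYNC_CMD"
echo "$START_CMD"
```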

Oracle strongly recommends that the repository database be run in archivelog mode so that in case of failure, the database can be recovered to the latest transaction. If the database cannot be recovered to the last transaction, Repository Synchronization can be used to restore monitoring capabilities for targets that existed when the last backup was taken. Actions taken after the backup will not be recovered automatically. Some examples of actions that will not be recovered automatically by Repository Synchronization are:

  • Incident Rules

  • Preferred Credentials

  • Groups, Services, Systems

  • Jobs/Deployment Procedures

  • Custom Reports

  • New Agents

Manually Resynchronizing Agents

The Enterprise Manager Repository Synchronization feature can only be used with Agents version 10.2.0.5 or later. Older Agents must be resynchronized manually using the following procedure:

  1. Shut down the Agent.

  2. Delete the agentstmp.txt, lastupld.xml, state/* and upload/* files from the <AGENT_HOME>/sysman/emd directory.

  3. Restart the Agent.
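Step 2 can be illustrated with a scratch directory standing in for <AGENT_HOME>, so the deletions are visible without touching a real Agent (the emctl stop/start calls are shown as comments only):

```shell
# Demo of the manual resync cleanup against a scratch AGENT_HOME.
# Point AGENT_HOME at a real pre-10.2.0.5 Agent home (with the Agent
# stopped) to perform the actual cleanup.
AGENT_HOME=${AGENT_HOME:-/tmp/demo_agent_home}
EMD="$AGENT_HOME/sysman/emd"

# Scratch layout so the rm commands below have something to act on.
mkdir -p "$EMD/state" "$EMD/upload"
touch "$EMD/agentstmp.txt" "$EMD/lastupld.xml" "$EMD/state/a.xml" "$EMD/upload/b.xml"

# emctl stop agent                      # step 1 (not run in this demo)
rm -f  "$EMD/agentstmp.txt" "$EMD/lastupld.xml"
rm -rf "$EMD/state/"* "$EMD/upload/"*
# emctl start agent                     # step 3 (not run in this demo)
```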

Recovery Scenarios

A prerequisite for repository (or any database) recovery is to have a valid, consistent backup of the repository. Using Enterprise Manager to automate the backup process ensures regular, up-to-date backups are always available if repository recovery is ever required. Recovery Manager (RMAN) is a utility that backs up, restores, and recovers Oracle Databases. The RMAN recovery job syntax should be saved to a safe location. This allows you to perform a complete recovery of the Enterprise Manager repository database. In its simplest form, the syntax appears as follows:

run {
  restore database;
  recover database;
}

Actual syntax will vary in length and complexity depending on your environment. For more information on extracting syntax from an RMAN backup and recovery job, or using RMAN in general, see the Oracle Database Backup and Recovery Advanced User's Guide.
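As one illustration only (not site-specific syntax), a complete-recovery script that also mounts and reopens the database might take this shape; the startup and open steps are assumptions that depend on the state of your instance and backups:

```
run {
  startup mount;        # mount using the restored or surviving control file
  restore database;
  recover database;
  alter database open;  # use OPEN RESETLOGS instead after incomplete recovery
}
```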

The following scenarios illustrate various repository recovery situations along with the recovery steps.

Full Recovery on the Same Host

Repository database is running in archivelog mode. Recent backup, archive log files and redo logs are available. The repository database disk crashes. All datafiles and control files are lost.

Resolution:

  1. Stop all OMS instances using emctl stop oms.

  2. Recover the database using RMAN.

  3. Bring the site up using the command emctl start oms on all OMS instances.

  4. Verify that the site is fully operational.

Incomplete Recovery on the Same Host

Repository database is running in noarchivelog mode. Full offline backup is available. The repository database disk crashes. All datafiles and control files are lost.

Resolution:

  1. Stop the OMS(s) using emctl stop oms.

  2. Recover the database using RMAN.

  3. Initiate Repository Resync using emctl resync repos -full -name "<resync name>" from one of the OMS Oracle Homes.

  4. Start the OMS(s) using emctl start oms.

  5. Log in to Cloud Control. Navigate to the Management Services and Repository Overview page. Click Repository Synchronization under Related Links. Monitor the status of the resync jobs. Resubmit failed jobs, if any, after fixing the reported errors.

  6. Verify that the site is fully operational.

Full Recovery on a Different Host

The repository database is running on host "A" in archivelog mode. Recent backup, archive log files and redo logs are available. The repository database crashes. All datafiles and control files are lost.

Resolution:

  1. Stop the OMS instances using the command emctl stop oms.

  2. Recover the database using RMAN on a different host (host "B").

  3. Correct the connect descriptor for the repository in the credential store by running:

    $emctl config oms -store_repos_details -repos_conndesc <connect descriptor> -repos_user sysman
    
  4. Start the OMS(s) using the command emctl start oms.

  5. Relocate the repository database target to the Agent running on host "B" by running the following command from the OMS:

    $emctl config repos -host <hostB> -oh <OH of repository on hostB>  -conn_desc "<TNS connect descriptor>"
    

    Note:

    This command can only be used to relocate the repository database under the following conditions:
    • An Agent is already running on this machine.

    • No database on host "B" has been discovered.

  6. Change the monitoring configuration for the OMS and Repository target by running the following command from the OMS:

    $emctl config emrep -conn_desc "<TNS connect descriptor>"
    
  7. Verify that the site is fully operational.

Incomplete Recovery on a Different Host

The repository database is running on host "A" in noarchivelog mode. Full offline backup is available. Host "A" is lost due to hardware failure. All datafiles and control files are lost.

Resolution:

  1. Stop the OMS(s) using emctl stop oms.

  2. Recover the database using RMAN on a different host (host "B").

  3. Correct the connect descriptor for the repository in credential store.

    $emctl config oms -store_repos_details -repos_conndesc <connect descriptor> -repos_user sysman
    
  4. Initiate Repository Resync:

    $emctl resync repos -full -name "<resync name>"

    from one of the OMS Oracle Homes.

  5. Start the OMS using the command emctl start oms.

  6. Run the command to relocate the repository database target to the Agent running on host "B":

    $emctl config repos -agent <agent on host B> -host <hostB> -oh <OH of repository on hostB> -conn_desc "<TNS connect descriptor>"

  7. Run the command to change monitoring configuration for the OMS and Repository target:

    emctl config emrep -conn_desc "<TNS connect descriptor>"

  8. Manually fix all pre-10.2.0.5 Agents by shutting down the Agents, deleting the agentstmp.txt, lastupld.xml, state/* and upload/* files under the <AGENT_HOME>/sysman/emd directory and then restarting the Agents.

  9. Log in to Cloud Control. Navigate to the Management Services and Repository Overview page. Click Repository Synchronization under Related Links. Monitor the status of the resync jobs. Resubmit failed jobs, if any, after fixing the reported errors.

  10. Verify that the site is fully operational.

Recovering the OMS

If an OMS is lost, recovery essentially consists of two steps: recovering the Software Homes, then configuring the Instance Home. When restoring on the same host, the Software Homes can be restored from a filesystem backup. If no backup exists, or if you are installing to a different host, the Software Homes can be reconstructed using the "Install Software Only" option from the Cloud Control software distribution. Take care to select and install all Management Plug-ins that existed in your environment prior to the crash. The following SQL command can be run against the repository database as the SYSMAN user to list the plug-ins already deployed:

select epv.display_name, gcp.plugin_id||':'||gcp.version "plugin-version"
from GC_CURRENT_DEPLOYED_PLUGIN gcp, MGMT_OMS_PARAMETERS omsp, EM_PLUGIN_VERSION epv
where gcp.DESTINATION_TYPE='OMS'
and gcp.DESTINATION_NAME = omsp.host_url
and omsp.NAME = 'HOST_NAME'
and gcp.plugin_id = epv.PLUGIN_ID;

Note that some plug-ins may not have shipped with Cloud Control and therefore may not be present on the install media. Download such plug-ins from OTN and pass their location to the Oracle Installer. Select all plug-ins returned by the SQL query above on the Plug-ins page of the Installer. Recovery will fail if all required plug-ins are not selected.

After running the installer in software-only mode, all patches that were installed prior to the crash must be reapplied. Assuming the repository is intact, the post-install scripts that run SQL against the repository can be skipped because the repository already has those patches applied.

As stated earlier, the location of the OMS Oracle Home is fixed and cannot be changed. Ensure that the OMS Oracle Home is restored to the same location that was used previously.

Once the Software Homes are recovered, the instance home can be reconstructed using the omsca command in recovery mode:

omsca recover -as -ms -nostart -backup_file <exportconfig file>

Use the export file generated by the emctl exportconfig command shown in the previous section.

OMS Recovery Scenarios

The following scenarios illustrate various OMS recovery situations along with the recovery steps.

Important:

A prerequisite for OMS recovery is to have recent, valid OMS configuration backups available. Oracle recommends that you back up the OMS using the emctl exportconfig oms command whenever an OMS configuration change is made. This command must be run on the primary OMS running the WebLogic AdminServer.

Alternatively, you can run this command on a regular basis using the Enterprise Manager Job system.

Each of the following scenarios covers recovery of the Software Homes using either a filesystem backup (available only when recovering to the same host) or the software-only option from the installer. In either case, the best practice is to recover the instance home (gc_inst) using the omsca recover command rather than from a filesystem backup. This guarantees that the instance home is valid and up to date.
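The exportconfig backup recommended above can be scripted, for example in a wrapper suitable for the Enterprise Manager Job system or cron. OMS_HOME, BACKUP_DIR, and the -dir destination below are assumptions to adapt, and the emctl call is echoed rather than run:

```shell
# Sketch: take the OMS configuration backup into a dated directory.
# Run this on the primary OMS (the one hosting the WebLogic AdminServer).
OMS_HOME=${OMS_HOME:-/u01/app/oracle/Middleware/oms}
BACKUP_DIR=${BACKUP_DIR:-/tmp/em_config_backups/$(date +%Y%m%d)}

mkdir -p "$BACKUP_DIR"
EXPORT_CMD="$OMS_HOME/bin/emctl exportconfig oms -dir $BACKUP_DIR"
echo "$EXPORT_CMD"    # replace echo with direct execution for real use
```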

Single OMS, No Server Load Balancer (SLB), OMS Restored on the same Host

Site hosts a single OMS. No SLB is present. The OMS configuration was backed up using the emctl exportconfig oms command on the primary OMS running the AdminServer. The OMS Oracle Home is lost.

Resolution:

  1. Perform cleanup on failed OMS host.

    Make sure there are no processes still running from the Middleware home using a command similar to the following:

    ps -ef | grep -i -P "(Middleware|gc_inst)" | grep -v grep | awk '{print $2}' | xargs kill -9
    

    If they exist, remove the 'Middleware' and 'gc_inst' directories.

  2. Ensure that software library locations are still accessible.

  3. Restore the software homes.

    If restoring from a filesystem backup, delete the file <OMS_HOME>/sysman/config/emInstanceMapping.properties and any gc_inst directory that may have been restored, if they exist.

    Alternatively, if a backup does not exist, use the software only install method to reconstruct the software homes:

    1. Select the 'Install Software Only' option from the 'Install Types' step page within the Cloud Control software installer.

    2. Ensure all previously deployed plug-ins are selected on the 'Select Plug-ins' step page.

      It is possible to determine which plugins were deployed previously by running the following SQL against the repository database:

      select epv.display_name, gcp.plugin_id||':'||gcp.version "plugin-version" from GC_CURRENT_DEPLOYED_PLUGIN gcp, MGMT_OMS_PARAMETERS omsp, EM_PLUGIN_VERSION epv where gcp.DESTINATION_TYPE='OMS' and gcp.DESTINATION_NAME = omsp.host_url and omsp.NAME = 'HOST_NAME' and gcp.plugin_id = epv.PLUGIN_ID;
      

      Note:

      At the end of the Software only installation, do NOT run ConfigureGC.pl when told to do so by the installer. This step should only be performed as part of a fresh install, not as part of a recovery operation.
    3. Apply any patches that were previously applied to the OMS software homes.

  4. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    <OMS_HOME>/bin/omsca recover -as -ms -nostart -backup_file <exportconfig file>
    

    Note:

    The file passed via -backup_file must be the latest file generated by the emctl exportconfig oms command.
  5. Configure and deploy required plugins

    <OMS_HOME>/bin/pluginca -action midtierconfig -plugins <plugin-list> -oraclehome <oms oracle home> -middlewarehome <wls middleware home>
    

    where plugin-list is a comma-separated list of the "plugin-version" values returned by the query:

    select epv.display_name, gcp.plugin_id||':'||gcp.version "plugin-version" from GC_CURRENT_DEPLOYED_PLUGIN gcp, MGMT_OMS_PARAMETERS omsp, EM_PLUGIN_VERSION epv where gcp.DESTINATION_TYPE='OMS' and gcp.DESTINATION_NAME = omsp.host_url and omsp.NAME = 'HOST_NAME' and gcp.plugin_id = epv.PLUGIN_ID;
    

    For example:

    /u01/app/oracle/Middleware/oms/bin/pluginca -action midtierconfig -plugins "oracle.sysman.mos=12.1.0.0.0,oracle.sysman.emas=12.1.0.0.0,oracle.sysman.db=12.1.0.0.0" -oraclehome /u01/app/oracle/Middleware/oms -middlewarehome /u01/app/oracle/Middleware
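Note that the query emits id:version pairs while the pluginca example passes id=version pairs. A small hypothetical helper to convert query rows into the -plugins argument (the sample rows stand in for real sqlplus output):

```shell
# Hypothetical helper: build the pluginca -plugins value from the
# "plugin-version" query rows. The two sample rows below stand in for
# real sqlplus output from the repository.
PLUGIN_ROWS="oracle.sysman.db:12.1.0.0.0
oracle.sysman.emas:12.1.0.0.0"

PLUGIN_LIST=""
for row in $PLUGIN_ROWS; do
  pair=$(echo "$row" | sed 's/:/=/')          # id:version -> id=version
  PLUGIN_LIST="${PLUGIN_LIST:+$PLUGIN_LIST,}$pair"
done
echo "$PLUGIN_LIST"
```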
    
  6. Start the OMS.

    <OMS_HOME>/bin/emctl start oms
    
  7. Recover the Agent (if necessary).

    If the Agent software home was recovered along with the OMS software homes (as is likely in a single OMS install recovery where the agent and agent_inst directories are commonly under the Middleware home), the Agent instance directory should be recreated to ensure consistency between the Agent and OMS.

    Remove the agent_inst directory if it was restored from backup.

    Use agentDeploy.sh to configure the agent:

    <AGENT_HOME>/core/12.1.0.0.0/sysman/install/agentDeploy.sh AGENT_BASE_DIR=<AGENT_BASE_DIR> AGENT_INSTANCE_HOME=<AGENT_INSTANCE_HOME> ORACLE_HOSTNAME=<AGENT_HOSTNAME> AGENT_PORT=<AGENT_PORT> -configOnly OMS_HOST=<oms host> EM_UPLOAD_PORT=<OMS_UPLOAD_PORT> AGENT_REGISTRATION_PASSWORD=<REG_PASSWORD>
    

    The OMS automatically blocks the Agent. Resync the Agent from the Agent homepage.

    If the Agent software home was not recovered along with the OMS but the Agent still needs to be recovered, follow the instructions in section Agent Reinstall Using the Same Port.

    Note:

    This is only likely to be needed in the case where a filesystem recovery has been performed that did not include a backup of the Agent software homes. If the OMS software homes were recovered using the Software only install method, this step will not be required because a Software only install installs an Agent software home under the Middleware home.
  8. Verify that the site is fully operational.

Single OMS, No SLB, OMS Restored on a Different Host

Site hosts a single OMS. The OMS is running on host "A." No SLB is present. The OMS configuration was backed up using the emctl exportconfig oms command. Host "A" is lost.

Resolution:

  1. Ensure that software library locations are accessible from “Host B”.

  2. Restore the software homes on “Host B”.

    Oracle does not support restoring OMS Oracle Homes from filesystem backup across different hosts. Use the software-only install method to reconstruct the software homes:

    1. Select the 'Install Software Only' option from the 'Install Types' step page within the Cloud Control software installer.

    2. Ensure all previously deployed plug-ins are selected on the 'Select Plug-ins' step page.

      It is possible to determine which plugins were deployed previously by running the following SQL against the repository database:

      select epv.display_name, gcp.plugin_id||':'||gcp.version "plugin-version" from GC_CURRENT_DEPLOYED_PLUGIN gcp, MGMT_OMS_PARAMETERS omsp, EM_PLUGIN_VERSION epv where gcp.DESTINATION_TYPE='OMS' and gcp.DESTINATION_NAME = omsp.host_url and omsp.NAME = 'HOST_NAME' and gcp.plugin_id = epv.PLUGIN_ID;
      

      Note:

      At the end of the Software only installation, do NOT run ConfigureGC.pl when told to do so by the installer. This step should only be performed as part of a fresh install, not as part of a recovery operation.
    3. Apply any patches that were previously applied to the OMS software homes.

  3. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    <OMS_HOME>/bin/omsca recover -as -ms -nostart -backup_file <exportconfig file>
    

    Note:

    The file passed via -backup_file must be the latest file generated by the emctl exportconfig oms command.
  4. Configure and deploy required plugins.

    <OMS_HOME>/bin/pluginca -action midtierconfig -plugins <plugin-list> -oraclehome <oms oracle home> -middlewarehome <wls middleware home>
    

    where plugin-list is a comma-separated list of the "plugin-version" values returned by the query:

    select epv.display_name, gcp.plugin_id||':'||gcp.version "plugin-version" from GC_CURRENT_DEPLOYED_PLUGIN gcp, MGMT_OMS_PARAMETERS omsp, EM_PLUGIN_VERSION epv where gcp.DESTINATION_TYPE='OMS' and gcp.DESTINATION_NAME = omsp.host_url and omsp.NAME = 'HOST_NAME' and gcp.plugin_id = epv.PLUGIN_ID;
    

    For example:

    /u01/app/oracle/Middleware/oms/bin/pluginca -action midtierconfig -plugins "oracle.sysman.mos=12.1.0.0.0,oracle.sysman.emas=12.1.0.0.0,oracle.sysman.db=12.1.0.0.0" -oraclehome /u01/app/oracle/Middleware/oms -middlewarehome /u01/app/oracle/Middleware
    
  5. Start the OMS.

    <OMS_HOME>/bin/emctl start oms
    

    An agent is installed as part of the Software only install and needs to be configured using the agentDeploy.sh command:

  6. Configure the Agent.

    <AGENT_HOME>/core/12.1.0.0.0/sysman/install/agentDeploy.sh AGENT_BASE_DIR=<AGENT_BASE_DIR> AGENT_INSTANCE_HOME=<AGENT_INSTANCE_HOME> ORACLE_HOSTNAME=<AGENT_HOSTNAME> AGENT_PORT=<AGENT_PORT> -configOnly OMS_HOST=<oms host> EM_UPLOAD_PORT=<OMS_UPLOAD_PORT> AGENT_REGISTRATION_PASSWORD=<REG_PASSWORD>
    

    The OMS automatically blocks the Agent. Resync the Agent from the Agent homepage.

  7. Relocate the oracle_emrep target to the Agent of the new OMS host using the following commands:

    <OMS_HOME>/bin/emcli login -username=sysman
    <OMS_HOME>/bin/emcli sync
    <OMS_HOME>/bin/emctl config emrep -agent <agent on host "B", e.g. myNewOMSHost.example.com:3872>
    
  8. In the Cloud Control console, locate the 'WebLogic Domain' target for the Cloud Control domain. Go to 'Monitoring Credentials' and update the AdminServer host to host "B". Then perform a Refresh WebLogic Domain operation to reconfigure the domain with the new host.

  9. Locate duplicate targets from the Management Services and Repository Overview page of the Enterprise Manager console. Click the Duplicate Targets link to access the Duplicate Targets page. To resolve duplicate target errors, the duplicate target must be renamed on the conflicting Agent. Relocate duplicate targets from Agent "A" to Agent "B".

  10. Change the OMS to which all Agents point and then resecure all Agents.

    Because the new machine is using a different hostname from the one originally hosting the OMS, all Agents in your monitored environment must be told where to find the new OMS. On each Agent, run the following command:

    <AGENT_INST_DIR>/bin/emctl secure agent -emdWalletSrcUrl "http://hostB:<http_port>/em"
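    Repeating this on every Agent can be scripted. The host list, URL, and AGENT_INST_DIR below are placeholders, and the remote command is only echoed (swap echo for ssh to execute it):

```shell
# Sketch: repoint a set of Agents at the new OMS on host "B".
NEW_OMS_URL="http://hostB:4889/em"              # 4889 is an assumed port
AGENT_INST_DIR=/u01/app/oracle/agent/agent_inst # assumed instance path
AGENT_HOSTS="agent01 agent02 agent03"           # placeholder host list

for h in $AGENT_HOSTS; do
  echo "ssh $h $AGENT_INST_DIR/bin/emctl secure agent -emdWalletSrcUrl $NEW_OMS_URL"
done
```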
    
  11. Assuming the original OMS host is no longer in use, remove the Host target (including all remaining monitored targets) from Cloud Control by selecting the host on the Targets > Hosts page and clicking 'Remove'. You will be presented with an error that informs you to remove all monitored targets first. Remove those targets then repeat the step to remove the Host target successfully.

  12. Verify that the site is fully operational.

Single OMS, No SLB, OMS Restored on a Different Host using the Original Hostname

Site hosts a single OMS. The OMS is running on host "A." No SLB is present. The OMS configuration was backed up using the emctl exportconfig oms command. Host "A" is lost. Recovery is to be performed on “Host B” but retaining the use of “Hostname A”.

Resolution:

  1. Ensure that the loader receive directory and software library locations are accessible from Host "B".

    Oracle does not support restoring OMS Oracle Homes from filesystem backup across different hosts. Use the software-only install method to reconstruct the software homes:

    1. Select the 'Install Software Only' option from the 'Install Types' step page within the Cloud Control software installer.

    2. Ensure all previously deployed plug-ins are selected on the 'Select Plug-ins' step page.

      It is possible to determine which plugins were deployed previously by running the following SQL against the repository database:

      select epv.display_name, gcp.plugin_id||':'||gcp.version "plugin-version" from GC_CURRENT_DEPLOYED_PLUGIN gcp, MGMT_OMS_PARAMETERS omsp, EM_PLUGIN_VERSION epv where gcp.DESTINATION_TYPE='OMS' and gcp.DESTINATION_NAME = omsp.host_url and omsp.NAME = 'HOST_NAME' and gcp.plugin_id = epv.PLUGIN_ID;

      Note:

      At the end of the Software only installation, do NOT run ConfigureGC.pl when told to do so by the installer. This step should only be performed as part of a fresh install, not as part of a recovery operation.
    3. Apply any patches that were previously applied to the OMS software homes.

  2. Modify the network configuration such that “Host B” also responds to hostname of “Host A”. Specific instructions on how to configure this are beyond the scope of this document. However, some general configuration suggestions are:

    • Modify your DNS server such that both "Hostname B" and "Hostname A" network addresses resolve to the physical IP of "Host B".

    • Multi-home "Host B". Configure an additional IP on "Host B" for the IP address that "Hostname A" resolves to. For example, on "Host B" run the following commands:

    ifconfig eth0:1 <IP assigned to “Hostname A”> netmask <netmask>
    /sbin/arping -q -U -c 3 -I eth0 <IP of HostA>
    
  3. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    <OMS_HOME>/bin/omsca recover -as -ms -nostart -backup_file <exportconfig file> -AS_HOST <hostA> -EM_INSTANCE_HOST <hostA>
    

    Note:

    The file passed via -backup_file must be the latest file generated by the emctl exportconfig oms command.
  4. Configure and deploy required plug-ins.

    <OMS_HOME>/bin/pluginca -action midtierconfig -plugins <plugin-list> -oraclehome <oms oracle home> -middlewarehome <wls middleware home>
    

    where plugin-list is a comma-separated list of the "plugin-version" values returned by the query:

    select epv.display_name, gcp.plugin_id||':'||gcp.version "plugin-version" from GC_CURRENT_DEPLOYED_PLUGIN gcp, MGMT_OMS_PARAMETERS omsp, EM_PLUGIN_VERSION epv where gcp.DESTINATION_TYPE='OMS' and gcp.DESTINATION_NAME = omsp.host_url and omsp.NAME = 'HOST_NAME' and gcp.plugin_id = epv.PLUGIN_ID;
    

    For example:

    /u01/app/oracle/Middleware/oms/bin/pluginca -action midtierconfig -plugins "oracle.sysman.mos=12.1.0.0.0,oracle.sysman.emas=12.1.0.0.0,oracle.sysman.db=12.1.0.0.0" -oraclehome /u01/app/oracle/Middleware/oms -middlewarehome /u01/app/oracle/Middleware
    
  5. Start the OMS.

    <OMS_HOME>/bin/emctl start oms
    
  6. Configure the agent.

    An agent is installed as part of the Software only install and needs to be configured using the agentDeploy.sh command:

    <AGENT_HOME>/core/12.1.0.0.0/sysman/install/agentDeploy.sh AGENT_BASE_DIR=<AGENT_BASE_DIR> AGENT_INSTANCE_HOME=<AGENT_INSTANCE_HOME> ORACLE_HOSTNAME=<AGENT_HOSTNAME> AGENT_PORT=<AGENT_PORT> -configOnly OMS_HOST=<oms host> EM_UPLOAD_PORT=<OMS_UPLOAD_PORT> AGENT_REGISTRATION_PASSWORD=<REG_PASSWORD>
    
  7. The OMS automatically blocks the Agent. Resync the Agent from the Agent homepage.

    Run the command to relocate the Management Services and Repository target to Agent "B":

    emctl config emrep -agent <agent on host B>
    
  8. Locate duplicate targets from the Management Services and Repository Overview page of the Enterprise Manager console. Click the Duplicate Targets link to access the Duplicate Targets page. To resolve duplicate target errors, the duplicate target must be renamed on the conflicting Agent. Relocate duplicate targets from Agent "A" to Agent "B".

  9. Verify that the site is fully operational.

Multiple OMS, Server Load Balancer, Primary OMS Recovered on the Same Host

Site hosts multiple OMSs. All OMSs are fronted by a Server Load Balancer. OMS configuration backed up using the emctl exportconfig oms command on the primary OMS running the WebLogic AdminServer. The primary OMS is lost.

Resolution:

  1. Perform cleanup on failed OMS host.

    Make sure there are no processes still running from the Middleware home using a command similar to the following:

    ps -ef | grep -i -P "(Middleware|gc_inst)" | grep -v grep | awk '{print $2}' | xargs kill -9
    

    If recovering the software homes using the software only install method, first de-install the existing Oracle Homes using the Cloud Control software distribution installer. This is required even if the software homes are no longer available as it is necessary to remove any record of the lost Oracle Homes from the Oracle inventory.

    If they exist, remove the 'Middleware' and 'gc_inst' directories.

  2. Ensure that software library locations are still accessible.

  3. Restore the software homes.

    If restoring from a filesystem backup, delete the file <OMS_HOME>/sysman/config/emInstanceMapping.properties and any gc_inst directory that may have been restored, if they exist. Alternatively, if a backup does not exist, use the software only install method to reconstruct the software homes:

    1. Select the 'Install Software Only' option from the 'Install Types' step page within the Cloud Control software installer.

    2. Ensure all previously deployed plug-ins are selected on the 'Select Plug-ins' step page.

      It is possible to determine which plugins were deployed previously by running the following SQL against the repository database:

      select epv.display_name, gcp.plugin_id||':'||gcp.version "plugin-version" from GC_CURRENT_DEPLOYED_PLUGIN gcp, MGMT_OMS_PARAMETERS omsp, EM_PLUGIN_VERSION epv where gcp.DESTINATION_TYPE='OMS' and gcp.DESTINATION_NAME = omsp.host_url and omsp.NAME = 'HOST_NAME' and gcp.plugin_id = epv.PLUGIN_ID;
      

      Note:

      At the end of the Software only installation, do NOT run ConfigureGC.pl when told to do so by the installer. This step should only be performed as part of a fresh install, not as part of a recovery operation.
    3. Apply any patches that were previously applied to the OMS software homes.

  4. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    <OMS_HOME>/bin/omsca recover -as -ms -nostart -backup_file <exportconfig file>
    

    Note:

    The file passed via -backup_file must be the latest file generated by the emctl exportconfig oms command.
  5. Configure and deploy required plug-ins.

    <OMS_HOME>/bin/pluginca -action midtierconfig -plugins <plugin-list> -oraclehome <oms oracle home> -middlewarehome <wls middleware home>
    

    where plugin-list is a comma-separated list of the "plugin-version" values returned by the query:

    select epv.display_name, gcp.plugin_id||':'||gcp.version "plugin-version" from GC_CURRENT_DEPLOYED_PLUGIN gcp, MGMT_OMS_PARAMETERS omsp, EM_PLUGIN_VERSION epv where gcp.DESTINATION_TYPE='OMS' and gcp.DESTINATION_NAME = omsp.host_url and omsp.NAME = 'HOST_NAME' and gcp.plugin_id = epv.PLUGIN_ID;
    

    For example:

    /u01/app/oracle/Middleware/oms/bin/pluginca -action midtierconfig -plugins "oracle.sysman.mos=12.1.0.0.0,oracle.sysman.emas=12.1.0.0.0,oracle.sysman.db=12.1.0.0.0" -oraclehome /u01/app/oracle/Middleware/oms -middlewarehome /u01/app/oracle/Middleware
    
  6. Start the OMS.

    <OMS_HOME>/bin/emctl start oms
    
  7. Recover the Agent.

    If the Agent software home was recovered along with the OMS software homes (as is likely in a primary OMS recovery where the agent and agent_inst directories are commonly under the Middleware home), the Agent instance directory should be recreated to ensure consistency between the Agent and OMS. Remove the agent_inst directory if it was restored from backup, then use agentDeploy.sh to configure the Agent:

    <AGENT_HOME>/core/12.1.0.0.0/sysman/install/agentDeploy.sh AGENT_BASE_DIR=<AGENT_BASE_DIR> AGENT_INSTANCE_HOME=<AGENT_INSTANCE_HOME> ORACLE_HOSTNAME=<AGENT_HOSTNAME> AGENT_PORT=<AGENT_PORT> -configOnly OMS_HOST=<oms host> EM_UPLOAD_PORT=<OMS_UPLOAD_PORT> AGENT_REGISTRATION_PASSWORD=<REG_PASSWORD>
    

    The OMS automatically blocks the Agent. Resync the Agent from the Agent homepage.

    If the Agent software home was not recovered along with the OMS but the Agent still needs to be recovered, follow the instructions in section Agent Reinstall Using the Same Port.

    Note:

    This is only likely to be needed in the case where a filesystem recovery has been performed that did not include a backup of the Agent software homes. If the OMS software homes were recovered using the Software only install method, this step will not be required because a Software only install installs an Agent software home under the Middleware home.
  8. Re-enroll the additional OMS, if any, with the recovered Administration Server by running emctl enroll oms on each additional OMS.

  9. Verify that the site is fully operational.

Multiple OMS, Server Load Balancer configured, Primary OMS Recovered on a Different Host

Site hosts multiple OMSs. The OMSs are fronted by a Server Load Balancer. The OMS configuration was backed up using the emctl exportconfig oms command. The primary OMS on host "A" is lost and needs to be recovered on host "B".

Resolution:

  1. If necessary, perform cleanup on failed OMS host.

    Make sure there are no processes still running from the Middleware home using a command similar to the following:

    ps -ef | grep -i -P "(Middleware|gc_inst)" | grep -v grep | awk '{print $2}' | xargs kill -9
    
  2. Ensure that software library locations are accessible from host "B".

  3. Restore the software homes on host "B".

    Oracle does not support restoring OMS Oracle Homes from filesystem backup across different hosts. Use the software-only install method to reconstruct the software homes:

    1. Select the 'Install Software Only' option from the 'Install Types' step page within the Cloud Control software installer.

    2. Ensure all previously deployed plug-ins are selected on the 'Select Plug-ins' step page.

      You can determine which plug-ins were previously deployed by running the following SQL against the repository database:

      select epv.display_name, gcp.plugin_id||':'||gcp.version "plugin-version"
        from GC_CURRENT_DEPLOYED_PLUGIN gcp, MGMT_OMS_PARAMETERS omsp, EM_PLUGIN_VERSION epv
       where gcp.DESTINATION_TYPE = 'OMS'
         and gcp.DESTINATION_NAME = omsp.host_url
         and omsp.NAME = 'HOST_NAME'
         and gcp.plugin_id = epv.PLUGIN_ID;
      

      Note:

      At the end of the Software only installation, do NOT run ConfigureGC.pl when told to do so by the installer. This step should only be performed as part of a fresh install, not as part of a recovery operation.
    3. Apply any patches that were previously applied to the OMS software homes.

  4. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    <OMS_HOME>/bin/omsca recover -as -ms -nostart -backup_file <exportconfig file>
    

    Note:

    The -backup_file passed must be the latest file generated by the emctl exportconfig oms command.
  5. Configure and deploy required plug-ins.

    <OMS_HOME>/bin/pluginca -action midtierconfig -plugins <plugin-list> -oraclehome <oms oracle home> -middlewarehome <wls middleware home>
    

    where <plugin-list> is a comma-separated list of the "plugin-version" values, one for each row returned by the query:

    select epv.display_name, gcp.plugin_id||':'||gcp.version "plugin-version"
      from GC_CURRENT_DEPLOYED_PLUGIN gcp, MGMT_OMS_PARAMETERS omsp, EM_PLUGIN_VERSION epv
     where gcp.DESTINATION_TYPE = 'OMS'
       and gcp.DESTINATION_NAME = omsp.host_url
       and omsp.NAME = 'HOST_NAME'
       and gcp.plugin_id = epv.PLUGIN_ID;
    

    For example:

    /u01/app/oracle/Middleware/oms/bin/pluginca -action midtierconfig -plugins "oracle.sysman.mos=12.1.0.0.0,oracle.sysman.emas=12.1.0.0.0,oracle.sysman.db=12.1.0.0.0" -oraclehome /u01/app/oracle/Middleware/oms -middlewarehome /u01/app/oracle/Middleware
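The -plugins argument can be assembled from the query's "plugin-version" column. Note that the query emits id:version while the pluginca example above uses id=version; the sketch below (with canned, hypothetical rows) converts the separator and joins the rows with commas:

```shell
# Canned "plugin-version" rows as the repository query would return them:
rows='oracle.sysman.db:12.1.0.0.0
oracle.sysman.emas:12.1.0.0.0'

# Convert id:version to id=version and join into one comma-separated list.
plugin_list=$(printf '%s\n' "$rows" | sed 's/:/=/' | paste -sd, -)
echo "$plugin_list"
# prints: oracle.sysman.db=12.1.0.0.0,oracle.sysman.emas=12.1.0.0.0
```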
    
  6. Start the OMS.

    <OMS_HOME>/bin/emctl start oms
    
  7. Configure the Agent.

    An agent is installed as part of the Software only install and needs to be configured using the agentDeploy.sh command:

    <AGENT_HOME>/core/12.1.0.0.0/sysman/install/agentDeploy.sh AGENT_BASE_DIR=<AGENT_BASE_DIR> AGENT_INSTANCE_HOME=<AGENT_INSTANCE_HOME> ORACLE_HOSTNAME=<AGENT_HOSTNAME> AGENT_PORT=<AGENT_PORT> -configOnly OMS_HOST=<oms host> EM_UPLOAD_PORT=<OMS_UPLOAD_PORT> AGENT_REGISTRATION_PASSWORD=<REG_PASSWORD>
    

    The OMS automatically blocks the Agent. Resync the Agent from the Agent homepage.

  8. Add the new OMS to the SLB virtual server pools and remove the old OMS.

  9. Relocate the oracle_emrep target to the Agent of the new OMS host using the following commands:

    <OMS_HOME>/bin/emcli sync
    <OMS_HOME>/bin/emctl config emrep -agent <agent on host "B", e.g myNewOMSHost.example.com:3872>
    
  10. In the Cloud Control console, locate the 'WebLogic Domain' target for the Cloud Control domain. Go to 'Monitoring Credentials' and update the AdminServer host to host B. Then perform a Refresh WebLogic Domain operation to reconfigure the domain with the new hosts.

  11. Locate duplicate targets from the Management Services and Repository Overview page of the Enterprise Manager console. Click the Duplicate Targets link to access the Duplicate Targets page. To resolve duplicate target errors, the duplicate target must be renamed on the conflicting Agent. Relocate duplicate targets from Agent "A" to Agent "B".

  12. Assuming the original OMS host is no longer in use, remove the Host target (including all remaining monitored targets) from Cloud Control by selecting the host on the Targets > Hosts page and clicking 'Remove'. You will be presented with an error that informs you to remove all monitored targets first. Remove those targets then repeat the step to remove the Host target successfully.

  13. Verify that the site is fully operational.

Multiple OMS, SLB configured, additional OMS recovered on same or different host

Multi OMS site. OMSs fronted by SLB. OMS configuration backed up using emctl exportconfig oms command on the first OMS. Additional OMS is lost and needs to be recovered on the same or a different host.

  1. If recovering to the same host, ensure cleanup of the failed OMS has been performed:

    Make sure there are no processes still running from the Middleware home using a command similar to the following:

    ps -ef | grep -i -P "(Middleware|gc_inst)" | grep -v grep | awk '{print $2}' | xargs kill -9
    

    If recovering the software homes using the software only install method, first de-install the existing Oracle Homes using the Cloud Control software distribution installer. This is required even if the software homes are no longer available as it is necessary to remove any record of the lost Oracle Homes from the Oracle inventory.

    If they exist, remove the 'Middleware' and 'gc_inst' directories.

  2. Ensure that shared software library locations are accessible.

  3. Install an Agent on the required host (same or different as the case may be).

  4. Use the Additional OMS deployment procedure to configure a new additional OMS.

  5. Verify that the site is fully operational.

Recovering Agents

If an Agent is lost, it should be reinstalled by cloning from a reference install. Cloning from a reference install is often the fastest way to recover an Agent install as it is not necessary to track and reapply customizations and patches. Care should be taken to reinstall the Agent using the same port. Using the Enterprise Manager's Agent Resynchronization feature, a reinstalled Agent can be reconfigured using target information present in the repository. When the Agent is reinstalled using the same port, the OMS detects that it has been re-installed and blocks it temporarily to prevent the auto-discovered targets in the re-installed Agent from overwriting previous customizations.

Blocked Agents:

A blocked Agent is one whose heartbeat and upload requests are all rejected by the OMS. Hence, a blocked Agent cannot upload any alerts or metric data to the OMS. However, blocked Agents continue to collect monitoring data locally.

The Agent can be resynchronized and unblocked from the Agent homepage by clicking on the Resynchronize Agent button. Resynchronization pushes all targets from the repository to the Agent and then unblocks the Agent.

Agent Recovery Scenarios

The following scenarios illustrate various Agent recovery situations along with the recovery steps. Agent recovery is supported for Agent versions 10.2.0.5 and later. The Agent resynchronization feature requires that a reinstalled Agent use the same port as the previous Agent that crashed.

Agent Reinstall Using the Same Port

An Agent is monitoring multiple targets. The Agent installation is lost.

  1. Deinstall Agent OracleHome using the Oracle Universal Installer.

    Note:

    This step is necessary in order to clean up the inventory.
  2. Install a new Agent or use the Agent clone option to reinstall the Agent through Enterprise Manager. Specify the same port that was used by the crashed Agent. The location of the install need not be the same as that of the previous install.

    The OMS detects that the Agent has been re-installed and blocks the Agent.

  3. Initiate Agent Resynchronization from the Agent homepage.

    All targets in the repository are pushed to the new Agent. The Agent is instructed to clear backlogged files and then do a clearstate. The Agent is then unblocked.

  4. Reconfigure User-defined Metrics if the location of User-defined Metric scripts have changed.

  5. Verify that the Agent is operational and all target configurations have been restored using the following emctl commands:

    emctl status agent 
    emctl upload agent 
    

    There should be no errors and no XML files in the backlog.
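To check for backlogged files directly, you can count XML files under the Agent instance upload directory. A sketch, assuming the usual <instance home>/sysman/emd/upload layout (verify the path for your install); the demo runs against a scratch directory rather than a real Agent:

```shell
# count_backlog: number of XML files awaiting upload under an Agent
# instance directory (assumed layout: <instance>/sysman/emd/upload).
count_backlog() {
  find "$1/sysman/emd/upload" -name '*.xml' 2>/dev/null | wc -l | tr -d ' '
}

# Demo against a scratch directory standing in for the instance home:
demo=$(mktemp -d)
mkdir -p "$demo/sysman/emd/upload"
touch "$demo/sysman/emd/upload/A0000001.xml" "$demo/sysman/emd/upload/B0000002.xml"
count_backlog "$demo"    # prints 2 here; a fully resynced Agent reports 0
rm -rf "$demo"
```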

Agent Restore from Filesystem Backup

An Agent is monitoring multiple targets. File system backup for the Agent Oracle Home exists. The Agent install is lost.

  1. Deinstall Agent OracleHome using OUI.

    Note:

    This step is necessary in order to clean up the inventory.
  2. Restore the Agent from the filesystem backup then start the Agent.

    The OMS detects that the Agent has been restored from backup and blocks the Agent.

  3. Initiate Agent Resynchronization from the Agent homepage.

    All targets in the repository are pushed to the new Agent. The Agent is instructed to clear backlogged files and performs a clearstate. The Agent is unblocked.

  4. Verify that the Agent is functional and all target configurations have been restored using the following emctl commands:

    emctl status agent
    
    emctl upload agent 
    

    There should be no errors and no XML files in the backlog.

Recovering from a Simultaneous OMS-Repository Failure

When both OMS and repository fail simultaneously, the recovery situation becomes more complex depending upon factors such as whether the OMS and repository recovery has to be performed on the same or different host, or whether there are multiple OMSs fronted by an SLB. In general, the order of recovery for this type of compound failure should be repository first, followed by OMS(s) following the steps outlined in the appropriate recovery scenarios discussed earlier. The following scenarios illustrate two OMS-Repository failures and the requisite recovery steps.

Collapsed Configuration: Incomplete Repository Recovery, Primary OMS on the Same Host

The repository and the primary OMS are installed on the same host (host "A"). The repository database is running in noarchivelog mode. A full cold backup is available. A recent OMS backup file exists (emctl exportconfig oms). The repository, OMS, and Agent crash.

  1. Follow the repository recovery procedure shown in Incomplete Recovery on the Same Host with the following exception:

    Since the OMS OracleHome is not available and repository resynchronization has to be initiated before starting an OMS against the restored repository, submit the resync via the following PL/SQL block. Log in to the repository as SYSMAN using SQL*Plus and run:

    begin emd_maintenance.full_repository_resync('<resync name>'); end;
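Because the block must be entered exactly, it can help to generate it and pipe it into SQL*Plus; a minimal sketch (the resync label and connect string are placeholders, not values from this document):

```shell
# Emit the PL/SQL block that submits a named full repository resync.
resync_sql() {
  printf "begin emd_maintenance.full_repository_resync('%s'); end;\n/\n" "$1"
}

resync_sql 'post-restore-resync'
# In practice: resync_sql 'post-restore-resync' | sqlplus -s sysman@<repos connect string>
```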
    
  2. Follow the OMS recovery procedure shown in Section 1.2.2.1, "Single OMS, No Server Load Balancer (SLB), OMS Restored on the same Host"

  3. Verify that the site is fully operational.

Distributed Configuration: Incomplete Repository Recovery, Primary OMS and additional OMS on Different Hosts, SLB Configured

The repository, primary OMS, and additional OMS all reside on different hosts. The repository database was running in noarchivelog mode. An OMS backup file from a recent backup exists (emctl exportconfig oms). A full cold backup of the database exists. All three hosts are lost.

  1. Follow the repository recovery procedure shown in Incomplete Recovery on the Same Host with the following exception:

    Since the OMS OracleHome is not yet available and repository resynchronization has to be initiated before starting an OMS against the restored repository, submit the resync via the following PL/SQL block. Log in to the repository as SYSMAN using SQL*Plus and run the following:

    begin emd_maintenance.full_repository_resync('<resync name>'); end;
    
  2. Follow the OMS recovery procedure shown in "Multiple OMS, Server Load Balancer configured, Primary OMS Recovered on a Different Host" with the following exception:

    Override the repository connect description present in the backup file by passing the additional omsca parameter:

    -REPOS_CONN_STR <restored repos descriptor>
    

    This needs to be added along with other parameters listed in Multiple OMS, Server Load Balancer configured, Primary OMS Recovered on a Different Host.

  3. Follow the OMS recovery procedure shown in Multiple OMS, SLB configured, additional OMS recovered on same or different host.

  4. Verify that the site is fully operational.

Switchover

Switchover is a planned activity where operations are transferred from the Primary site to a Standby site. This is usually done for testing and validation of Disaster Recovery (DR) scenarios and for planned maintenance activities on the primary infrastructure. This section describes the steps to switchover to the standby site. The same procedure is applied to switchover in either direction.

Enterprise Manager Console cannot be used to perform switchover of the Management Repository database. Use the Data Guard Broker command line tool DGMGRL instead.

  1. Prepare the Standby Database

    Verify that recovery is up-to-date. Using the Enterprise Manager Console, you can view the value of the ApplyLag column for the standby database in the Standby Databases section of the Data Guard Overview Page.

  2. Shut down the Primary Enterprise Manager Application Tier.

    Shutdown all the Management Service instances in the primary site by running the following command on each Management Service:

    emctl stop oms -all

  3. Verify Shared Loader Directory / Software Library Availability

    Ensure all files from the primary site are available on the standby site.

  4. Switch over to the Standby Database

    Use DGMGRL to perform a switchover to the standby database. The command can be run on the primary site or the standby site. The switchover command verifies the states of the primary database and the standby database, effects the switchover of roles, restarts the old primary database, and sets it up as the new standby database.

    SWITCHOVER TO <standby database name>;

    Verify the post switchover states. To monitor a standby database completely, the user monitoring the database must have SYSDBA privileges. This privilege is required because the standby database is in a mounted-only state. A best practice is to ensure that the users monitoring the primary and standby databases have SYSDBA privileges for both databases.

    SHOW CONFIGURATION;
    SHOW DATABASE <primary database name>;
    SHOW DATABASE <standby database name>;
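The switchover and verification commands can be kept together in one generated command sequence; a sketch with placeholder database names (pass the output to DGMGRL in whatever way your environment allows):

```shell
# Emit the DGMGRL switchover command followed by the post-switchover checks.
# Arguments: new primary (current standby), then old primary.
make_switchover_cmds() {
  new_primary=$1
  old_primary=$2
  cat <<EOF
SWITCHOVER TO $new_primary;
SHOW CONFIGURATION;
SHOW DATABASE $old_primary;
SHOW DATABASE $new_primary;
EOF
}

make_switchover_cmds emrep_stby emrep_prim   # database names are hypothetical
```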
    
  5. Make the standby Management Services point to the Standby Database which is now the new Primary by running the following on each standby Management Service.

    emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman
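The -repos_conndesc value is a full TNS connect descriptor for the new primary database; a sketch that assembles one from hypothetical host, port, and service values:

```shell
# Build a TNS connect descriptor string. Host, port, and service name
# are placeholders for the new primary repository database.
make_conndesc() {
  printf '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=%s)(PORT=%s))(CONNECT_DATA=(SERVICE_NAME=%s)))' \
    "$1" "$2" "$3"
}

make_conndesc mars.example.com 1521 emrep; echo
```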

  6. Startup the Enterprise Manager Application Tier

    Startup all the Management Services on the standby site:

    emctl start oms

  7. Relocate Management Services and Management Repository target

    The Management Services and Management Repository target is monitored by a Management Agent on one of the Management Services on the primary site. To ensure that the target is monitored after switchover/failover, relocate the target to a Management Agent on the standby site by running the following command on one of the standby site Management Services.

    emctl config emrep -agent <agent name> -conn_desc

  8. Switchover to Standby SLB.

    Make the appropriate network changes to fail over your primary SLB to the standby SLB; that is, all requests should now be served by the standby SLB without requiring any changes on the clients (browsers and Management Agents).

  9. Establish the old primary Management Services as the new standby Management Services to complete the switchover process.

    Start the Administration Server on old primary site

    emctl start oms -admin_only

    Point the old primary Management Services to the new Primary Repository database by running the following command on each Management Service on the old primary site.

    emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman

This completes the switchover operation. Access and test the application to ensure that the site is fully operational and functionally equivalent to the primary site. Repeat the same procedure to switchover in the other direction.

Keeping the Standby Site in Sync

After the initial setup of the standby site, it must be kept in sync with changes made on the primary site. Transactions on the primary Management Repository are propagated to the standby Management Repository automatically through Data Guard, but OMS-side changes have to be redone manually on the standby site. The following sections describe this procedure for typical activities.

Applying patches

When patches are applied on the primary site Management Services, they have to be applied on the standby site Management Services too. Note that patches typically update the Oracle Homes (via the opatch apply command) and optionally might require scripts to be run against the Management Repository. On the standby site, it is sufficient to update the Oracle Homes (via the opatch apply command) and skip the running of scripts on the Management Repository because database changes are automatically propagated to the standby site using Data Guard.
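One way to confirm that the standby homes carry the same patch set is to compare patch ID lists from each home (for example, opatch lsinventory output post-processed to one patch ID per line). A sketch with canned, made-up IDs:

```shell
# Patch IDs applied on the primary and standby OMS homes (canned examples;
# in practice derive each list from opatch output for that home).
p=$(mktemp); s=$(mktemp)
printf '1111111\n2222222\n' | sort > "$p"
printf '1111111\n' | sort > "$s"

# Lines only in the primary list are patches missing from the standby.
comm -23 "$p" "$s"    # prints 2222222
rm -f "$p" "$s"
```

A non-empty result means the standby homes need the listed patches applied before they can take over.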

Managing Plugins

When new plug-ins are deployed on the primary site, or existing plug-ins are upgraded or undeployed there, the following procedure must also be run on the standby site to keep the standby Management Services in sync. Note that if the standby Management Services are not kept in sync, they will fail to start when a switchover or failover operation is attempted.

The procedure below assumes that the standby site was set up as per the documented process, that the standby Management Services are currently down and point to the standby repository, and that the plug-in deployment on the primary site completed successfully.

Deploying a new Plugin or upgrading a Plugin on Standby Site

  1. Extract the Plugin archives from the Primary site

    Go to the Self Update home, click Plug-ins, select the required plug-in, and select Export from the Actions table menu. Note the emcli command shown in the popup that is displayed.

    emcli export_update -id=<update id> -deep -host=<standby OMS host> -dir=<directory to export archives> <host credential options>
    

    Note that the additional -deep option is required. This command creates four files in the specified destination directory. The file named <version>_OMS_<platform>_<revision>.zip is the one used in the next step.
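Given that filename convention, the OMS-side archive can be picked out of the export directory by pattern; a sketch using a scratch directory with made-up filenames:

```shell
# pick_oms_archive: list files matching the <version>_OMS_<platform>_<revision>.zip
# naming convention in the given directory.
pick_oms_archive() {
  ls "$1"/*_OMS_*.zip 2>/dev/null
}

# Demo with hypothetical exported files in a scratch directory:
demo=$(mktemp -d)
touch "$demo/12.1.0.0.0_OMS_2000_0.zip" "$demo/12.1.0.0.0_Agent_2000_0.zip"
pick_oms_archive "$demo" | xargs -n1 basename   # prints 12.1.0.0.0_OMS_2000_0.zip
rm -rf "$demo"
```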

  2. Startup the Standby Administration Server, if it is down.

    emctl start oms -admin_only

  3. Install the OMS archive to First Standby OMS Oracle Home

    pluginia -archives <path to plugin archive>

  4. Configure the Plug-in on First Standby OMS Oracle Home

    pluginca -action deploy -isFirstOMS true -plugins <plugin-list> -oracleHome <oms oracle home> -middlewareHome <wls middleware home>
    

    where <plugin-list> is the plugin name in the format <plugin-id>=<plugin-version>

  5. Repeat steps 3 and 4 for each Standby additional OMS

    pluginia -archives <path to plugin archive>
    pluginca -action deploy -isFirstOMS false -plugins <plugin-list> -oracleHome <oms oracle home> -middlewareHome <wls middleware home>
    

This completes the plugin deployment on Standby site.

Failover

A standby database can be converted to a primary database when the original primary database fails and there is no possibility of recovering the primary database in a timely manner. This is known as a manual failover. There may or may not be data loss depending upon whether your primary and target standby databases were synchronized at the time of the primary database failure.

This section describes the steps to failover to a standby database, recover the Enterprise Manager application state by resynchronizing the Management Repository database with all Management Agents, and enabling the original primary database as a standby using flashback database.

The word manual is used here to contrast this type of failover with a fast-start failover described later in Section 1.7, "Automatic Failover".

  1. Verify Shared Loader Directory and Software Library Availability

    Ensure all files from the primary site are available on the standby site.

  2. Failover to Standby Database.

    Shutdown the database on the primary site. Use DGMGRL to connect to the standby database and execute the FAILOVER command:

    FAILOVER TO <standby database name>;

    Verify the post failover states:

    SHOW CONFIGURATION;
    SHOW DATABASE <primary database name>;
    SHOW DATABASE <standby database name>;
    

    Note that after the failover completes, the original primary database cannot be used as a standby database of the new primary database unless it is re-enabled.

  3. Make the standby Management Services point to the Standby Database which is now the new Primary by running the following on each standby Management Service.

    emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman
    
  4. Resync the New Primary Database with Management Agents.

    Skip this step if you are running in Data Guard Maximum Protection or Maximum Availability level, as there is no data loss on failover. However, if there is data loss, synchronize the new primary database with all Management Agents. On any one Management Service on the standby site, run the following command:

    emctl resync repos -full -name "<name for recovery action>"

    This command submits a resync job that would be executed on each Management Agent when the Management Services on the standby site are brought up.

    Repository resynchronization is a resource-intensive operation, and a well-tuned Management Repository helps it complete as quickly as possible. In particular, if you are not routinely coalescing the IOTs/indexes associated with the Advanced Queueing tables as described in My Oracle Support note 271855.1, running that procedure before the resync will help the operation complete significantly faster.

  5. Startup the Enterprise Manager Application Tier

    Startup all the Management Services on the standby site by running the following command on each Management Service.

    emctl start oms

  6. Relocate Management Services and Management Repository target.

    The Management Services and Management Repository target is monitored by a Management Agent on one of the Management Services on the primary site. To ensure that the target is monitored after switchover/failover, relocate the target to a Management Agent on the standby site by running the following command on one of the standby site Management Services.

    emctl config emrep -agent <agent name> -conn_desc

  7. Switchover to the Standby SLB.

    Make appropriate network changes to failover your primary SLB to the standby SLB, that is, all requests should now be served by the standby SLB without requiring any changes on the clients (browser and Management Agents).

  8. Establish Original Primary Database as Standby Database Using Flashback

    Once access to the failed site is restored and if you had flashback database enabled, you can reinstate the original primary database as a physical standby of the new primary database.

    1. Shutdown all the Management Services in original primary site.

      emctl stop oms -all

    2. Restart the original primary database in mount state:

      shutdown immediate;

      startup mount;

    3. Reinstate the Original Primary Database

      Use DGMGRL to connect to the old primary database and execute the REINSTATE command

      REINSTATE DATABASE <old primary database name>;

    4. The newly reinstated standby database will begin serving as standby database to the new primary database.

    5. Verify the post reinstate states.

      SHOW CONFIGURATION;

      SHOW DATABASE <primary database name>;

      SHOW DATABASE <standby database name>;

  9. Establish Original Primary Management Service as the standby Management Service.

    Start the Administration Server on old primary site

    emctl start oms -admin_only

    Point the old primary Management Service to the new Primary Repository database by running the following command on each Management Service on the old primary site.

    emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman
    
  10. Monitor and complete Repository Resynchronization

    Navigate to the Management Services and Repository Overview page of Cloud Control Console. Under Related Links, click Repository Synchronization. This page shows the progress of the resynchronization operation on a per Management Agent basis. Monitor the progress.

    Operations that fail should be resubmitted manually from this page after fixing the error mentioned. Typically, communication related errors are caused by Management Agents being down and can be fixed by resubmitting the operation from this page after restarting the Management Agent.

    For Management Agents that cannot be started due to some reason, for example, old decommissioned Management Agents, the operation should be stopped manually from this page. Resynchronization is deemed complete when all the jobs have a completed or stopped status.

This completes the failover operation. Access and test the application to ensure that the site is fully operational and functionally equivalent to the primary site. Perform a switchover procedure if the site operations have to be moved back to the original primary site.

Automatic Failover

This section details the steps to achieve complete automation of failure detection and failover by utilizing Fast-Start Failover and the Observer process.

Perform the following steps:

  1. Develop Enterprise Manager Application Tier Configuration and Startup Script

    Develop a script that will automate the Enterprise Manager Application configuration and startup process. See the sample shipped with Cloud Control in the OH/sysman/ha directory. A sample script for the standby site is included here and should be customized as needed. Make sure ssh equivalence is set up so that remote shell scripts can be executed without password prompts. Place the script in a location accessible from the standby database host. Place a similar script on the primary site.

    #!/bin/sh
    # Script: /scratch/EMSBY_start.sh
    # Primary Site Hosts
    # Repos: earth, OMS: jupiter1, jupiter2
    # Standby Site Hosts
    # Repos: mars, OMS: saturn1, saturn2
    LOGFILE="/net/mars/em/failover/em_failover.log"
    OMS_ORACLE_HOME="/scratch/OracleHomes/em/oms11"
    CENTRAL_AGENT="saturn1.example.com:3872"
     
    #log message
    echo "###############################" >> $LOGFILE
    date >> $LOGFILE
    echo $OMS_ORACLE_HOME >> $LOGFILE
    id >>  $LOGFILE 2>&1
    
    #switch all OMS to point to new primary and startup all OMS
    ssh orausr@saturn1 "$OMS_ORACLE_HOME/bin/emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman -repos_pwd <password>" >> $LOGFILE 2>&1
    ssh orausr@saturn1 "$OMS_ORACLE_HOME/bin/emctl start oms" >>  $LOGFILE 2>&1
    
    #Repeat the above two lines for each OMS in a multiple OMS setup. E.g.
    ssh orausr@saturn2 "$OMS_ORACLE_HOME/bin/emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman -repos_pwd <password>" >> $LOGFILE 2>&1
    ssh orausr@saturn2 "$OMS_ORACLE_HOME/bin/emctl start oms" >>  $LOGFILE 2>&1
    
    #relocate Management Services and Repository target
    #to be done only once in a multiple OMS setup
    #allow time for OMS to be fully initialized
    ssh orausr@saturn1 "$OMS_ORACLE_HOME/bin/emctl config emrep -agent $CENTRAL_AGENT -conn_desc -sysman_pwd <password>" >> $LOGFILE 2>&1
    
    #always return 0 so that dbms scheduler job completes successfully
    exit 0
    
  2. Automate Execution of Script by Trigger

    Create a database event "DB_ROLE_CHANGE" trigger, which fires after the database role changes from standby to primary. See the sample shipped with Cloud Control in OH/sysman/ha directory.

    --
    --
    -- Sample database role change trigger
    --
    --
    CREATE OR REPLACE TRIGGER FAILOVER_EM
    AFTER DB_ROLE_CHANGE ON DATABASE
    DECLARE
        v_db_unique_name varchar2(30);
        v_db_role varchar2(30);
    BEGIN
        select upper(VALUE) into v_db_unique_name
        from v$parameter where NAME='db_unique_name';
        select database_role into v_db_role
        from v$database;
     
        if v_db_role = 'PRIMARY' then
     
          -- Submit job to Resync agents with repository
          -- Needed if running in maximum performance mode
          -- and there are chances of data-loss on failover
          -- Uncomment block below if required
          -- begin
          --  SYSMAN.setemusercontext('SYSMAN', SYSMAN.MGMT_USER.OP_SET_IDENTIFIER);
          --  SYSMAN.emd_maintenance.full_repository_resync('AUTO-FAILOVER to '||v_db_unique_name||' - '||systimestamp, true);
          --  SYSMAN.setemusercontext('SYSMAN', SYSMAN.MGMT_USER.OP_CLEAR_IDENTIFIER);
          -- end;
     
          -- Start the EM mid-tier
          dbms_scheduler.create_job(
              job_name=>'START_EM',
              job_type=>'executable',
              job_action=> '<location>' || v_db_unique_name|| '_start_oms.sh',
              enabled=>TRUE
          );
        end if;
    EXCEPTION
    WHEN OTHERS
    THEN
        SYSMAN.mgmt_log.log_error('LOGGING', SYSMAN.MGMT_GLOBAL.UNEXPECTED_ERR,
    SYSMAN.MGMT_GLOBAL.UNEXPECTED_ERR_M || 'EM_FAILOVER: ' ||SQLERRM);
    END;
    /
     
    

    Note:

    Based on your deployment, you might require additional steps to synchronize and automate the failover of SLB and shared storage used for software library. These steps are vendor specific and beyond the scope of this document. One possibility is to invoke these steps from the Enterprise Manager Application Tier startup and configuration script.
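The trigger's job_action concatenates a location prefix with "<DB_UNIQUE_NAME>_start_oms.sh", so the startup script's filename must match what the trigger constructs (the trigger uppercases db_unique_name). Note that the sample script header earlier uses the name /scratch/EMSBY_start.sh; whichever convention you adopt, the name the trigger builds is the one that must exist. A quick sanity check of the naming, with placeholder values:

```shell
# Reproduce the path the trigger would build. The location prefix and
# DB_UNIQUE_NAME are hypothetical; the trigger uppercases the name.
location=/scratch/
db_unique_name=EMREPSBY
echo "${location}${db_unique_name}_start_oms.sh"   # prints /scratch/EMREPSBY_start_oms.sh
```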
  3. Configure Fast-Start Failover and Observer.

    Use the Fast-Start Failover configuration wizard in Enterprise Manager Console to enable FSFO and configure the Observer.

    This completes the setup of automatic failover.

How to Configure Cloud Control OMS in Active/Passive Environment for High Availability Failover Using Virtual Host Names

This section provides a general reference for Cloud Control administrators who want to configure Enterprise Manager Cloud Control in Cold Failover Cluster (CFC) environments.

Overview and Requirements

The following conditions must be met for Cloud Control to fail over to a different host:

  • The installation must be done using a Virtual Host Name and an associated unique IP address.

  • Install on a shared disk/volume which holds the binaries, the configuration and the runtime data (including the recv directory).

  • Configuration data and metadata must also failover to the surviving node.

  • Inventory location must failover to the surviving node.

  • Software owner and time zone parameters must be the same on all cluster member nodes that will host this Oracle Management Service (OMS).

Installation and Configuration

To override the physical host name of the cluster member with a virtual host name, the software must be installed using the ORACLE_HOSTNAME parameter. For the inventory pointer, the software must be installed using the command-line parameter -invPtrLoc, which points to a file that records the path of the shared inventory location.

If you are using an NFS-mounted volume for the installation, ensure that you specify the rsize and wsize options in your mount command to prevent I/O issues.

For example:

oms.acme.com:/u01/app/share1 /u01/app/share1 nfs rw,bg,rsize=32768,wsize=32768,hard,nointr,tcp,noac,vers=3,timeo=600 0 0

Note:

Any reference to shared failover volumes also applies to non-shared failover volumes that can be mounted on the active host after failover.

Setting Up the Virtual Host Name/Virtual IP Address

You can set up the virtual host name and virtual IP address by either allowing the clusterware to set it up, or manually setting it up yourself before installation and startup of Oracle services. The virtual host name must be static and resolvable consistently on the network. All nodes participating in the setup must resolve the virtual IP address to the same host name. Standard TCP tools such as nslookup and traceroute can be used to verify the host name. Validate using the following commands:

nslookup <virtual hostname>

This command returns the virtual IP address and fully qualified host name.

nslookup <virtual IP>

This command returns the virtual IP address and fully qualified host name.

Be sure to try these commands on every node of the cluster and verify that the correct information is returned.
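To check resolution mechanically on each node, the lookup can be wrapped in a small shell function, in the style of the ksh scripts later in this chapter. The host name, the address, and the use of getent (which consults /etc/hosts as well as DNS) are illustrative assumptions, not part of the product:

```shell
#!/bin/sh
# Hypothetical values -- substitute your virtual host name and address.
VHOST=oms-vip.acme.com
EXPECTED_IP=192.0.2.10

# check_resolution NAME IP -- succeed only if NAME resolves to IP.
# getent honors /etc/hosts as well as DNS, which mirrors what the
# Oracle software will see at runtime.
check_resolution() {
    resolved=$(getent hosts "$1" | awk '{ print $1; exit }')
    [ "$resolved" = "$2" ]
}

if check_resolution "$VHOST" "$EXPECTED_IP"; then
    echo "OK: $VHOST resolves to $EXPECTED_IP"
else
    echo "MISMATCH: $VHOST does not resolve to $EXPECTED_IP" >&2
fi
```

Run the same script on every node of the cluster; each node must report OK for the same address.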

Setting Up Shared Storage

Storage can be managed by the clusterware that is in use or you can use any shared file system (FS) volume as long as it is not an unsupported type, such as OCFS V1. The most common shared file system is NFS.

Note:

If the OHS directory is on a shared storage, the LockFile directive in the httpd.conf file should be modified to point to a local disk, otherwise there is a potential for locking issues.
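For example, the modified directive might look like the following httpd.conf fragment; the local path shown is illustrative:

```apache
# httpd.conf -- keep the lock file on a local (non-shared) disk
LockFile /var/tmp/ohs/http_lock
```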

Setting Up the Environment

Some operating system versions require specific operating system patches to be applied prior to installing 11gR1. The user installing and using the 11gR1 software must also have sufficient kernel resources available. Refer to the operating system's installation guide for more details.

Before you launch the installer, certain environment variables must be verified. Each of these variables must be set identically for the account installing the software on ALL machines participating in the cluster:

  • OS variable TZ

    The time zone setting. You should unset this variable prior to installation.

  • PERL variables

    Variables such as PERL5LIB should also be unset to avoid association with an incorrect set of Perl libraries.

Synchronizing Operating System IDs

The user and group of the software owner should be defined identically on all nodes of the cluster. This can be verified using the 'id' command:

$ id -a

uid=550(oracle) gid=50(oinstall) groups=501(dba)
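A small helper makes the comparison mechanical. OWNER is shown as root only so the sketch runs anywhere; substitute your software owner (for example, oracle):

```shell
#!/bin/sh
# Print a comparable "uid:gid" token for a user; identical output on
# every cluster node means the operating system IDs are synchronized.
OWNER=root    # placeholder; use your software owner, e.g. oracle

uid_gid_of() {
    printf '%s:%s\n' "$(id -u "$1")" "$(id -g "$1")"
}

uid_gid_of "$OWNER"
```

Run the same function on each node (over ssh, for instance) and verify that every node prints the same token.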

Setting Up Shared Inventory

Use the following steps to set up shared inventory:

  1. Create your new ORACLE_HOME directory.

  2. Create the Oracle Inventory directory under the new oracle home:

    $ cd <shared oracle home>

    $ mkdir oraInventory

  3. Create the oraInst.loc file. This file contains the Inventory directory path information needed by the Universal Installer.

    1. vi oraInst.loc

    2. Enter the path information to the Oracle Inventory directory and specify oinstall as the group of the software owner. For example:

      inventory_loc=/app/oracle/product/11.1/oraInventory

      inst_group=oinstall
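The two sub-steps above can also be performed non-interactively with a here-document; the inventory path and group are the example values from the steps:

```shell
#!/bin/sh
# Create oraInst.loc in the current directory without an editor.
# The path and group are the example values used above.
cat > oraInst.loc <<'EOF'
inventory_loc=/app/oracle/product/11.1/oraInventory
inst_group=oinstall
EOF
```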

Installing the Software

Refer to the following steps when installing the software:

  1. Create the shared disk location for the software binaries on both nodes.

  2. Install WebLogic Server. For information on installing WebLogic Server, refer to Oracle Enterprise Manager Cloud Control Basic Installation Guide.

  3. Point the installer to the inventory location file oraInst.loc (under ORACLE_BASE in this case) and specify the virtual host name. For example:

    $ export ORACLE_HOSTNAME=lxdb.acme.com
    $ runInstaller -invPtrLoc /app/oracle/share1/oraInst.loc \
      ORACLE_HOSTNAME=lxdb.acme.com -debug
    
  4. Install Oracle Management Services on cluster member Host1 using the option, "EM install using the existing DB".

  5. Continue the remainder of the installation normally.

  6. Once completed, copy the files oraInst.loc and oratab to /etc. Also copy /opt/oracle to all cluster member hosts (Host2, Host3, and so on).

Windows Specific Configuration Steps

On Windows environments, additional steps are required to copy over the services and registry keys required by the Oracle software. Note that these steps are required only if your clustering software does not provide a shared Windows registry.

  1. Using regedit on the first host, export each Oracle service from under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services.

  2. Using regedit on the first host, export HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE.

  3. Use regedit on the failover host to import the files created in steps 1 and 2.

Starting Up Services

Ensure that you start your services in the proper order. Use the order listed below:

  1. Establish IP address on the active node

  2. Start the TNS listener (if it is part of the same failover group)

  3. Start the database (if it is part of the same failover group)

  4. Start Cloud Control using emctl start oms

  5. Test functionality

In case of failover, refer to the following steps:

  1. Establish IP on failover box

  2. Start TNS listener using the command lsnrctl start if it is part of the same failover group

  3. Start the database using the command dbstart if it is part of the same failover group

  4. Start Cloud Control using the command emctl start oms

  5. Test the functionality
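The two sequences above differ only in where they run; either can be sketched as a single && chain so that a failure at any step prevents later services from starting out of order. VIP and VIP_IF are placeholders, and the sketch assumes lsnrctl, dbstart, and emctl are on the PATH:

```shell
#!/bin/sh
# Sketch of the failover start sequence; a failure at any step stops
# the chain so later services never start out of order.
VIP=192.0.2.10      # placeholder virtual IP
VIP_IF=eth0         # placeholder network interface

start_failover() {
    ip addr add "$VIP/24" dev "$VIP_IF" &&  # 1. establish the IP
    lsnrctl start &&                        # 2. TNS listener
    dbstart &&                              # 3. database
    emctl start oms                         # 4. Cloud Control OMS
    # 5. test functionality, for example with: emctl status oms
}
```

A function such as start_failover would typically be invoked from the clusterware's failover hook rather than run by hand.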

Summary

The OMS mid-tier component of Cloud Control can now be deployed in CFC environments that utilize a floating host name.

Configuring Targets for Failover in Active/Passive Environments

This section provides a general reference for Cloud Control administrators who want to relocate Cold Failover Cluster (CFC) targets from one existing Management Agent to another. Although the targets are capable of running on multiple nodes, these targets run only on the active node in a CFC environment.

CFC environments generally use a combination of cluster software to provide a virtual host name and IP address along with interconnected host and storage systems to share information and provide high availability for applications. Automating failover of the virtual host name and IP, in combination with relocating the Enterprise Manager targets and restarting the applications on the passive node, requires the use of Oracle Enterprise Manager command-line interface (EM CLI) and Oracle Clusterware (running Oracle Database release 10g or 11g) or third-party cluster software. Several Oracle partner vendors provide clusterware solutions in this area.

The Enterprise Manager Command Line Interface (EM CLI) allows you to access Enterprise Manager Cloud Control functionality from text-based consoles (terminal sessions) for a variety of operating systems. Using EM CLI, you can perform Enterprise Manager Cloud Control console-based operations, like monitoring and managing targets, jobs, groups, blackouts, notifications, and alerts. See the Oracle Enterprise Manager Command Line Interface manual for more information.

Target Relocation in Active/Passive Environments

Beginning with Oracle Enterprise Manager 10g release 10.2.0.5, a single Oracle Management Agent running on each node in the cluster can monitor targets configured for active/passive high availability. Only one Management Agent is required on each of the physical nodes of the CFC cluster because, in case of a failover to the passive node, Enterprise Manager can move the HA monitored targets from the Management Agent on the failed node to another Management Agent on the newly activated node using a series of EM CLI commands. See the Oracle Enterprise Manager Command Line Interface manual for more information.

If your application is running in an active/passive environment, the clusterware brings up the applications on the passive node in the event that the active node fails. For Enterprise Manager to continue monitoring the targets in this type of configuration, the existing Management Agent needs additional configuration.

The following sections describe how to prepare the environment to automate and restart targets on the new active node. Failover and fallback procedures are also provided.

Installation and Configuration

The following sections describe how to configure Enterprise Manager to support a CFC configuration using the existing Management Agents communicating with the Oracle Management Service processes:

  • Prerequisites

  • Configuration Steps

Prerequisites

Prepare the Active/Passive environments as follows:

  • Ensure the operating system clock is synchronized across all nodes of the cluster. (Consider using Network Time Protocol (NTP) or another network synchronization method.)

  • Use the EM CLI RELOCATE_TARGETS command only with Enterprise Manager Release 10.2.0.5 (and higher) Management Agents.

Configuration Steps

The following steps show how to configure Enterprise Manager to support a CFC configuration using the existing Management Agents that are communicating with the OMS processes. The example that follows is based on a configuration with a two-node cluster that has one failover group. For additional information about targets running in CFC active/passive environments, see My Oracle Support note 406014.1.

  1. Configure EM CLI

    To set up and configure target relocation, use the Oracle Enterprise Manager command-line interface (EM CLI). See the Oracle Enterprise Manager Command Line Interface manual and the Oracle Enterprise Manager Extensibility manual for information about EM CLI and Management Plug-Ins.

  2. Install Management Agents

    Install the Management Agent on a local disk volume on each node in the cluster. Once installed, the Management Agents are visible in the Cloud Control console.

  3. Discover Targets

    After the active/passive targets have been configured, use the Management Agent discovery screen in the Cloud Control console to add the targets (such as database, listener, application server, and so on). Perform the discovery on the active node, which is the node that is currently hosting the new target.

Failover Procedure

To speed relocation of the targets after a node failover, automate the following steps in a script that contains the commands necessary to initiate a failover of a target. Typically, the clusterware software has a mechanism with which you can automatically execute such a script to relocate the targets in Enterprise Manager. Also, see Section 1.9.6, "Script Examples" for sample scripts.

  1. Shut down the target services on the failed active node.

    On the active node where the targets are running, shut down the target services running on the virtual IP.

  2. If required, disconnect the storage for this target on the active node.

    Shut down all the applications running on the virtual IP and shared storage.

  3. Enable the target's IP address on the new active node.

  4. If required, connect storage for the target on the currently active node.

  5. Relocate the targets in Cloud Control using EM CLI.

    To relocate the targets to the Management Agent on the new active node, issue the EM CLI relocate_targets command for each target type (listener, application servers, and so on) that you must relocate after the failover operation. For example:

    emcli relocate_targets \
      -src_agent=<node 1>:3872 \
      -dest_agent=<node 2>:3872 \
      -target_name=<database_name> \
      -target_type=oracle_database \
      -copy_from_src \
      -force=yes


    In the example, port 3872 is the default port for the Management Agent. To find the appropriate port number for your configuration, use the value for the EMD_URL parameter in the emd.properties file for this Management Agent.

    Note: In case of a failover event, the source agent will not be running. However, there is no need to have the source Management Agent running to accomplish the RELOCATE operation. EM CLI is an OMS client that performs its RELOCATE operations directly against the Management Repository.
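As a sketch, the port can be pulled out of emd.properties with sed. The file path in the comment follows a typical Agent home layout, and the EMD_URL value is assumed to look like EMD_URL=https://host1.us.oracle.com:3872/emd/main/; adjust both for your installation:

```shell
#!/bin/sh
# agent_port FILE -- print the port embedded in the EMD_URL property
# of an emd.properties file.
agent_port() {
    sed -n 's|^EMD_URL=.*:\([0-9][0-9]*\)/.*|\1|p' "$1"
}

# Typical use (path is an assumed Agent home layout):
# agent_port "$AGENT_HOME/sysman/config/emd.properties"
```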

Fallback Procedure

To return the HA targets to the original active node or to any other cluster member node:

  1. Repeat the steps in Section 1.9.3, "Failover Procedure" to return the HA targets to the active node.

  2. Verify the target status in the Cloud Control console.

EM CLI Parameter Reference

Issue the same command for each target type that will be failed over (or switched over) during relocation operations. For example, issue the same EM CLI command to relocate the listener, the application servers, and so on. Table 36-1 describes the EM CLI parameters you use to relocate targets:

Table 36-1 EM CLI Parameters

-src_agent
    The Management Agent on which the target was running before the failover occurred.

-dest_agent
    The Management Agent that will be monitoring the target after the failover.

-target_name
    The name of the target to be failed over.

-target_type
    The type of target to be failed over (internal Enterprise Manager target type). For example, oracle_database for a standalone database or an Oracle RAC instance, oracle_listener for a database listener, and so on.

-copy_from_src
    Use the same type of properties from the source Management Agent to identify the target. This is a MANDATORY parameter! If you do not supply this parameter, you can corrupt your target definition!

-force
    Force dependencies (if needed) to fail over as well.


Script Examples

The following sections provide script examples:

  • Relocation Script

  • Start Listener Script

  • Stop Listener Script

Relocation Script

#!/bin/ksh

# Get the status of the targets
emcli get_targets \
  -targets="db1:oracle_database;listener_db1:oracle_listener" -noheader
  if [[ $? != 0 ]]; then exit 1; fi

# Black out the targets to stop false errors. This blackout is set to expire in 30 minutes.
emcli create_blackout -name="relocating active passive test targets" \
  -add_targets="db1:oracle_database;listener_db1:oracle_listener" \
  -reason="testing failover" \
  -schedule="frequency:once;duration:0:30"
  if [[ $? != 0 ]]; then exit 1; fi

# Stop the listener target. An OS script is needed to use the
# 'lsnrctl set current_listener' function.
emcli execute_hostcmd -cmd="/bin/ksh" -osscript="FILE" \
  -input_file="FILE:/scratch/oraha/cfc_test/listener_stop.ksh" \
  -credential_set_name="HostCredsNormal" \
  -targets="host1.us.oracle.com:host"
  if [[ $? != 0 ]]; then exit 1; fi

# Now, stop the database
emcli execute_sql -sql="shutdown abort" \
  -targets="db1:oracle_database" \
  -credential_set_name="DBCredsSYSDBA"
  if [[ $? != 0 ]]; then exit 1; fi

# Relocate the targets to the new host
emcli relocate_targets \
  -src_agent=host1.us.oracle.com:3872 \
  -dest_agent=host2.us.oracle.com:3872 \
  -target_name=db1 -target_type=oracle_database \
  -copy_from_src -force=yes \
  -changed_param=MachineName:host1vip.us.oracle.com
  if [[ $? != 0 ]]; then exit 1; fi

emcli relocate_targets \
  -src_agent=host1.us.oracle.com:3872 \
  -dest_agent=host2.us.oracle.com:3872 \
  -target_name=listener_db1 -target_type=oracle_listener \
  -copy_from_src -force=yes \
  -changed_param=MachineName:host1vip.us.oracle.com
  if [[ $? != 0 ]]; then exit 1; fi

# Now, restart the database and listener on the new host
emcli execute_hostcmd -cmd="/bin/ksh" -osscript="FILE" \
  -input_file="FILE:/scratch/oraha/cfc_test/listener_start.ksh" \
  -credential_set_name="HostCredsNormal" \
  -targets="host2.us.oracle.com:host"
  if [[ $? != 0 ]]; then exit 1; fi

emcli execute_sql -sql="startup" \
  -targets="db1:oracle_database" \
  -credential_set_name="DBCredsSYSDBA"
  if [[ $? != 0 ]]; then exit 1; fi

# Time to end the blackout and let the targets become visible
emcli stop_blackout -name="relocating active passive test targets"
  if [[ $? != 0 ]]; then exit 1; fi

# And finally, recheck the status of the targets
emcli get_targets \
  -targets="db1:oracle_database;listener_db1:oracle_listener" -noheader
  if [[ $? != 0 ]]; then exit 1; fi

Start Listener Script

#!/bin/ksh

export ORACLE_HOME=/oradbshare/app/oracle/product/11.1.0/db
export PATH=$ORACLE_HOME/bin:$PATH

lsnrctl << EOF
set current_listener listener_db1
start
exit
EOF

Stop Listener Script

#!/bin/ksh
export ORACLE_HOME=/oradbshare/app/oracle/product/11.1.0/db
export PATH=$ORACLE_HOME/bin:$PATH

lsnrctl << EOF
set current_listener listener_db1
stop
exit
EOF