23 Backing Up and Recovering Enterprise Manager

Enterprise Manager is the monitoring and management framework for your ecosystem, so an important part of your high availability strategy is to ensure that it is regularly backed up and can be restored in the event of failure.

This chapter covers the following topics:

Backing Up Your Deployment

Although Enterprise Manager functions as a single entity, technically, it is built on a distributed, multi-tier software architecture composed of the following software components:

  • Oracle Management Services (OMS)

  • Management Agent

  • Management Repository

  • Software Library

Each component, being uniquely different in composition and function, requires different approaches to backup and recovery. For this reason, the backup strategies are discussed on a per-tier basis in this chapter. For an overview of Enterprise Manager architecture, see Installation of Enterprise Manager Cloud Control in the Oracle Enterprise Manager Cloud Control Basic Installation Guide.

Software Library Backup

The software library is a centralized media store for Enterprise Manager software entities such as software patches, virtual appliance images, reference gold images, application software, and their associated directive scripts. The software library is an essential part of the Enterprise Manager framework and is required by many Enterprise Manager features in order to function properly. The software library storage locations should be backed up periodically using file system backup. Oracle recommends performing the backup every 1 to 24 hours.
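As a minimal sketch of such a file system backup, the following shell function archives one software library storage location into a timestamped tarball. The function name, the archive naming scheme, and the example paths are illustrative assumptions, not part of Enterprise Manager; any file system backup tool that your site already uses would serve equally well.

```shell
#!/bin/sh
# Hypothetical helper: archive one software library storage location.
# Usage: backup_swlib <swlib_dir> <backup_dir>
# Prints the path of the archive it created.
backup_swlib() (
    swlib_dir=$1
    backup_dir=$2
    [ -d "$swlib_dir" ] || { echo "no such directory: $swlib_dir" >&2; exit 1; }
    mkdir -p "$backup_dir" || exit 1
    stamp=$(date +%Y%m%d_%H%M%S)
    archive="$backup_dir/swlib_$stamp.tar.gz"
    # -C keeps the paths inside the archive relative to the parent directory.
    tar -czf "$archive" -C "$(dirname "$swlib_dir")" "$(basename "$swlib_dir")" || exit 1
    echo "$archive"
)
```

A scheduler entry (cron, for example) that calls this function at an interval within the recommended 1 to 24 hour window would keep the archives current. Pruning of old archives is left out of the sketch.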

Management Repository Backup

The Management Repository is the storage location where all the information collected by the Management Agent gets stored. It consists of objects such as database jobs, packages, procedures, views, and tablespaces. Because it is configured in an Oracle Database, the backup and recovery strategies for the Management Repository are essentially the same as those for the Oracle Database. Backup procedures for the database are well established standards and can be implemented using the RMAN backup utility, which can be accessed via the Cloud Control console.

Management Repository Backup

Oracle recommends using High Availability Best Practices for protecting the Management Repository database against unplanned outages. As such, use the following standard database backup strategies.

  • Database should be in archivelog mode. Not running the repository database in archivelog mode leaves the database vulnerable to being in an unrecoverable condition after a media failure.

  • Perform regular hot backups with RMAN using the Recommended Backup Strategy option via the Cloud Control console. Other technologies such as Data Guard and RAC can also be used as part of a comprehensive HA and data protection strategy typically implemented with HA levels 3 and 4. For more information about the various HA levels, see Implementing High Availability Levels.

Adhering to these strategies will create a full backup and then create incremental backups on each subsequent run. The incremental changes will then be rolled up into the baseline, creating a new full backup baseline.

Using the Recommended Backup Strategy also takes advantage of the capabilities of Enterprise Manager to execute the backups: Jobs will be automatically scheduled through the Job sub-system of Enterprise Manager. The history of the backups will then be available for review and the status of the backup will be displayed on the repository database target home page. This backup job along with archiving and flashback technologies will provide a restore point in the event of the loss of any part of the repository. This type of backup, along with archive and online logs, allows the repository to be recovered to the last completed transaction.

You can view when the last repository backup occurred on the Management Services and Repository Overview page under the Repository details section.

For a thorough summary of how to configure backups using Enterprise Manager, see Configuring Your Database for Basic Backup and Recovery in the Oracle Database 2 Day DBA. For additional information on Database high availability best practices, see Oracle Database High Availability Best Practices.

Oracle Management Service Backup

The Oracle Management Service (OMS) orchestrates with Management Agents to discover targets, monitor and manage them, and store the collected information in a repository for future reference and analysis. The OMS also renders the Web interface for the Enterprise Manager console.

Backing Up the OMS

The OMS is generally stateless. Some configuration data is stored on the OMS file system.

A snapshot of OMS configuration can be taken using the emctl exportconfig oms command.

$ <OMS_HOME>/bin/emctl exportconfig oms [-sysman_pwd <sysman password>]
[-dir <backup dir>] Specify directory to store backup file
[-keep_host] Specify this parameter if the OMS was installed using a virtual hostname (using
ORACLE_HOSTNAME=<virtual_hostname>)

Running exportconfig captures a snapshot of the OMS at a given point in time, thus allowing you to back up the most recent OMS configuration on a regular basis. exportconfig should always be run on the OMS running the WebLogic Admin Server. If required, the most recent snapshot can then be restored on a fresh OMS installation on the same or different host.
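To illustrate how a regular snapshot might be scripted, the sketch below only builds the exportconfig command line for a dated backup directory and prints it, so that it can be reviewed before it is run on the OMS that hosts the WebLogic Admin Server. The function name, the dated directory layout, and the example paths are assumptions. The -dir and -keep_host options are the ones shown in the usage above, and -sysman_pwd is deliberately left out so that the password does not appear in a script.

```shell
#!/bin/sh
# Hypothetical helper: print the exportconfig command for today's backup directory.
# Usage: exportconfig_cmd <OMS_HOME> <backup_root> [yes|no]
#   third argument "yes" adds -keep_host (OMS installed with a virtual hostname)
exportconfig_cmd() (
    oms_home=$1
    backup_root=$2
    keep_host=${3:-no}
    dir="$backup_root/$(date +%Y%m%d)"
    cmd="$oms_home/bin/emctl exportconfig oms -dir $dir"
    [ "$keep_host" = "yes" ] && cmd="$cmd -keep_host"
    echo "$cmd"
)
```

For example, `exportconfig_cmd /u01/app/oracle/middleware /backup/oms yes` prints a command ending in `-keep_host`. Both paths in that call are placeholders.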

Backup strategies for the OMS components are as follows:

  • Software Homes

    Composed of Fusion Middleware Home, the OMS Oracle Home and the WebTier (OHS) Oracle Home and multiple Management Plug-in Oracle Homes.

    Software Homes change when patches or patchsets are applied or updates are applied through the new Self Update feature. For this reason, filesystem-level backups should be taken after each patch/patchset application or application of updates through Self Update. You should back up the Oracle inventory files along with the Software Homes and save the output of opatch lsinventory -detail to make it easy to determine which patches are applied to the backed up Oracle Homes.

    Note:

    If you do not have filesystem-level backups, you can also reinstall the software homes using the "Installing Software Only" install method.

    Important: The location of the OMS Oracle Home must be the same for all OMS instances in your Cloud Control deployment.

  • Instance Home

    The gc_inst directory, composed of WebLogic Server, OMS and web tier configuration files.

    The Instance Home can be backed up using the emctl exportconfig oms command.

  • Administration Server

    The Administration Server operates as the central control entity for the configuration of the entire OMS instance domain. The Administration Server is an integral part of the first OMS installed in your Cloud Control deployment and shares the Software Homes and Instance Home.

    The Administration Server is backed up at the same time as the Instance Home, using the emctl exportconfig oms command (run only on the first OMS, which hosts the Administration Server).
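The Software Homes strategy above (a filesystem-level backup plus the saved output of opatch lsinventory -detail) could be scripted along the following lines. This is a sketch under stated assumptions: the function name and file naming are hypothetical, the Oracle Home is assumed to contain the usual OPatch/opatch executable, and the Oracle inventory files still need to be backed up separately.

```shell
#!/bin/sh
# Hypothetical helper: record the patch inventory of one Oracle Home, then archive it.
# Usage: backup_oracle_home <oracle_home> <backup_dir>
# Prints the common prefix of the two files it writes.
backup_oracle_home() (
    oracle_home=$1
    backup_dir=$2
    name=$(basename "$oracle_home")
    stamp=$(date +%Y%m%d_%H%M%S)
    mkdir -p "$backup_dir" || exit 1
    # Capture the patch level first so the listing describes the archived files.
    ORACLE_HOME="$oracle_home" "$oracle_home/OPatch/opatch" lsinventory -detail \
        > "$backup_dir/${name}_$stamp.lsinventory.txt" || exit 1
    tar -czf "$backup_dir/${name}_$stamp.tar.gz" \
        -C "$(dirname "$oracle_home")" "$name" || exit 1
    echo "$backup_dir/${name}_$stamp"
)
```

Running it once per Oracle Home after each patch or Self Update application keeps the archive and the inventory listing paired under the same timestamp.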

Management Agent Backup

The Management Agent is an integral software component that is deployed on each monitored host. It is responsible for monitoring all the targets running on that host, communicating that information to the middle-tier OMS, and managing and maintaining the host and its targets.

Backing Up Management Agents

There are no special considerations for backing up Management Agents. As a best practice, reference Management Agent installs should be maintained for different platforms and kept up-to-date in terms of customizations in the emd.properties file and patches applied. Use Deployment options from the Cloud Control console to install and maintain reference Agent installs.

If a Management Agent is lost, it should be reinstalled by cloning from a reference install.

Recovery of Failed Enterprise Manager Components

Recovering Enterprise Manager means restoring any of the four fundamental components of the Enterprise Manager architecture.

  • Management Repository

  • Management Service

  • Management Agent

  • Software Library

Repository Recovery

Recovery of the Repository database must be performed using RMAN since Cloud Control will not be available when the repository database is down. There are two recovery cases to consider:

  • Full Recovery: No special consideration is required for Enterprise Manager.

  • Point-in-Time/Incomplete Recovery: Recovered repository may be out of sync with Agents because of lost transactions. In this situation, some metrics may show up incorrectly in the Cloud Control console unless the repository is synchronized with the latest state available on the Agents.

A repository resync feature allows you to automate the process of synchronizing the Enterprise Manager repository with the latest state available on the Management Agents.

To resynchronize the repository with the Management Agents, use the resync repos command of the Enterprise Manager command-line utility (emctl):

emctl resync repos -full -name "<descriptive name for the operation>"

You must run this command from the OMS Oracle Home AFTER restoring the Management Repository, but BEFORE starting the OMS. After submitting the command, start up all OMS instances and monitor the progress of repository resynchronization from the Enterprise Manager console's Repository Resynchronization page, as shown in the following figure.

Figure 23-1 Repository Synchronization Page


Management Repository recovery is complete when the resynchronization jobs complete on all Management Agents.

Oracle strongly recommends that the Management Repository database be run in archivelog mode so that in case of failure, the database can be recovered to the latest transaction. If the database cannot be recovered to the last transaction, Repository Synchronization can be used to restore monitoring capabilities for targets that existed when the last backup was taken. Actions taken after the backup will not be recovered automatically. Some examples of actions that will not be recovered automatically by Repository Synchronization are:

  • Incident Rules

  • Preferred Credentials

  • Groups, Services, Systems

  • Jobs/Deployment Procedures

  • Custom Reports

  • New Agents

Recovery Scenarios

A prerequisite for repository (or any database) recovery is to have a valid, consistent backup of the repository. Using Enterprise Manager to automate the backup process ensures regular, up-to-date backups are always available if repository recovery is ever required. Recovery Manager (RMAN) is a utility that backs up, restores, and recovers Oracle Databases. The RMAN recovery job syntax should be saved to a safe location. This allows you to perform a complete recovery of the Enterprise Manager repository database. In its simplest form, the syntax appears as follows:

run {
restore database;
recover database;
}

Actual syntax will vary in length and complexity depending on your environment. For more information on extracting syntax from an RMAN backup and recovery job, or using RMAN in general, see the Oracle Database Backup and Recovery Advanced User's Guide.
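One way to keep the recovery syntax in a safe location is to store it as an RMAN command file. The sketch below writes the simplest-form block shown above to a file and prints an rman invocation that would run it. The function name and file names are assumptions, and `target /` assumes operating system authentication on the repository database host. A real recovery after losing control files needs additional commands that are outside this minimal form.

```shell
#!/bin/sh
# Hypothetical helper: save the minimal restore/recover block as an RMAN
# command file and print the rman command line that would execute it.
# Usage: write_rman_restore <command_file>
write_rman_restore() (
    cmdfile=$1
    cat > "$cmdfile" <<'EOF'
run {
  restore database;
  recover database;
}
EOF
    # The log file name is derived by replacing the command file's extension.
    echo "rman target / cmdfile=$cmdfile log=${cmdfile%.*}.log"
)
```

Printing the command instead of running it leaves the decision to start the restore with the operator.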

The following scenarios illustrate various repository recovery situations along with the recovery steps.

Full Recovery on the Same Host

Repository database is running in archivelog mode. Recent backup, archive log files and redo logs are available. The repository database disk crashes. All datafiles and control files are lost.

Resolution:

  1. Stop all OMS instances using emctl stop oms -all.

  2. Recover the database using RMAN.

  3. Bring the site up using the command emctl start oms on all OMS instances.

  4. Verify that the site is fully operational.

Incomplete Recovery on the Same Host

Repository database is running in noarchivelog mode. Full offline backup is available. The repository database disk crashes. All datafiles and control files are lost.

Resolution:

  1. Stop the OMS instances using emctl stop oms -all.

  2. Recover the database using RMAN.

  3. Initiate Repository Resync using emctl resync repos -full -name "<resync name>" from one of the OMS Oracle Homes.

  4. Start the OMS instances using emctl start oms.

  5. Log in to Cloud Control. From the Setup menu, select Manage Cloud Control, and then Health Overview. The Management Services and Repository page displays.

  6. From the OMS and Repository menu, select Repository Synchronization.

  7. Verify that the site is fully operational.
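The command-line part of this scenario (steps 1 to 4) depends on ordering: the resync must be submitted after the restore and before any OMS is started. The sketch below lists those steps in order and runs them through a small wrapper that stops at the first failure. Both function names, the DRY_RUN variable, and the RMAN command file argument are assumptions for illustration. By default the wrapper only prints the steps.

```shell
#!/bin/sh
# Hypothetical helper: emit the command-line steps of the scenario, one per line.
# Usage: incomplete_recovery_steps <OMS_HOME> <rman_command_file> <resync_name>
incomplete_recovery_steps() {
    cat <<EOF
$1/bin/emctl stop oms -all
rman target / cmdfile=$2
$1/bin/emctl resync repos -full -name "$3"
$1/bin/emctl start oms
EOF
}

# Read steps from stdin and print each one. Unless DRY_RUN=yes (the default),
# also execute the step and stop at the first one that fails.
run_steps() {
    while IFS= read -r step; do
        [ -n "$step" ] || continue
        echo "+ $step"
        [ "${DRY_RUN:-yes}" = "yes" ] && continue
        sh -c "$step" </dev/null || { echo "step failed: $step" >&2; return 1; }
    done
}
```

A dry run looks like `incomplete_recovery_steps /u01/oms /backup/restore.rman "post-restore resync" | run_steps`, where all three arguments are placeholders. In a multi-OMS site the stop and start commands still have to be issued on every OMS host, and the console checks in steps 5 to 7 remain manual.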

Full Recovery on a Different Host

The Management Repository database is running on host "A" in archivelog mode. Recent backup, archive log files and redo logs are available. The repository database crashes. All datafiles and control files are lost.

Resolution:

  1. Stop the OMS instances using the command emctl stop oms.

  2. Recover the database using RMAN on a different host (host "B").

  3. Correct the connect descriptor for the repository by running the following command on each OMS.

    $emctl config oms -store_repos_details -repos_conndesc <connect descriptor> -repos_user sysman
    
  4. Stop the OMS using the following command:

    emctl stop oms -all

  5. Start the OMS instances using the command:

    emctl start oms

  6. Relocate the Management Repository database target to the Agent running on host "B" by running the following command from the OMS:

    $emctl config repos -host <hostB> -oh <OH of repository on hostB>  -conn_desc "<TNS connect descriptor>"
    

    Note:

    This command can only be used to relocate the repository database under the following conditions:

    • An Agent is already running on this machine.

    • No database on host "B" has been discovered.

  7. Change the monitoring configuration for the OMS and Repository target by running the following command from the OMS:

    $emctl config emrep -conn_desc "<TNS connect descriptor>"
    
  8. Verify that the site is fully operational.
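Steps 3, 6 and 7 all take the same connect descriptor, so it helps to build it once. The sketch below assembles a standard TNS descriptor from a host, port and service name and prints the three emctl config commands from the steps above with it filled in. The function names and the example values are assumptions, as is the use of SERVICE_NAME rather than SID. The descriptor is quoted in every command so that the shell does not interpret the parentheses.

```shell
#!/bin/sh
# Hypothetical helper: build a TNS connect descriptor.
# Usage: conn_desc <host> <port> <service_name>
conn_desc() {
    printf '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=%s)(PORT=%s))(CONNECT_DATA=(SERVICE_NAME=%s)))\n' \
        "$1" "$2" "$3"
}

# Hypothetical helper: print the three reconfiguration commands for review.
# Usage: relocate_cmds <hostB> <port> <service_name> <repository_oracle_home>
relocate_cmds() (
    desc=$(conn_desc "$1" "$2" "$3")
    echo "emctl config oms -store_repos_details -repos_conndesc \"$desc\" -repos_user sysman"
    echo "emctl config repos -host $1 -oh $4 -conn_desc \"$desc\""
    echo "emctl config emrep -conn_desc \"$desc\""
)
```

The printed commands are not meant to be run back to back: the OMS stop and start in steps 4 and 5 fall between the first command and the other two.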

Incomplete Recovery on a Different Host

The Management Repository database is running on host "A" in noarchivelog mode. Full offline backup is available. Host "A" is lost due to hardware failure. All datafiles and control files are lost.

Resolution:

  1. Stop the OMS instances using emctl stop oms.

  2. Recover the database using RMAN on a different host (host "B").

  3. Correct the connect descriptor for the repository in credential store.

    $emctl config oms -store_repos_details -repos_conndesc <connect descriptor> -repos_user sysman
    

    This command prompts you to stop and start the OMS.

  4. Initiate Repository Resync:

    $emctl resync repos -full -name "<resync name>"

    from one of the OMS Oracle Homes.

  5. Start the OMS using the command emctl start oms.

  6. Run the command to relocate the repository database target to the Management Agent running on host "B":

    $emctl config repos -agent <agent on host B> -host <hostB> -oh <OH of repository on hostB> -conn_desc "<TNS connect descriptor>"

  7. Run the command to change monitoring configuration for the OMS and Repository target:

    emctl config emrep -conn_desc "<TNS connect descriptor>"

  8. Log in to Cloud Control. From the Setup menu, select Manage Cloud Control, and then select Health Overview.

  9. From the OMS and Repository menu, select Repository Synchronization. Monitor the status of resync jobs. Resubmit failed jobs, if any, after fixing the error mentioned.

  10. Verify that the site is fully operational.

Recovering the OMS

If an Oracle Management Service instance is lost, recovering it essentially consists of three steps: recovering the Software Homes, configuring the Instance Home, and recovering the Software Library if it is configured on the same host as Enterprise Manager.

Recovering the Software Homes

When restoring on the same host, the software homes can be restored from filesystem backup. In case a backup does not exist, or if installing to a different host, the Software Homes can be reconstructed using the "Install Software Only" option from the Cloud Control software distribution. Care should be taken to select and install ALL Management Plug-ins that existed in your environment prior to the crash.

  1. Connect to the Management Repository as SYSMAN and run the following SQL query to retrieve a list of installed plug-ins:
    SELECT epv.display_name, epv.plugin_id, epv.version, epv.rev_version,
           decode(su.aru_file, null, 'Media/External',
                  'https://updates.oracle.com/Orion/Services/download/'||aru_file||'?aru='||aru_id||chr(38)||'patch_file='||aru_file) URL
    FROM em_plugin_version epv, em_current_deployed_plugin ecp, em_su_entities su
    WHERE epv.plugin_type NOT IN ('BUILT_IN_TARGET_TYPE', 'INSTALL_HOME')
    AND ecp.dest_type='2'
    AND epv.plugin_version_id = ecp.plugin_version_id
    AND su.entity_id = epv.su_entity_id;
    

    The above query returns the list of plug-ins along with the URLs to download them if they were downloaded through Self Update. If plug-ins are present in the install media or are third party plug-ins not available through Self Update, the URLs are marked as "Media/External".

  2. Download the additional plug-ins, if any, from the URLs in the list returned by the query in step 1 and place them in a single directory. Change the filename extension from .zip to .opar.
  3. Invoke the installer and select the Software-Only option to install the Middleware and OMS Oracle Home.
  4. To install the required plug-ins, you must then run the PluginInstall.sh script (OMS_HOME/sysman/install/PluginInstall.sh) with PLUGIN_LOCATION=<absolute path to plugin dir> set to the directory where the downloaded plug-ins are kept. When asked to select plug-ins, make sure you select the same plug-ins as were listed by the SQL query.

    Note:

    Recovery will fail if all required plug-ins have not been installed.
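The extension change in step 2 is easy to script when several plug-ins were downloaded. The following sketch renames every .zip file in the download directory to .opar. The function name and the example directory are assumptions, and the directory is assumed to contain only plug-in archives.

```shell
#!/bin/sh
# Hypothetical helper: rename downloaded plug-in archives from .zip to .opar.
# Usage: rename_plugins <plugin_dir>
rename_plugins() (
    dir=$1
    for f in "$dir"/*.zip; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        mv -- "$f" "${f%.zip}.opar" || exit 1
    done
)
```

After `rename_plugins /u01/stage/plugins` (a placeholder path), the same directory is the one to pass as PLUGIN_LOCATION in step 4.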

After the software-only install, all patches that were installed prior to the crash must be re-applied. Assuming the Management Repository is intact, the post-scripts that run SQL against the repository can be skipped as the repository already has those patches applied.

To apply the patches in bitonly mode, use the following commands:
  • $omspatcher apply -analyze -bitonly
  • $omspatcher apply -bitonly
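The two commands are meant to run in that order, analysis first. A small wrapper can enforce this by applying the patch only when the analysis succeeds. The sketch below assumes that omspatcher returns a nonzero exit status when the analysis finds a problem, which should be confirmed for your release. The function name and the OMSPATCHER override variable are hypothetical, and omspatcher is typically run from the unzipped patch directory.

```shell
#!/bin/sh
# Hypothetical wrapper: analyze first, apply only if the analysis succeeds.
# Set OMSPATCHER to the full path of omspatcher if it is not on the PATH.
apply_bitonly() {
    "${OMSPATCHER:-omspatcher}" apply -analyze -bitonly || {
        echo "analysis reported a problem; patch not applied" >&2
        return 1
    }
    "${OMSPATCHER:-omspatcher}" apply -bitonly
}
```

Calling `apply_bitonly` once per patch keeps the analysis and apply steps paired.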

As stated earlier, the location of the OMS Oracle Home is fixed and cannot be changed. Hence, ensure that the OMS Oracle Home is restored in the same location that was used previously.

Recreating the OMS

Once the Software Homes are recovered, the instance home can be reconstructed using the omsca command in recovery mode:

omsca recover -as -ms -nostart -backup_file <exportconfig file>

Use the export file generated by the emctl exportconfig command shown in the previous section.

OMS Recovery Scenarios

The following scenarios illustrate various OMS recovery situations along with the recovery steps.

Note:

A prerequisite for OMS recovery is to have recent, valid OMS configuration backups available. Oracle recommends that you back up the OMS using the emctl exportconfig oms command whenever an OMS configuration change is made. This command must be run on the primary OMS running the WebLogic AdminServer.

Alternatively, you can run this command on a regular basis using the Enterprise Manager Job system.

Each of the following scenarios covers the recovery of the Software homes using either a filesystem backup (when available and only when recovering to the same host) or using the Software only option from the installer. In either case, the best practice is to recover the instance home (gc_inst) using the omsca recover command, rather than from a filesystem backup. This guarantees that the instance home is valid and up to date.

Single OMS, No Server Load Balancer (SLB), OMS Restored on the same Host

Site hosts a single OMS. No SLB is present. The OMS configuration was backed up using the emctl exportconfig oms command on the primary OMS running the AdminServer. The OMS Oracle Home is lost.

Resolution:

  1. Perform cleanup on failed OMS host.

    Make sure there are no processes still running from the Middleware home using a command similar to the following:

    ps -ef | grep -i -P "(Middleware|gc_inst)" | grep -v grep | awk '{print $2}' | xargs kill -9
    

    Note:

    Change Middleware|gc_inst to strings that match your own middleware and instance homes.

    If recovering the software homes using the software only install method, first de-install the existing Oracle Homes using the Cloud Control software distribution installer. This is required even if the software homes are no longer available as it is necessary to remove any record of the lost Oracle Homes from the Oracle inventory.

    If they exist, remove the 'Middleware' and 'gc_inst' directories.

  2. Ensure that software library locations are still accessible and valid. If a Software library is accessible but corrupt, it will affect OMSCA recovery.

  3. Restore the Software Homes. See Recovering the Software Homes for more information.

    If restoring from a filesystem backup, delete the following file:

    OMS_HOME/sysman/config/emInstanceMapping.properties

    In addition, delete any gc_inst directories that may have been restored, if they exist.

  4. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    <OMS_HOME>/bin/omsca recover -as -ms -nostart -backup_file <exportconfig file>

    Note:

    The -backup_file to be passed must be the latest file generated by the emctl exportconfig oms command.

    Note:

    If BIP was configured in the first OMS, then the following ports also need to be passed. Recovery will fail otherwise.

    For example:

    <OMS_HOME>/bin/omsca recover -as -ms -nostart -backup_file /scratch/emga/opf_ADMIN_20151105_022311.bka -EM_BIP_PORT 9701 -EM_BIP_HTTPS_PORT 9803 -EM_BIP_OHS_PORT 9788 -EM_BIP_OHS_HTTPS_PORT 9851

  5. Start the OMS.

    OMS_HOME/bin/emctl start oms
    
  6. Recover the Agent (if necessary).

    If the Management Agent Software Home was recovered along with the OMS Software Homes, the Management Agent instance directory should be recreated to ensure consistency between the Management Agent and OMS.

    1. Remove the agent_inst directory if it was restored from backup.

    2. Use agentDeploy.sh to configure the agent:

      <AGENT_BASE_DIR>/core/13.3.0.0.0/sysman/install/agentDeploy.sh AGENT_BASE_DIR=<AGENT_BASE_DIR> AGENT_INSTANCE_HOME=<AGENT_INSTANCE_HOME> ORACLE_HOSTNAME=<AGENT_HOSTNAME> AGENT_PORT=<AGENT_PORT> -configOnly OMS_HOST=<oms host> EM_UPLOAD_PORT=<OMS_UPLOAD_PORT> AGENT_REGISTRATION_PASSWORD=<REG_PASSWORD>
      

      If the Management Agent configuration fails, see <AGENT_HOME>/cfgtoollogs/cfgfw/oracle.sysman.top.agent_<time_stamp>.log

    3. The OMS may block the Management Agent. Synchronize the agent with the repository using the following command:

      <OMS_HOME>/bin/emcli resyncAgent -agent=<agent target name e.g. myhost.example.com:3872>

    If the Management Agent software home was not recovered along with the OMS but the Agent still needs to be recovered, follow the instructions in section Agent Reinstall Using the Same Port.

    Note:

    This is only likely to be needed in the case where a filesystem recovery has been performed that did not include a backup of the Agent software homes. If the OMS software homes were recovered using the Software only install method, this step will not be required because a Software only install installs an Agent software home under the Middleware home.

  7. Verify that the site is fully operational.
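The cleanup command in step 1 finds and kills processes in a single pipeline. A more cautious variant, sketched below, first lists the matching process IDs so that they can be reviewed before anything is killed. The function names are hypothetical, the match is case-sensitive (unlike the grep -i form in step 1), and the pattern must be changed to match your own middleware and instance homes.

```shell
#!/bin/sh
# Hypothetical helper: read "pid command-line" records on stdin and print the
# PIDs whose command line matches the extended regular expression in $1.
# The awk process running this filter is excluded from its own results.
match_pids() {
    awk -v pat="$1" '$2 !~ /(^|\/)awk$/ && $0 ~ pat { print $1 }'
}

# List PIDs of running processes whose command line matches the pattern.
# Usage: list_oms_pids 'Middleware|gc_inst'
list_oms_pids() {
    ps -eo pid=,args= | match_pids "$1"
}
```

After reviewing the output of `list_oms_pids 'Middleware|gc_inst'`, the listed processes can be stopped with kill, escalating to kill -9 only for those that do not exit.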

Single OMS, No SLB, OMS Restored on a Different Host

Site hosts a single OMS. The OMS is running on host "A." No SLB is present. The OMS configuration was backed up using the emctl exportconfig oms command. Host "A" is lost.

Resolution:

  1. Ensure that software library locations are accessible from "Host B".

    Note: If configured, all BIP shared locations (sharedLoc) should also be accessible.

  2. Restore the software homes on "Host B". See Recovering the Software Homes for more information.

  3. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    <OMS_HOME>/bin/omsca recover -as -ms -nostart -backup_file <exportconfig file>

    Note:

    The -backup_file to be passed must be the latest file generated by the emctl exportconfig oms command.

    Note:

    If BIP was configured in the first OMS, then the following ports also need to be passed. Recovery will fail otherwise.

    For example:

    <OMS_HOME>/bin/omsca recover -as -ms -nostart -backup_file /scratch/emga/opf_ADMIN_20151105_022311.bka -EM_BIP_PORT 9701 -EM_BIP_HTTPS_PORT 9803 -EM_BIP_OHS_PORT 9788 -EM_BIP_OHS_HTTPS_PORT 9851
  4. Start the OMS.

    <OMS_HOME>/bin/emctl start oms
    

  5. Configure the Agent.

    An agent is installed as part of the Software only install and needs to be configured using the agentDeploy.sh command:

    <AGENT_BASE_DIR>/agent_13.3.0.0.0/sysman/install/agentDeploy.sh AGENT_BASE_DIR=<AGENT_BASE_DIR> AGENT_INSTANCE_HOME=<AGENT_INSTANCE_HOME> ORACLE_HOSTNAME=<AGENT_HOSTNAME> AGENT_PORT=<AGENT_PORT> -configOnly OMS_HOST=<oms host> EM_UPLOAD_PORT=<OMS_UPLOAD_PORT> AGENT_REGISTRATION_PASSWORD=<REG_PASSWORD>
    

    If the Management Agent configuration fails, see <AGENT_HOME>/cfgtoollogs/cfgfw/oracle.sysman.top.agent_<time_stamp>.log

  6. Relocate the oracle_emrep target to the Management Agent of the new OMS host using the following commands:

    <OMS_HOME>/bin/emcli login -username=sysman
    <OMS_HOME>/bin/emcli sync
    <OMS_HOME>/bin/emctl config emrep -agent <agent on host "B", e.g myNewOMSHost.example.com:3872>
    

    Note:

    If you run emctl config emrep -agent and set the flag -ignore_timeskew, there may be a loss of monitoring data as the availability of monitored targets may be affected when the Management Services and Repository target is moved to the new Agent.

  7. In the Cloud Control console, locate the 'WebLogic Domain' target for the Cloud Control Domain. Go to 'Monitoring Credentials' and update the adminserver host to host B. Then do a Refresh WebLogic Domain to reconfigure the domain with new hosts.

  8. Locate duplicate targets from the Management Services and Repository Overview page of the Enterprise Manager console. Click the Duplicate Targets link to access the Duplicate Targets page. To resolve duplicate target errors, the duplicate target must be renamed on the conflicting Agent. Relocate duplicate targets from Agent "A" to Agent "B".

  9. Change the OMS to which all Management Agents point and then resecure all Agents.

    Because the new machine is using a different hostname from the one originally hosting the OMS, all Agents in your monitored environment must be told where to find the new OMS. On each Management Agent, run the following command:

    <AGENT_INST_DIR>/bin/emctl secure agent -emdWalletSrcUrl "http://hostB:<http_port>/em"
    
  10. Assuming the original OMS host is no longer in use, remove the Host target (including all remaining monitored targets) from Cloud Control by selecting the host on the Targets > Hosts page and clicking 'Remove'. You will be presented with an error that informs you to remove all monitored targets first. Remove those targets then repeat the step to remove the Host target successfully.

  11. Verify that the site is fully operational.

Single OMS, No SLB, OMS Restored on a Different Host using the Original Hostname

Site hosts a single OMS. The OMS is running on Host "A." No SLB is present. The OMS configuration was backed up using the emctl exportconfig oms command. Host "A" is lost. Recovery is to be performed on "Host B" but retaining the use of "Hostname A".

Resolution:

  1. Ensure that the software library location is accessible from Host "B".

  2. Restore the software homes on Host B. See Recovering the Software Homes for more information.

  3. Modify the network configuration so that "Host B" also responds to the hostname of "Host A". Specific instructions on how to configure this are beyond the scope of this document. However, some general configuration suggestions are:

    Modify your DNS server so that both the "Hostname B" and "Hostname A" network addresses resolve to the physical IP of "Host B".

    Multi-home "Host B". Configure an additional IP on "Host B" for the IP address that "Hostname A" resolves to. For example, on "Host B" run the following commands:

    ifconfig eth0:1 <IP assigned to "Hostname A"> netmask <netmask>
    /sbin/arping -q -U -c 3 -I eth0 <IP of HostA>
    
  4. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    <OMS_HOME>/bin/omsca recover -as -ms -nostart -backup_file <exportconfig file> -AS_HOST <hostA> -EM_INSTANCE_HOST <hostA>

    Note:

    The -backup_file to be passed must be the latest file generated by the emctl exportconfig oms command.

  5. Start the OMS.

    <OMS_HOME>/bin/emctl start oms
    
  6. Configure the agent.

    An agent is installed as part of the Software only install and needs to be configured using the agentDeploy.sh command:

    <AGENT_HOME>/core/13.3.0.0.0/sysman/install/agentDeploy.sh AGENT_BASE_DIR=<AGENT_BASE_DIR> AGENT_INSTANCE_HOME=<AGENT_INSTANCE_HOME> ORACLE_HOSTNAME=<AGENT_HOSTNAME> AGENT_PORT=<AGENT_PORT> -configOnly OMS_HOST=<oms host> EM_UPLOAD_PORT=<OMS_UPLOAD_PORT> AGENT_REGISTRATION_PASSWORD=<REG_PASSWORD>
    

    The OMS may block the Management Agent. Synchronize the Agent with the repository using the following command:

    <OMS_HOME>/bin/emcli resyncAgent -agent=<agent target name e.g. myhost.example.com:3872>

  7. Verify that the site is fully operational.

Multiple OMS, Server Load Balancer, Primary OMS Recovered on the Same Host

Site hosts multiple OMS instances. All OMS instances are fronted by a Server Load Balancer. The OMS configuration was backed up using the emctl exportconfig oms command on the primary OMS running the WebLogic AdminServer. The primary OMS is lost.

Resolution:

  1. Perform a cleanup on the failed OMS host.

    Make sure there are no processes still running from the Middleware home using a command similar to the following:

    ps -ef | grep -i -P "(Middleware|gc_inst)" | grep -v grep | awk '{print $2}' | xargs kill -9
    

    Note:

    Change Middleware|gc_inst to strings that match your own middleware and instance homes.

    If recovering the software homes using the software only install method, first de-install the existing Oracle Homes using the Cloud Control software distribution installer. This is required even if the software homes are no longer available as it is necessary to remove any record of the lost Oracle Homes from the Oracle inventory.

    If they exist, remove the 'Middleware' and 'gc_inst' directories.

  2. Ensure that software library locations are still accessible.

  3. Restore the software homes. See Recovering the Software Homes for more information.

    If restoring from a filesystem backup, delete the following file:

    <OMS_HOME>/sysman/config/emInstanceMapping.properties

  4. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    <OMS_HOME>/bin/omsca recover -as -ms -nostart -backup_file <exportconfig file>

    Note:

    The -backup_file to be passed must be the latest file generated by the emctl exportconfig oms command.

  5. Start the OMS.

    <OMS_HOME>/bin/emctl start oms
    
  6. Recover the Management Agent.

    If the Management Agent software home was recovered along with the OMS software homes (as is likely in a Primary OMS install recovery where the agent and agent_inst directories are commonly under the Middleware home), the Management Agent instance directory should be recreated to ensure consistency between the Management Agent and OMS.

    1. Remove the agent_inst directory if it was restored from backup.

    2. Use agentDeploy.sh to configure the Management Agent:

      <AGENT_HOME>/core/13.3.0.0.0/sysman/install/agentDeploy.sh AGENT_BASE_DIR=<AGENT_BASE_DIR> AGENT_INSTANCE_HOME=<AGENT_INSTANCE_HOME> ORACLE_HOSTNAME=<AGENT_HOSTNAME> AGENT_PORT=<AGENT_PORT> -configOnly OMS_HOST=<oms host> EM_UPLOAD_PORT=<OMS_UPLOAD_PORT> AGENT_REGISTRATION_PASSWORD=<REG_PASSWORD>
      
    3. The OMS may block the Management Agent. Synchronize the Agent with the repository using the following command:

      <OMS_HOME>/bin/emcli resyncAgent -agent=<agent target name e.g. myhost.example.com:3872>

    If the Management Agent software home was not recovered along with the OMS but the Management Agent still needs to be recovered, follow the instructions in section Agent Reinstall Using the Same Port.

    Note:

    This is only likely to be needed in the case where a filesystem recovery has been performed that did not include a backup of the Management Agent software homes. If the OMS software homes were recovered using the Software only install method, this step will not be required because a Software only install installs a Management Agent software home under the Middleware home.

  7. Verify that the site is fully operational.
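
The cleanup command in step 1 kills every matching process in a single pass. As a more cautious sketch, the helper below only prints the matching PIDs so that they can be reviewed before anything is killed. The function name list_home_pids is hypothetical and not part of Enterprise Manager, and the sketch assumes a grep that supports -E; it uses an extended regular expression in place of the -P option shown in step 1.

```shell
# Hypothetical helper: read `ps -ef` style output on stdin and print the PID
# (second column) of every line that matches the given pattern.
# $1: extended regex matching your own middleware and instance home paths.
list_home_pids() {
  grep -i -E -- "$1" | grep -v grep | awk '{print $2}'
}
```

A possible use is to run ps -ef | list_home_pids '(Middleware|gc_inst)', inspect the list, and only then pipe the same output to xargs kill -9.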

Multiple OMS, Server Load Balancer Configured, Primary OMS Recovered on a Different Host

The site hosts multiple OMS instances, all fronted by a Server Load Balancer. The OMS configuration was backed up using the emctl exportconfig oms command. The primary OMS on host "A" is lost and needs to be recovered on host "B".

  1. If necessary, perform cleanup on failed OMS host.

    Make sure there are no processes still running from the Middleware home using a command similar to the following:

    ps -ef | grep -i -P "(Middleware|gc_inst)" | grep -v grep | awk '{print $2}' | xargs kill -9
    
  2. Ensure that software library locations are accessible from host "B".

  3. Restore the software homes on host "B". See Recovering the Software Homes for more information.

  4. Run omsca in recovery mode specifying the export file taken earlier to configure the OMS:

    <OMS_HOME>/bin/omsca recover -as -ms -nostart -backup_file <exportconfig file>

    Note:

    The file passed to -backup_file must be the latest file generated by the emctl exportconfig oms command.

  5. Start the OMS.

    <OMS_HOME>/bin/emctl start oms
    
  6. Configure the Management Agent.

    An Agent is installed as part of the Software only install and needs to be configured using the agentDeploy.sh command:

    <AGENT_BASE_DIR>/agent_13.3.0.0.0/sysman/install/agentDeploy.sh AGENT_BASE_DIR=<AGENT_BASE_DIR> AGENT_INSTANCE_HOME=<AGENT_INSTANCE_HOME> ORACLE_HOSTNAME=<AGENT_HOSTNAME> AGENT_PORT=<AGENT_PORT> -configOnly OMS_HOST=<oms host> EM_UPLOAD_PORT=<OMS_UPLOAD_PORT> AGENT_REGISTRATION_PASSWORD=<REG_PASSWORD>
    

    If any non-default plug-ins were previously deployed on the failed Agent, they must be re-deployed after recovery of the Agent. Note that this pertains to plug-ins that existed on the recovering Agent before it failed (that are not related to the OMS/Repository target), and any plug-ins for additional targets the OMS Agent happened to be also monitoring. To re-deploy the plug-ins, run the following command (not as part of config emrep, or manually):

    emcli relocate_targets

  7. Additional Management Services, if any, must be re-enrolled with the Admin Server that is now running on host B. To re-enroll the Management Services, run the following command on each additional OMS:

    <OMS_HOME>/bin/emctl enroll oms -as_host <new Admin Server host, i.e. host B> -as_port <admin server port>

  8. Add the new OMS to the SLB virtual server pools and remove the old OMS.

  9. Relocate the oracle_emrep target to the Management Agent of the new OMS host using the following commands:

    <OMS_HOME>/bin/emcli sync
    <OMS_HOME>/bin/emctl config emrep -agent <agent on host "B", e.g. myNewOMSHost.example.com:3872>

    Note:

    If you run emctl config emrep -agent and set the flag -ignore_timeskew, there may be a loss of monitoring data as the availability of monitored targets may be affected when the Management Services and Repository target is moved to the new Agent.

  10. In the Cloud Control console, locate the 'WebLogic Domain' target for the Cloud Control Domain. Go to 'Monitoring Credentials' and update the adminserver host to host B. Then run Refresh WebLogic Domain to reconfigure the domain with the new hosts.

  11. Locate duplicate targets from the Management Services and Repository Overview page of the Enterprise Manager console. Click the Duplicate Targets link to access the Duplicate Targets page. To resolve duplicate target errors, the duplicate target must be renamed on the conflicting Management Agent. Relocate duplicate targets from Management Agent "A" to Management Agent "B".

  12. Assuming the original OMS host is no longer in use, remove the Host target (including all remaining monitored targets) from Cloud Control by selecting the host on the Targets > Hosts page and clicking 'Remove'. You will be presented with an error that informs you to remove all monitored targets first. Remove those targets then repeat the step to remove the Host target successfully.

  13. All other OMSs in the system must re-enroll with the newly recovered OMS using the following command:

    emctl enroll oms -as_host <new OMS host> -as_port <port #, default 7101> 
    
  14. Verify that the site is fully operational.
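
The note in step 4 requires the most recent export file. The sketch below shows one way to pick it by modification time. The function name latest_backup is hypothetical, and the sketch assumes that all exportconfig backups are kept in a single directory and end in .bka; adjust the pattern if your export files are named differently.

```shell
# Hypothetical helper: print the most recently modified *.bka file in a
# directory, or fail with a message if the directory holds none.
# $1: directory that holds the emctl exportconfig oms backup files.
latest_backup() {
  dir=$1
  # ls -t sorts newest first; "|| true" keeps strict shells from aborting
  # when the glob matches nothing.
  newest=$(ls -t "$dir"/*.bka 2>/dev/null | head -n 1) || true
  if [ -z "$newest" ]; then
    echo "no .bka files found in $dir" >&2
    return 1
  fi
  printf '%s\n' "$newest"
}
```

The printed path could then be supplied as the -backup_file argument in step 4. Because the choice relies on file modification times, it is only reliable if the backups were not copied or touched after they were generated.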

Multiple OMS, SLB configured, additional OMS recovered on same or different host

The site hosts multiple OMS instances fronted by an SLB. The OMS configuration was backed up using the emctl exportconfig oms command on the first OMS. An additional OMS is lost and needs to be recovered on the same or a different host.

  1. If recovering to the same host, ensure cleanup of the failed OMS has been performed:

    Make sure there are no processes still running from the Middleware home using a command similar to the following:

    ps -ef | grep -i -P "(Middleware|gc_inst)" | grep -v grep | awk '{print $2}' | xargs kill -9
    

    First de-install the existing Oracle Homes using the Cloud Control software distribution installer. This is required even if the software homes are no longer available as it is necessary to remove any record of the lost Oracle Homes from the Oracle inventory.

    If they exist, remove the Middleware and gc_inst directories.

  2. Ensure that shared software library locations are accessible.

  3. Install a Management Agent on the required host (same or different as the case may be).

  4. For procedures on installing additional Oracle Management Services, see Installing Additional Oracle Management Services in Silent Mode.

  5. Verify that the site is fully operational.

Recovering the Software Library

If the software library is lost, it should be restored from the last available backup. After restoring the backup, the following commands must be run to verify and re-import missing entities:

  1. emcli verify_swlib - This command verifies the accessibility of the software library storage locations and reports if entities are missing any files on the file system.
  2. emcli reimport_swlib_metadata - This command re-imports all Oracle-supplied entities that are shipped along with the product. If you have a recent backup, this should not be required. Run emcli reimport_swlib_metadata if the emcli verify_swlib command reports Oracle-owned entities with files missing from the filesystem.
  3. emcli verify_updates - This command verifies whether entities downloaded by Self Update are missing from the software library. For each missing entity, the command also displays the instructions to re-import the entity into the software library.

Recovering Management Agents

If a Management Agent is lost, it should be reinstalled by cloning from a reference install. Cloning from a reference install is often the fastest way to recover a Management Agent install because it is not necessary to track and reapply customizations and patches. Care should be taken to reinstall the Management Agent using the same port. Using the Enterprise Manager's Management Agent Resynchronization feature, a reinstalled Management Agent can be reconfigured using target information present in the Management Repository.

If the Management Agent is not reinstalled using the clone option, patches should be reapplied after the new Management Agent is installed.

Note:

Management Agent resynchronization can only be performed by Enterprise Manager Super Administrators.

When the Management Agent is reinstalled using the same port, the OMS detects that it has been re-installed and blocks it temporarily to prevent the auto-discovered targets in the re-installed Management Agent from overwriting previous customizations.

Note:

This is a condition in which the OMS rejects all heartbeat or upload requests from the blocked Management Agent. Hence, a blocked Agent will not be able to upload any alerts or metric data to the OMS. However, blocked Management Agents continue to collect monitoring data.

An Agent can be blocked due to one of several conditions. They are:

  • Enterprise Manager has detected that the Agent has been restored from a backup.

  • Plug-ins on the Agent do not match the records in the Management Repository.

  • The user has manually blocked the Agent.

For the first two conditions, an Agent resynchronization is required to unblock the agent by clearing the states on the Agent and pushing plug-ins from the Management Repository.

The Management Agent can be resynchronized and unblocked from the Management Agent home page or by using the emcli resyncAgent -agent=<agent target name> command. Resynchronization pushes all targets from the Management Repository to the Management Agent and then unblocks the Agent.

Management Agent Recovery Scenarios

The following scenarios illustrate various Management Agent recovery situations along with the recovery steps. The Management Agent resynchronization feature requires that a reinstalled Management Agent use the same port and location as the previous Management Agent that crashed.

Note:

Management Agent resynchronization can only be performed by Enterprise Manager Super Administrators.

Management Agent Reinstall Using the Same Port

A Management Agent is monitoring multiple targets. The Agent installation is lost.

  1. De-install the Agent Oracle Home using the Oracle Universal Installer.

    Note:

    This step is necessary in order to clean up the inventory.

  2. Install a new Management Agent or use the Management Agent clone option to reinstall the Management Agent through Enterprise Manager. Specify the same port that was used by the crashed Agent. The location of the install must be the same as the previous install.

    The OMS detects that the Management Agent has been re-installed and blocks the Management Agent.

  3. Initiate Management Agent Resynchronization using the following command:

    emcli resyncAgent -agent="Agent Host:Port"

    All targets in the Management Repository are pushed to the new Management Agent. The Agent is instructed to clear backlogged files and then do a clearstate. The Agent is then unblocked.

  4. Reconfigure User-defined Metrics if the location of the User-defined Metric scripts has changed.

  5. Verify that the Management Agent is operational and all target configurations have been restored using the following emctl commands:

    emctl status agent 
    emctl upload agent 
    

    There should be no errors and no XML files in the backlog.
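
The resyncAgent command in step 3 takes the Agent target name in host:port form. As a small sketch, the helper below assembles that name and rejects a non-numeric port before the value reaches emcli. The function name agent_target_name is hypothetical and not part of Enterprise Manager.

```shell
# Hypothetical helper: build the "host:port" Agent target name.
# $1: Agent host name, $2: Agent port (digits only).
agent_target_name() {
  host=$1
  port=$2
  if [ -z "$host" ]; then
    echo "host must not be empty" >&2
    return 1
  fi
  case $port in
    ''|*[!0-9]*)
      echo "port must be numeric: '$port'" >&2
      return 1
      ;;
  esac
  printf '%s:%s\n' "$host" "$port"
}
```

For example, emcli resyncAgent -agent="$(agent_target_name myhost.example.com 3872)" would pass myhost.example.com:3872 as the target name.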

Management Agent Restore from Filesystem Backup

A single Management Agent is monitoring multiple targets. File system backup for the Agent Oracle Home exists. The Agent install is lost.

  1. Restore the Management Agent from the filesystem backup then start the Management Agent.

    The OMS detects that the Management Agent has been restored from backup and blocks the Management Agent.

  2. Initiate Management Agent Resynchronization using the following command:

    emcli resyncAgent -agent="Agent Host:Port"

    All targets in the Management Repository are pushed to the restored Management Agent. The Agent is instructed to clear backlogged files and performs a clearstate. The Management Agent is unblocked.

  3. Verify that the Management Agent is functional and all target configurations have been restored using the following emctl commands:

    emctl status agent
    emctl upload agent 
    

    There should be no errors and no XML files in the backlog.
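
The verification in step 3 can be wrapped so that it stops at the first failing command. The sketch below assumes that emctl returns a non-zero exit status when a command fails, which should be confirmed in your environment. The function name verify_agent is hypothetical. The sketch does not inspect the backlog for XML files, so the command output still needs to be read.

```shell
# Hypothetical helper: run the two verification commands in order and stop
# at the first one that fails.
# $1: emctl executable to use (defaults to the emctl found on PATH).
verify_agent() {
  emctl_bin=${1:-emctl}
  "$emctl_bin" status agent || return 1
  "$emctl_bin" upload agent || return 1
}
```

Passing the executable as an argument lets the same function be pointed at a specific Agent home, for example verify_agent <AGENT_INSTANCE_HOME>/bin/emctl.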

Recovering from a Simultaneous OMS-Management Repository Failure

When both OMS and Management Repository fail simultaneously, the recovery situation becomes more complex depending upon factors such as whether the OMS and Management Repository recovery has to be performed on the same or different host, or whether there are multiple OMS instances fronted by an SLB. In general, the order of recovery for this type of compound failure should be Management Repository first, followed by OMS instances following the steps outlined in the appropriate recovery scenarios discussed earlier. The following scenarios illustrate two OMS-Management Repository failures and the requisite recovery steps.

Collapsed Configuration: Incomplete Management Repository Recovery, Primary OMS on the Same Host

The Management Repository and the primary OMS are installed on the same host (host "A"). The Management Repository database is running in noarchivelog mode. A full cold backup is available. A recent OMS backup file exists (emctl exportconfig oms). The Management Repository, OMS and the Management Agent crash.

  1. Follow the Management Repository recovery procedure shown in Incomplete Recovery on the Same Host with the following exception:

    Since the OMS Oracle Home is not available and Management Repository resynchronization has to be initiated before starting an OMS against the restored Management Repository, submit the resync via the following PL/SQL block. Log into the Management Repository as SYSMAN using SQL*Plus and run:

    begin emd_maintenance.full_repository_resync('<resync name>'); end;
    
  2. Follow the OMS recovery procedure shown in Single OMS, No Server Load Balancer (SLB), OMS Restored on the same Host.

  3. Verify that the site is fully operational.
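
When the PL/SQL block in step 1 is entered in SQL*Plus, it runs only after a line containing a single slash (/). The sketch below generates the block together with that terminator so it can be piped to SQL*Plus. The function name resync_sql is hypothetical, and the check that rejects single quotes in the resync name is an added safeguard, not an Enterprise Manager requirement.

```shell
# Hypothetical helper: print the resync PL/SQL block followed by the "/"
# line that tells SQL*Plus to execute it.
# $1: resync name (non-empty, must not contain a single quote).
resync_sql() {
  name=$1
  case $name in
    ''|*\'*)
      echo "resync name must be non-empty and contain no single quotes" >&2
      return 1
      ;;
  esac
  printf "begin emd_maintenance.full_repository_resync('%s'); end;\n/\n" "$name"
}
```

One possible invocation is resync_sql <resync name> | sqlplus -s sysman@<connect identifier>, where the connect identifier for the restored Management Repository is an assumption about your environment and SQL*Plus would normally prompt for the SYSMAN password.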

Distributed Configuration: Incomplete Management Repository Recovery, Primary OMS and additional OMS on Different Hosts, SLB Configured

The Management Repository, primary OMS, and additional OMS all reside on different hosts. The Management Repository database was running in noarchivelog mode. OMS backup file from a recent backup exists (emctl exportconfig oms). Full cold backup of the database exists. All three hosts are lost.

  1. Follow the Management Repository recovery procedure shown in Incomplete Recovery on the Same Host with the following exception:

    Since the OMS Oracle Home is not yet available and Management Repository resync has to be initiated before starting an OMS against the restored Management Repository, submit the resync via the following PL/SQL block. Log into the Management Repository as SYSMAN using SQL*Plus and run the following:

    begin emd_maintenance.full_repository_resync('<resync name>'); end;
    
  2. Follow the OMS recovery procedure shown in Multiple OMS, Server Load Balancer Configured, Primary OMS Recovered on a Different Host with the following exception:

    Override the Management Repository connect description present in the backup file by passing the additional omsca parameter:

    -REPOS_CONN_STR <restored repos descriptor>
    

    This needs to be added along with other parameters listed in Multiple OMS, Server Load Balancer Configured, Primary OMS Recovered on a Different Host.

  3. Follow the OMS recovery procedure shown in Multiple OMS, SLB configured, additional OMS recovered on same or different host.

  4. Verify that the site is fully operational.