Oracle® Enterprise Manager Cloud Control Administrator's Guide
12c Release 3 (12.1.0.3)

E24473-27

30 Enterprise Manager Disaster Recovery

While high availability typically protects against local outages such as application failures or system-level problems, disaster tolerance protects against larger outages such as catastrophic data center failure due to natural disasters, fire, electrical failure, evacuation, or pervasive sabotage. For Maximum Availability, the loss of a site cannot be the cause for outage of the management tool that handles your enterprise.

Maximum Availability Architecture for Enterprise Manager involves deploying a remote failover architecture that allows a secondary data center to take over the management infrastructure in the event that disaster strikes the primary management infrastructure.

Note:

The procedure for creating a standby OMS differs according to the version of Enterprise Manager:

12.1.0.2 and earlier - Standby OMSs using a Standby WebLogic Domain. See Appendix D, "Standby OMSs using Standby WebLogic Domain" for more information.

12.1.0.3 and later - Standby OMSs using Storage Replication, as discussed in this chapter.

This chapter covers the following topics:

  • Disaster Recovery Overview and Topology

  • Design Considerations

  • Setting Up Management Repository Disaster Recovery

  • Setting Up Management Service and Software Library Disaster Recovery

  • Performing Switchover and Failover Operations

  • Keeping the Standby Site in Sync with the Primary

30.1 Disaster Recovery Overview and Topology

The Disaster Recovery solution for a Cloud Control deployment involves replicating the OMS, Software Library and Repository components to a standby site. This solution can be combined with the high availability solution described earlier to ensure that failures ranging from component failure to a complete site outage can be recovered from with minimal disruption to the availability of Cloud Control.

Some advantages to using a replicated standby configuration are:

  • OMS patching and upgrade only needs to be performed at one site.

  • Plug-ins only need to be installed at one site.

  • Keeping primary and standby sites synchronized is greatly simplified.

Only one site can be active at any given time. In this configuration, the application traffic is directed to the appropriate site by a DNS entry that is updated when the standby site is activated. The standby site must be similar to the primary site in terms of hardware and network resources to ensure there is no loss of performance when failover happens. There must be sufficient network bandwidth between the primary and standby sites to handle peak redo data generation.

A complete implementation of the Enterprise Manager Cloud Control Disaster Recovery solution is shown in the following figure.

Figure 30-1 Disaster Recovery Topology

Graphic illustrates standby site replication topology.

Key aspects of the DR solution shown in the figure are:

  • The solution has two sites. The current production site is running and active, while the second site is serving as a standby site and is in passive mode.

  • Oracle Data Guard Physical Standby is used to replicate Repository transactions from the primary site to the standby site.

  • The primary OMS hostnames should be resolvable to the IP addresses of the corresponding standby hosts at the standby site.

  • OMS hosts on each site have mount points defined for accessing the shared storage system for the site.

  • On the primary site, the OMS software binaries and configuration files, Oracle Inventory, Software Library, and Agent binaries and configuration files for all OMSs are on the shared storage.

  • Storage replication technology is used to copy the OMS file systems from the production site's shared storage to the standby site's shared storage.

  • It is not necessary to perform any Oracle software installations at the standby OMS site hosts. When the production site storage is replicated at the standby site storage, the equivalent Oracle home directories and data are written to the standby site storage.

  • Schedule incremental replications at a specified interval. The minimum recommended interval is once a day for the OMS filesystem, as the configuration does not change very often. Additionally, you should force a manual synchronization whenever you make a change to the middle tier configuration at the production site (for example, if you apply a patch on the production site). The Software Library, on the other hand, requires a more frequent replication schedule, as frequent as the network infrastructure will allow. Oracle recommends continuous replication or a minimum frequency of one hour.

  • Before forcing a manual synchronization, you should take a snapshot of the site to capture its current state. This ensures that the snapshot gets replicated to the standby site storage and can be used to roll back the standby site to a previous synchronization state, if desired. Recovery to the point of the previously successful replication (for which a snapshot was created) is possible when a replication fails.

  • Client access to Cloud Control is through a virtual hostname. This is accomplished via a DNS entry or a Global Load Balancer that is configured to direct Cloud Control users and Agents to the currently active site.

  • When there is a failure or planned outage of the production site, you perform the following steps to enable the standby site to assume the production role in the topology:

    • Stop OMSs at the primary site

    • Failover/switchover of the database to the standby site

    • Activate OMS and Software Library replicated storage at standby site

    • Reverse storage replication direction

    • Start OMSs at standby site

    • Update DNS or global load balancer to re-route user requests to the standby site. At this point, the standby site has assumed the production role.

30.2 Design Considerations

This section describes design considerations for a Cloud Control Disaster Recovery solution for an enterprise deployment.

The following topics are covered:

  • Network Considerations

  • Storage Considerations

  • Database Considerations

  • Starting Points

30.2.1 Network Considerations

The following sections discuss network considerations that must be taken into account when implementing standby Management Services using storage replication.

30.2.1.1 Planning Host Names

In a Disaster Recovery topology, the production site host names must be resolvable to the IP addresses of the corresponding peer systems at the standby site. Therefore, it is important to plan the host names for the production site and standby site. After switchover or failover from a primary site to a standby site, it should be possible to start applications on the standby hosts without requiring you to change the hostname for hosts on the standby site.

This can be achieved in either of the following ways:

  • Option 1: Physical host names on primary site and alias on standby site: OMSs at the primary site are configured using physical host names and aliases for these host names are configured on the corresponding hosts at the standby site.

  • Option 2: Alias host names on both sites: OMSs at the primary site are configured using an alias host name that can be configured at both the primary and standby sites.

The choice between these options depends on your network infrastructure and corporate policies. From a setup procedure perspective, Option 1 is easier to implement if you have an existing single-site Cloud Control installation that uses physical host names, as it does not require any transformation of your existing site to set up DR. Option 2 is easier to implement if you are setting up a new Cloud Control installation and can start with alias host names, or if you have an existing Cloud Control installation that already uses alias host names.

Host name resolution at each site can be done using local resolution (/etc/hosts), DNS-based resolution, or a combination of both. The following examples use these physical host names and IP addresses:

HOSTNAME             IP ADDRESS      DESCRIPTION
oms1-p.example.com   123.1.2.111     Physical host for OMS1 on Primary site
oms2-p.example.com   123.1.2.112     Physical host for OMS2 on Primary site
oms1-s.example.com   123.2.2.111     Physical host for OMS1 on Standby site
oms2-s.example.com   123.2.2.112     Physical host for OMS2 on Standby site

Example for Option 1: /etc/hosts configurations when OMSs are installed at primary site using primary site physical host names (oms1-p.example.com and oms2-p.example.com):

Primary Site
 
127.0.0.1     localhost.localdomain  localhost
123.1.2.111   oms1-p.example.com     oms1-p #OMS1
123.1.2.112   oms2-p.example.com     oms2-p #OMS2
 
Standby Site
 
127.0.0.1     localhost.localdomain  localhost
123.2.2.111   oms1-s.example.com     oms1-s      oms1-p.example.com #OMS1
123.2.2.112   oms2-s.example.com     oms2-s      oms2-p.example.com #OMS2

If the network has been configured correctly, a ping of the OMS host name from the primary site should result in a reply from the primary host, and a ping of the OMS host name from the standby site should result in a reply from the standby host.

Ping results from primary site (reply from primary site):

[oracle@oms1-p ~]$ ping oms1-p.example.com
PING oms1-p.example.com (123.1.2.111) 56(84) bytes of data.
64 bytes from oms1-p.example.com (123.1.2.111): icmp_seq=1 ttl=64 time=0.018 ms
64 bytes from oms1-p.example.com (123.1.2.111): icmp_seq=2 ttl=64 time=0.020 ms
64 bytes from oms1-p.example.com (123.1.2.111): icmp_seq=3 ttl=64 time=0.022 ms

Ping results from standby site (reply from standby site)

[oracle@oms1-s ~]$ ping oms1-p.example.com
PING oms1-s.example.com (123.2.2.111) 56(84) bytes of data.
64 bytes from oms1-s.example.com (123.2.2.111): icmp_seq=1 ttl=64 time=0.018 ms
64 bytes from oms1-s.example.com (123.2.2.111): icmp_seq=2 ttl=64 time=0.020 ms
64 bytes from oms1-s.example.com (123.2.2.111): icmp_seq=3 ttl=64 time=0.022 ms

Example for Option 2: /etc/hosts configuration when OMSs are installed using alias host names (oms1.example.com and oms2.example.com):

Primary Site
 
127.0.0.1     localhost.localdomain   localhost
123.1.2.111   oms1-p.example.com      oms1-p     oms1.example.com #OMS1
123.1.2.112   oms2-p.example.com      oms2-p     oms2.example.com #OMS2
 
Standby Site
 
127.0.0.1    localhost.localdomain    localhost
123.2.2.111  oms1-s.example.com       oms1-s     oms1.example.com #OMS1
123.2.2.112  oms2-s.example.com       oms2-s     oms2.example.com #OMS2

If the network has been configured correctly, a ping of the OMS host name from the primary site should result in a reply from the primary host, and a ping of the OMS host name from the standby site should result in a reply from the standby host.

Example:

Ping results from primary site (reply from primary site):

[oracle@oms1-p ~]$ ping oms1.example.com
PING oms1-p.example.com (123.1.2.111) 56(84) bytes of data.
64 bytes from oms1-p.example.com (123.1.2.111): icmp_seq=1 ttl=64 time=0.018 ms
64 bytes from oms1-p.example.com (123.1.2.111): icmp_seq=2 ttl=64 time=0.020 ms
64 bytes from oms1-p.example.com (123.1.2.111): icmp_seq=3 ttl=64 time=0.022 ms

Ping results from standby site (reply from standby site)

[oracle@oms1-s ~]$ ping oms1.example.com
PING oms1-s.example.com (123.2.2.111) 56(84) bytes of data.
64 bytes from oms1-s.example.com (123.2.2.111): icmp_seq=1 ttl=64 time=0.018 ms
64 bytes from oms1-s.example.com (123.2.2.111): icmp_seq=2 ttl=64 time=0.020 ms
64 bytes from oms1-s.example.com (123.2.2.111): icmp_seq=3 ttl=64 time=0.022 ms

30.2.1.2 Load Balancers Consideration

A multiple-OMS site requires a server load balancer (SLB). Both the primary and standby sites must be fronted by their own server load balancer. See "Configuring a Load Balancer". The SLB pools on the standby site will reference the IP addresses of the standby OMS hosts.

30.2.1.3 Application Virtual Host Name Consideration

A hostname through which the Cloud Control clients (agents and users) should access Cloud Control is required. When the primary site is active, this hostname should be configured in DNS to resolve to the IP address hosted by the primary site SLB. When the standby site is activated, the DNS entry should be updated so that the hostname resolves to the IP address hosted by the standby site SLB.

The DNS configuration for the Cloud Control application hostname is shown in the table below:

Table 30-1 DNS Configuration

DNS NAME                   DNS RECORD TYPE   VALUE                     COMMENTS
em.example.com             CNAME             slb_primary.example.com   Virtual hostname used by Cloud Control clients to communicate with the Management Service. Should point to the SLB of the currently active site.
slb_primary.example.com    A                 123.1.2.110               Primary Site SLB address
slb_standby.example.com    A                 123.2.2.110               Standby Site SLB address


The DNS switchover can be accomplished by either using a global load balancer or manually changing DNS names.

  • A global load balancer can provide authoritative DNS name server equivalent capabilities. One advantage of using a global load balancer is that the time for a new name-to-IP mapping to take effect can be almost immediate. The downside is that an additional investment must be made for the global load balancer.

  • Manually changing the DNS names. To ensure that DNS records cached by the Cloud Control clients are updated in a timely fashion after an update, it is recommended to set the TTL for the em.example.com CNAME to a low value, such as 60 seconds. This ensures that DNS changes propagate quickly to all clients. However, due to the shortened caching period, an increase in DNS requests can be observed. An illustrative zone-file sketch follows this list.
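
The following is a minimal sketch of the corresponding records in a BIND-style zone file, using the names and addresses from Table 30-1. The zone name (example.com), the TTL values on the A records, and the surrounding zone configuration are assumptions; only the low TTL on the CNAME reflects the recommendation above.

; Illustrative zone file fragment for example.com (assumed zone)
; em points at whichever site is currently active; keep its TTL low
em             60    IN  CNAME  slb_primary.example.com.
slb_primary    300   IN  A      123.1.2.110
slb_standby    300   IN  A      123.2.2.110

During a switchover or failover, the em record is changed to point to slb_standby.example.com.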

30.2.2 Storage Considerations

The Disaster Recovery solution for a Cloud Control deployment involves placing the Software Library, the OMS installation, the Agent installation, and the Oracle Inventory on replicated storage.

Storage Replication Requirements

Your chosen method of storage replication should support the following:

  • Snapshots and consistent filesystem copies

  • Ability to perform scheduled and on-demand replication between sites

Oracle recommends the following storage structure:

  • Create one volume per OMS host.

  • Mount the above volumes to each OMS host using the same mount point, for example, /em. On each host, this volume contains the OMS installation, Agent installation, and Oracle Inventory (an illustrative mount configuration follows Table 30-2).

  • Create a consistency group for the above volumes so that consistent replication can be done for all the volumes.

  • Create one volume for the Software Library. This volume must be mounted simultaneously on all the OMS hosts using the same mount point, for example, /swlib.

  • Decide on an appropriate replication frequency for the OMS filesystems and the Software Library based on your infrastructure. Oracle recommends a minimum frequency of 24 hours for the OMS filesystem and continuous or hourly replication for the Software Library.

Example: The following table shows an example configuration.

Table 30-2 Storage Configuration

VOLUME     MOUNTED ON HOST                              MOUNT POINT   COMMENTS
VOLOMS1    oms1-p.example.com                           /em           Installation of Enterprise Manager on Primary Site OMS1
VOLOMS2    oms2-p.example.com                           /em           Installation of Enterprise Manager on Primary Site OMS2
VOLSWLIB   oms1-p.example.com and oms2-p.example.com    /swlib        Software Library on Primary Site OMS1 and OMS2
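
As a concrete sketch of how these volumes might be mounted on a primary OMS host, the following /etc/fstab entries assume an NFS-based replicated filer named filer-p.example.com that exports the volumes listed in Table 30-2. The filer name, export paths, and NFS options are assumptions; block-based or other replication technologies will use a different mount configuration.

# Illustrative /etc/fstab entries on oms1-p.example.com (filer name, export
# paths, and NFS options are assumptions; adapt to your storage technology)
filer-p.example.com:/vol/VOLOMS1    /em     nfs   rw,bg,hard,nointr,tcp,vers=3   0 0
filer-p.example.com:/vol/VOLSWLIB   /swlib  nfs   rw,bg,hard,nointr,tcp,vers=3   0 0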


30.2.3 Database Considerations

This section provides the recommendations and considerations for setting up Repository databases for Disaster Recovery.

  • Oracle recommends creating Oracle Real Application Clusters (RAC) databases on both the production site and the standby site.

  • The Oracle Data Guard configuration should be chosen based on the data loss requirements of the database as well as network considerations such as the available bandwidth and latency relative to the rate of redo generation. Make sure that this is determined correctly before setting up the Oracle Data Guard configuration.

  • To enable Data Guard to restart instances during the course of broker operations, a service with a specific name must be statically registered with the local listener of each instance.

  • It is strongly recommended to force Data Guard to perform manual database synchronization whenever middle tier synchronization is performed. This is especially true for components that store configuration data in the metadata repositories.

  • It is strongly recommended to set up aliases for the database host names on both the production and standby sites. This enables seamless switchovers, switchbacks and failovers. For example:

    Site         Repository Host Names                  Repository Connect String
    Primary      repos1-p.example.com,                  (DESCRIPTION=(ADDRESS_LIST=(FAILOVER=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=repos1-p.example.com)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=repos2-p.example.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=EMREP)))
                 repos2-p.example.com
     
    Standby      repos1-s.example.com,                  (DESCRIPTION=(ADDRESS_LIST=(FAILOVER=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=repos1-s.example.com)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=repos2-s.example.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=EMREP)))
                 repos2-s.example.com
    

In the above example, after a failover or switchover operation, the OMS on the standby site must be switched to use the standby repository connect string. You can avoid changing connect strings by optionally setting up a host name alias for the repository database hosts. For example:

Site         Repository Host Names             Host Name Alias
Primary      repos1-p.example.com,             repos1.example.com
             repos2-p.example.com              repos2.example.com
 
Standby      repos1-s.example.com,             repos1.example.com
             repos2-s.example.com              repos2.example.com

Thus the connect string on each site can be the same, removing the need to change it during failover or switchover.

(DESCRIPTION=(ADDRESS_LIST=(FAILOVER=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=repos1.example.com)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=repos2.example.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=EMREP))).
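
A minimal sketch of how such aliases could be implemented with local name resolution on the OMS hosts is shown below. The repository host IP addresses are assumptions chosen for illustration; DNS-based aliases work equally well.

# Illustrative /etc/hosts entries on primary site OMS hosts (IP addresses are assumptions)
123.1.2.121   repos1-p.example.com   repos1-p   repos1.example.com
123.1.2.122   repos2-p.example.com   repos2-p   repos2.example.com

# Illustrative /etc/hosts entries on standby site OMS hosts (IP addresses are assumptions)
123.2.2.121   repos1-s.example.com   repos1-s   repos1.example.com
123.2.2.122   repos2-s.example.com   repos2-s   repos2.example.com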

30.2.4 Starting Points

Before setting up the standby site, the administrator must evaluate the starting point of the project. The starting point for designing an Enterprise Manager Cloud Control Disaster Recovery topology is usually one of the following:

  • The primary site is already created, standby site is being planned

  • The primary site is already created, standby site is already created using the deprecated "Standby WLS Domain" method

  • No installation exists, both primary and standby sites are being planned

30.2.4.1 The primary site is already created, standby site is being planned

When the starting point is an existing primary site, the OMS installation for the primary site already exist on the file system. Also, the host names, ports, and user accounts are already defined. The following procedure must be used to transform the site and prepare it for Disaster Recovery topology.

  1. Review the Network Considerations and plan your host names

    If using option 1, no host name changes are required on the primary site. Prepare your standby site hosts by adding appropriate alias host names.

    If using option 2, change the OMS host name (see "Multiple OMS, SLB, First OMS Host Name Change") to move your existing OMS installation to use alias host names. Prepare your standby site hosts by adding the appropriate alias host names.

  2. Review the Storage Considerations and move your OMS installation to shared storage

    Migrate the primary site to shared storage. See "Migrating an Existing Site to Shared Storage".

  3. Review the Database considerations and plan your repository host names and connect descriptors

    To achieve seamless failover/switchover, consider whether you want to use a host name alias for the repository database. If so, migrate your repository database to use an alias host name. See "Repository Host Name Change".

  4. Now that your primary site is ready, use the procedures in "Setting Up Management Repository Disaster Recovery" and "Setting Up Management Service and Software Library Disaster Recovery" to complete the DR setup.

30.2.4.2 The primary site is already created, standby site is already created using the deprecated "Standby WLS Domain" method

  1. Use the deleting standby OMS procedure to delete the Standby OMS. See Removing Additional Standby OMS Instances in the Enterprise Manager Advanced Installation and Configuration Guide.

  2. Use the procedure documented in Section 30.2.4.1, "The primary site is already created, standby site is being planned".

30.2.4.3 No installation exists, both primary and standby sites are being planned

When you are designing a new primary site (rather than using a pre-existing primary site), the task is easier because the site planning can be done before any software is installed.

  1. Review the Network Considerations and plan your host names.

  2. Review the Storage Considerations and prepare your storage volumes.

  3. Review the Database Considerations and prepare your repository host names.

  4. Now perform your primary site installation using the procedures in Chapter 29, "Enterprise Manager High Availability," taking care to use the correct host names and to install on the shared storage.

  5. Now that your primary site is ready, use the procedures in "Setting Up Management Repository Disaster Recovery" and "Setting Up Management Service and Software Library Disaster Recovery" to complete the DR setup.

30.3 Setting Up Management Repository Disaster Recovery

The Management Repository should use Data Guard as a Disaster Recovery solution.

30.3.1 Configuring a Standby Database for the Management Repository

The following steps describe the procedure for setting up a standby Management Repository database.

  1. Prepare Standby Management Repository hosts for Data Guard.

    Install a Management Agent on each of the standby Management Repository hosts. Configure the Management Agents to upload via the SLB on the primary site. Install Oracle Grid Infrastructure and Oracle RAC database software on the standby Management Repository hosts. The versions used must be the same as those on the primary site.

  2. Prepare the primary Management Repository database for Data Guard.

    If the primary Management Repository database is not already configured, enable archive log mode, set up the flash recovery area, and enable Flashback Database on the primary Management Repository database.

  3. Create the Physical Standby Database.

    Use the Enterprise Manager console to set up a physical standby database in the standby environment. The Standby Management Repository database must be a Physical Standby. Logical standby Management Repository databases are not supported.

    The Enterprise Manager Console does not support creating a standby RAC database. If the standby database has to be RAC, configure the standby database using a single instance and then use the 'Convert to RAC' option from the Enterprise Manager Console to convert the single-instance standby database to RAC. Note that the Convert to RAC option is available for Oracle Database releases 10.2.0.5, 11.1.0.7, and above. Oracle Database release 11.1.0.7 requires patch 8824966 for the Convert to RAC option to work. During single-instance standby creation, the best practice is to create the database files on shared storage, ideally ASM, to facilitate conversion to RAC later.

  4. Add Static Service to the Listener.

    To enable Data Guard to restart instances during the course of broker operations, a service with a specific name must be statically registered with the local listener of each instance. The value for the GLOBAL_DBNAME attribute must be set to a concatenation of <db_unique_name>_DGMGRL.<db_domain>. For example, in the LISTENER.ORA file:

    SID_LIST_LISTENER=(SID_LIST=(SID_DESC=(SID_NAME=sid_name)
    (GLOBAL_DBNAME=db_unique_name_DGMGRL.db_domain)
    (ORACLE_HOME=oracle_home)))
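    For example, a minimal sketch of such an entry, assuming a repository database with db_unique_name EMREP, domain example.com, instance SID emrep1, and an assumed Oracle home path, would be:

    # All values below are assumptions; substitute your own
    SID_LIST_LISTENER=(SID_LIST=(SID_DESC=(SID_NAME=emrep1)
    (GLOBAL_DBNAME=EMREP_DGMGRL.example.com)
    (ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1)))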
    
  5. Enable Flashback Database on the Standby Database.

    To allow an old primary database to be reinstated as a standby database after a failover, Flashback Database must be enabled. Therefore, enable it on both the primary and the standby databases.

  6. To allow Enterprise Manager to monitor a Physical Standby database (which is typically in a mounted state), specify SYSDBA monitoring privileges. This can be specified either in the standby creation wizard or after creation by modifying the Monitoring Configuration for the standby database target.

  7. Verify the Physical Standby

    Verify the Physical Standby database through the Enterprise Manager Console. Click the Log Switch button on the Data Guard page to switch logs, and verify that the log is received and applied to the standby database.

30.4 Setting Up Management Service and Software Library Disaster Recovery

The Disaster Recovery solution for a Cloud Control deployment involves placing the Software Library, the OMS installation, the Agent installation, and the Oracle Inventory on a replicated filesystem.

Standby OMSs implemented using Standby WebLogic Domain are still supported but have been deprecated and may be desupported in a future release (see My Oracle Support Note 1563541.1 for details). The recommended method for creating Standby OMSs is to use storage replication as documented in this chapter. Creating standby OMSs using a Standby WebLogic Domain is documented in Appendix D, "Standby OMSs using Standby WebLogic Domain."

Storage Replication Requirements

Your chosen method of storage replication should support the following:

  • Snapshots and consistent filesystem copies

  • Ability to perform an on-demand replication between sites

30.4.1 Management Service Disaster Recovery

  1. Ensure that the primary OMS host names are resolvable to the IP addresses of the corresponding standby hosts at the standby site. This can be achieved in either of the following ways:

    • By installing OMSs at the primary site using physical host names and configuring aliases for these host names on the corresponding hosts at the standby site.

    • By installing each OMS using an alias host name that can be configured at both the primary and standby sites.

    Host name resolution at each site can be done using either local resolution (/etc/hosts) or DNS based resolution or a combination of both.

    Example /etc/hosts configurations when OMSs are installed at primary site using primary site physical host names (oms1-p.example.com and oms2-p.example.com):

    Primary Site

    127.0.0.1     localhost.localdomain
    123.1.2.111   oms1-p.example.com  oms1-p #OMS1
    123.1.2.112   oms2-p.example.com  oms2-p #OMS2
    

    Standby Site

    127.0.0.1     localhost.localdomain
    123.2.2.111   oms1-s.example.com  oms1-s oms1-p.example.com #OMS1
    123.2.2.112   oms2-s.example.com  oms2-s oms2-p.example.com #OMS2
    

    Example /etc/hosts configuration when OMSs are installed using alias host names (oms1.example.com and oms2.example.com):

    Primary Site

    127.0.0.1     localhost.localdomain
    123.1.2.111   oms1-p.example.com  oms1-p oms1.example.com #OMS1
    123.1.2.112   oms2-p.example.com  oms2-p oms2.example.com #OMS2
    

    Standby Site

    127.0.0.1    localhost.localdomain
    123.2.2.111  oms1-s.example.com  oms1-s oms1.example.com #OMS1
    123.2.2.112  oms2-s.example.com  oms2-s oms2.example.com #OMS2
    

    If the network has been configured correctly, a ping of the OMS host name from the primary site should result in a reply from the primary host, and a ping of the OMS host name from the standby site should result in a reply from the standby host.

    Example

    Ping results from primary site (reply from primary site):

    [oracle@oms1-p ~]$ ping oms1-p.example.com
    PING oms1-p.example.com (123.1.2.111) 56(84) bytes of data.
    64 bytes from oms1-p.example.com (123.1.2.111): icmp_seq=1 ttl=64 time=0.018 ms
    64 bytes from oms1-p.example.com (123.1.2.111): icmp_seq=2 ttl=64 time=0.020 ms
    64 bytes from oms1-p.example.com (123.1.2.111): icmp_seq=3 ttl=64 time=0.022 ms
    

    Ping results from standby site (reply from standby site)

    [oracle@oms1-s ~]$ ping oms1-p.example.com
    PING oms1-s.example.com (123.2.2.111) 56(84) bytes of data.
    64 bytes from oms1-s.example.com (123.2.2.111): icmp_seq=1 ttl=64 time=0.018 ms
    64 bytes from oms1-s.example.com (123.2.2.111): icmp_seq=2 ttl=64 time=0.020 ms
    64 bytes from oms1-s.example.com (123.2.2.111): icmp_seq=3 ttl=64 time=0.022 ms
    
  2. Ensure that the OMS installation, Agent installation, and Oracle Inventory for each OMS at the primary site are placed on replicated storage. This can either be done by specifying replicated storage during OMS installation or by moving these components onto replicated storage after installation.

    Note:

    If the components are moved to shared storage after installation they must retain their original pathnames.
  3. Configure an application virtual host name in DNS to point to Primary site.

    • If there is a single OMS at the primary site the DNS entry for the application virtual host name should point to this OMS.

    • If there are multiple OMSs at the primary site the DNS entry for the application virtual host name should point to the SLB.

    • This host name should be configured with a short TTL value (30-60 seconds) so that it will not be cached by clients for extended periods.

  4. Configure SLB at the standby site (only required if multiple OMSs are required at the standby site). See "Configuring a Load Balancer" for more information. The SLB pools on the standby site will reference the IP addresses of the standby OMS hosts.

  5. Resecure all Agents and OMSs using application virtual host name.

    Examples

    For OMS

    emctl secure oms -sysman_pwd <sysman_pwd> \
      -reg_pwd <agent_reg_password> \
      -host em.example.com \
      -secure_port 4900 \
      -slb_port 4900 \
      -slb_console_port 443 \
      -console \
      -lock_upload -lock_console
    

    For Agent

    emctl secure agent -emdWalletSrcUrl https://em.example.com:4901/em
    
  6. Configure the storage replication schedule to run as frequently as the network infrastructure will allow (at a minimum, every 24 hours).

    Note:

    Refer to your storage/network documentation to determine a replication schedule that maximizes the resource utilization performance of your network infrastructure.
  7. Move HTTP Lock files to local filesystem. See the Enterprise Manager Cloud Control Advanced Installation and Configuration Guide for more information.

30.4.2 Monitoring Standby OMS Hosts

Monitoring the availability of the standby OMS hosts is necessary to ensure that they are ready for switchover/failover operations. In order to monitor these hosts, Agents should be deployed to local filesystems on each standby OMS host. To avoid conflicts with the components that will be started on the standby site after a switchover/failover, consider the following points when deploying Agents on the standby OMS hosts:

  • The Agents deployed to the standby OMS hosts should not use the replicated Oracle Inventory. They should be installed using a local inventory that does not include the replicated OMS and Agent installs.

  • The Agents deployed to the standby OMS hosts should be deployed on a different port to that used by the replicated Agents. This will avoid port conflicts when the replicated OMS and Agent are started on the standby OMS host.

  • Regardless of which network topology is used (aliases at both sites or aliases only at the standby site), these Agents should be deployed using the physical hostnames of the standby OMS hosts.

To specify an inventory location for Agent installation, an inventory pointer file can be created and the -invPtrLoc flag can be used during installation.

The following example shows an inventory pointer file that specifies the inventory location as /u01/oraInventory_standby:

Example 30-1 Inventory Pointer File

more /u01/oraInst_standby.loc

inventory_loc=/u01/oraInventory_standby
inst_group=dba

The -invPtrLoc flag can then be passed during Agent installation.
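
The following is a minimal sketch of creating the pointer file and passing it to an Agent installation on a standby OMS host. The agent base directory, agent port, and the exact installer invocation (agentDeploy.sh with these parameters) are assumptions to be adapted to your chosen installation method; port 4900 matches the upload port used in the emctl secure oms example earlier in this chapter.

# Create a local inventory pointer file on the standby OMS host (paths are assumptions)
cat > /u01/oraInst_standby.loc <<EOF
inventory_loc=/u01/oraInventory_standby
inst_group=dba
EOF

# Pass the pointer file to the Agent installer; the agentDeploy.sh invocation,
# agent base directory, and agent port below are assumptions
./agentDeploy.sh AGENT_BASE_DIR=/u01/agent_standby \
    OMS_HOST=em.example.com EM_UPLOAD_PORT=4900 \
    AGENT_REGISTRATION_PASSWORD=<agent_reg_password> \
    AGENT_PORT=1832 \
    -invPtrLoc /u01/oraInst_standby.loc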


30.4.3 Software Library Disaster Recovery

  1. The Software Library should be located on a filesystem that is replicated using storage replication. If the Software Library is currently located on another filesystem it can be migrated using the 'Migrate and Remove' option in the Software Library Administration page.

    See Chapter 14, "Configuring a Software Library" for more information.

  2. Configure the storage replication schedule to run as frequently as the network infrastructure will allow. Oracle recommends continuous replication or, at a minimum, replication every 2 hours.

30.4.4 Migrating an Existing Site to Shared Storage

  • Use file system backups to move existing OMS and Agent installations to shared storage.

  • Use the following guidelines to migrate from a local file system to shared storage (an illustrative command sequence follows this list):

    • All backups must be offline backups; that is, the OMS and Agent processes on a host must be shut down completely before backing up and restoring.

    • The backups must be performed as the root user and permissions must be preserved.

    • The directory paths for the Middleware Home and Instance Home must not change. If required, use symbolic links to keep the paths the same.

    • The migration can be done in a rolling fashion to avoid complete downtime of Cloud Control.

  • Use the "Migrate Swlib Storage Location" procedure described in Chapter 14, "Configuring a Software Library" to move the Software Library to shared storage.

30.5 Performing Switchover and Failover Operations

Activating the standby site can take place either by using a switchover or a failover. These are used in different situations as described below:

  • Switchover - A pre-planned role reversal of the primary and standby sites. In a switchover, functionality is transferred from the primary site to a standby site in an orderly, coordinated operation. As such, both sites must be available for a switchover to complete. Switchover is usually performed for testing and validation of Disaster Recovery (DR) scenarios and for planned maintenance activities on the primary infrastructure. A switchover is the preferred method of activating the standby site as the primary.

  • Failover - Activation of the standby site as the primary site when the original primary site becomes unavailable.

30.5.1 Switchover Procedure

This section describes the steps for switching over to the standby site. The same procedure applies to a switchover in either direction.

  1. Shut down all OMS components at the primary site.

  2. Shut down all virtual Management Agents at the primary site.

  3. Unmount the OMS filesystem and the software library filesystems from OMS hosts at the primary site.

  4. Perform on-demand replication of OMS and software library filesystems.

    Note:

    Refer to your storage documentation for steps required to perform an on-demand replication.
  5. Update DNS entry for the application virtual hostname.

  6. Switchover Oracle Database using Data Guard switchover.

    Use DGMGRL to perform a switchover to the standby database. The command can be run on the primary site or the standby site. The switchover command verifies the states of the primary database and the standby database, affects switchover of roles, restarts the old primary database, and sets it up as the new standby database.

    SWITCHOVER TO <standby database name>;

    Verify the post switchover states. To monitor a standby database completely, the user monitoring the database must have SYSDBA privileges. This privilege is required because the standby database is in a mounted-only state. A best practice is to ensure that the users monitoring the primary and standby databases have SYSDBA privileges for both databases.

    SHOW CONFIGURATION;

    SHOW DATABASE <primary database name>;

    SHOW DATABASE <standby database name>;
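
    As a sketch, a complete DGMGRL session for this step might look as follows; the connect identifier and database names (emrep_primary, emrep_standby) are assumptions.

    $ dgmgrl sys/<password>@emrep_primary
    DGMGRL> SWITCHOVER TO 'emrep_standby';
    DGMGRL> SHOW CONFIGURATION;
    DGMGRL> SHOW DATABASE 'emrep_primary';
    DGMGRL> SHOW DATABASE 'emrep_standby';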

  7. Perform role reversal of the Software Library and OMS storage (refer to your storage documentation for instructions).

  8. Re-enable replication schedules for SWLIB and OMS storage

  9. Mount OMS and Software Library filesystems on OMS hosts at Standby site

  10. Start the first OMS Admin Server at the standby site

  11. Point OMS to new Primary Repository Database using the following command:

    emctl config oms -store_repos_details -repos_conndesc <connect descriptor> -repos_user <username>
    

    Example

    emctl config oms -store_repos_details -repos_conndesc '"(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=newscan.domain)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=emreps.domain)))"' -repos_user SYSMAN
    
  12. Start the OMS at the standby site.

  13. Start the Management Agents at the standby site using the following command:

    emctl start agent
    
  14. Relocate Management Services and Repository target using the following command:

    emctl config emrep -agent <agent name> -conn_desc <repository connection>
    

    The Management Services and Management Repository target is monitored by a Management Agent on one of the Management Services on the primary site. To ensure that the target is monitored after switchover/failover, relocate the target to a Management Agent on the standby site by running the above command from one of the Management Service hosts at the standby site.

  15. Update the WebLogic Admin Server URL. This is done by navigating to the target homepage for GC Domain and selecting WebLogic Domain-->Target Setup-->Monitoring Configuration from within Cloud Control.

30.5.2 Failover Procedure

This section describes the steps to fail over to the standby site, recover the Enterprise Manager application state by resynchronizing the Management Repository database with all Management Agents, and finally enable the original primary database.

  1. Shut down all OMS components at the primary site if running.

  2. Shut down all virtual agents at primary site if running.

  3. Unmount OMS and Software Library filesystems from OMS hosts at primary site.

  4. Perform on-demand replication of the OMS and Software Library filesystems.

    Note:

    Refer to your storage documentation for steps required to perform an on-demand replication.
  5. Update the DNS entry for the application virtual hostname.

  6. Failover Oracle Database using Data Guard failover.
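
    Use DGMGRL, connected to the standby database, to perform the failover. As a sketch, with an assumed connect identifier and standby database name:

    $ dgmgrl sys/<password>@emrep_standby
    DGMGRL> FAILOVER TO 'emrep_standby';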

  7. Perform role reversal of Software Library and OMS storage.

  8. Re-enable replication schedules for SWLIB and OMS storage

  9. Mount the OMS and Software Library filesystems on OMS hosts at the standby site

  10. Start first OMS admin server

  11. Point the OMS to the new Primary Repository Database using the following command:

    emctl config oms -store_repos_details -repos_conndesc <connect descriptor> -repos_user <username>
    

    Example

    emctl config oms -store_repos_details -repos_conndesc '"(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=newscan.domain)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=emreps.domain)))"' -repos_user SYSMAN
    
  12. Start the OMSs.

  13. Resync the Agents. You can perform this operation either through the Cloud Control console (Management Agent homepage) or using EM CLI. If the Agent is not blocked, then the resync should be performed using the following command:

    emcli resync_agent -agent="<agenthost:port>"

  14. Start the Agents.

  15. Relocate Management Services and Repository target using the following command:

    emctl config emrep -agent <agent name> -conn_desc <repository connection>

    The Management Services and Management Repository target is monitored by a Management Agent on one of the Management Services on the primary site. To ensure that the target is monitored after switchover/failover, relocate the target to a Management Agent on the standby site by running the above command from one of the Management Service hosts at the standby site.

  16. Update WebLogic Admin Server URL. This is done by navigating to the target homepage for GC Domain and selecting WebLogic Domain-->Target Setup-->Monitoring Configuration from within Cloud Control.

30.6 Keeping the Standby Site in Sync with the Primary

The standby site will be kept in sync with the primary automatically through the combination of Data Guard and storage replication.

The administrator should ensure that an on-demand replication to the standby site takes place before and after the following operations on the OMS or the agent:

  • Plug-in deployment/undeployment, or existing plug-in upgrade

  • Upgrade

  • Patch

  • emctl commands (other than the lifecycle verbs start/stop/status oms)

  • Configuration of ADP/JVMD/BI Publisher

Note:

Refer to your storage documentation for steps required to perform an on-demand replication.