22 High Availability Solutions

Highly available systems are critical to the success of virtually every business today. It is equally important that the management infrastructure monitoring these mission-critical systems is highly available. The Enterprise Manager Cloud Control architecture is engineered to be scalable and available from the ground up. It is designed so that you can concentrate on managing the assets that support your business, while it takes care of meeting your business Service Level Agreements.

When you configure Cloud Control for high availability, your aim is to protect each component of the system, as well as the flow of management data in case of performance or availability problems, such as a failure of a host or a Management Service.

Maximum Availability Architecture (MAA) provides a highly available Enterprise Manager implementation by guarding against failure at each component of Enterprise Manager.

The impacts of failure of the different Enterprise Manager components are:

Management Agent failure or failure in the communication between Management Agents and Management Service

Results in targets monitored by the agent no longer being monitored by Enterprise Manager.

Management Service failure

Results in downtime for Enterprise Manager.

Management Repository failure

Results in downtime for Enterprise Manager.

Software Library failure

Results in a subset of Enterprise Manager operations being unavailable. These operations include self-update, provisioning, and patching operations, including Agent deployment.

Overall, failure in any component of Enterprise Manager can result in substantial service disruption. Therefore, it is essential that each component be hardened using a highly available architecture.

Note:

For information about setting up a high availability solution for BI Publisher, see BI Publisher High Availability.

Latest High Availability Information

Because technology changes rapidly and high availability implementations extend beyond Oracle Enterprise Manager, check the following resources regularly for the latest information on third-party integration with Oracle's high availability solutions (F5 or third-party clusterware, for example).

Oracle Maximum Availability Architecture Web site

HTTP://www.oracle.com/goto/maa

Enterprise Manager 13c Framework and Infrastructure Web site

HTTP://www.oracle.com/technetwork/oem/frmwrk-infra-496656.html

Defining High Availability

Oracle Enterprise Manager's flexible, distributed architecture permits a wide range of deployment configurations, allowing it to meet the monitoring and management needs of your business, as well as allowing for expansion as business needs dictate.

For this reason, high availability for Enterprise Manager cannot be narrowly defined as a single implementation, but rather as a range of protection levels based on your available resources, Oracle technology, and best practices that safeguard the investment in your IT infrastructure. Depending on your Enterprise Manager deployment and business needs, you can implement the level of high availability necessary to sustain your business. High availability for Enterprise Manager can be categorized into four levels, each building on the previous one and increasing in implementation cost and complexity, but also incrementally increasing the level of availability.

Levels of High Availability

Each high availability solution level is driven by your business requirements and available IT resources. However, the levels represent only a subset of possible deployments, chosen because they are useful for presenting the various options available. Your IT organization will likely deploy its own configuration, which need not exactly match one of the levels.

The following table summarizes four example high availability levels for Oracle Enterprise Manager installations as well as general resource requirements.

Table 22-1 Enterprise Manager High Availability Levels

Level	Description	Minimum Number of Nodes	Recommended Number of Nodes	Load Balancer Requirements
Level 1	OMS and repository database. Each resides on its own host with no failover.	1	2	None
Level 2	OMS installed on shared storage with a VIP-based failover. Database is using Local Data Guard.	2	4	None
Level 3	OMS in Active/Active configuration. The database is using RAC + Local Data Guard.	3	5	Local Load Balancer
Level 4	OMS on the primary site in Active/Active configuration. Repository deployed using Oracle RAC. Duplicate hardware deployed at the standby site. DR for OMS and Software Library using Storage Replication between primary and standby sites. Database DR using Oracle Data Guard. Note: Level 4 is an MAA best practice, achieving highest availability in the most cost-effective, simple architecture.	4	8	Required: Local Load Balancer for each site. Optional: Global Load Balancer
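
As an illustration only, the resource requirements in Table 22-1 can be expressed as a small lookup that checks a planned deployment against the minimums for a chosen level. This is a minimal sketch in Python, not part of Enterprise Manager: the names LevelRequirements, REQUIREMENTS, and check_plan are hypothetical, and the per-site load balancer requirement of Level 4 is simplified to a single yes/no flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelRequirements:
    min_nodes: int
    recommended_nodes: int
    needs_local_load_balancer: bool


# Values transcribed from Table 22-1. For Level 4 the table requires a local
# load balancer at each site; this sketch reduces that to one boolean.
REQUIREMENTS = {
    1: LevelRequirements(1, 2, False),
    2: LevelRequirements(2, 4, False),
    3: LevelRequirements(3, 5, True),
    4: LevelRequirements(4, 8, True),
}


def check_plan(level, nodes, has_local_load_balancer):
    """Return a list of problems; an empty list means the minimums are met."""
    req = REQUIREMENTS[level]
    problems = []
    if nodes < req.min_nodes:
        problems.append(
            f"Level {level} needs at least {req.min_nodes} nodes, plan has {nodes}"
        )
    if req.needs_local_load_balancer and not has_local_load_balancer:
        problems.append(f"Level {level} requires a local load balancer")
    return problems
```

For example, check_plan(3, 2, False) reports both a node shortfall and a missing load balancer, while check_plan(1, 1, False) returns an empty list.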

Comparing Availability Levels

The following tables compare the protection levels and recovery times for the various HA levels.

Table 22-2 High Availability Levels of Protection

Level	OMS Host Failure	OMS Storage Failure	Database Host Failure	Database Storage Failure	Site Failure/Disaster Recovery
Level 1	No	No	No	No	No
Level 2	Yes	No	Yes	Yes	No
Level 3	Yes	Yes	Yes	Yes	No
Level 4	Yes	Yes	Yes	Yes	Yes

Table 22-3 High Availability Level Recovery Times

Level	Node Failure	Local Storage Failure	Site Failure	Cost
Level 1	Hours-Days	Hours-Days	Hours-Days	$
Level 2	Minutes	Hours-Days	Hours-Days	$$
Level 3	No Outage	Minutes	Hours-Days	$$$
Level 4	No Outage	Minutes	Minutes	$$$$
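
One way to read Table 22-2 is as a lookup from required protections to the lowest level that provides them. The following Python sketch shows that reading under stated assumptions: the failure-type keys (oms_host, oms_storage, db_host, db_storage, site) and the function name lowest_level_covering are hypothetical labels chosen for this example, and the sets simply transcribe the Yes entries of the table.

```python
# Failure types each level protects against, transcribed from Table 22-2.
# The key names are illustrative labels, not Enterprise Manager identifiers.
PROTECTION = {
    1: set(),
    2: {"oms_host", "db_host", "db_storage"},
    3: {"oms_host", "oms_storage", "db_host", "db_storage"},
    4: {"oms_host", "oms_storage", "db_host", "db_storage", "site"},
}


def lowest_level_covering(required):
    """Return the lowest HA level whose protections include every required one."""
    required = set(required)
    unknown = required - PROTECTION[4]
    if unknown:
        raise ValueError(f"unknown failure types: {sorted(unknown)}")
    # Levels are checked in ascending order, so the first match is the lowest.
    for level in sorted(PROTECTION):
        if required <= PROTECTION[level]:
            return level
```

Asking for protection against site failure, for instance lowest_level_covering({"site"}), selects Level 4, while a requirement for OMS storage protection alone selects Level 3.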

One measure that is not represented in the tables is scalability. Levels 3 and 4 provide the ability to scale the Enterprise Manager installation as business needs grow. The repository, running as a RAC database, can be scaled up by adding new nodes to the RAC cluster, and the Management Service tier can be scaled by adding more OMS servers.

If you need equivalent performance after failover to a standby deployment, whether that is a local standby database or a Level 4 standby site including a standby RAC database and standby OMS servers, it is essential that the deployments on both sites are symmetrically scaled. This is particularly true if you want to run planned failover routines in which you actively run on the primary or secondary site for extended periods of time. For example, some financial institutions mandate this as part of their operating procedures.

If you need survivability in the event of a primary site loss, you need a Level 4 architecture.

Implementing High Availability Levels

Once you have determined the high availability requirements for your enterprise, you are ready to begin implementing one of the high availability levels that is suitable for your environment. Use the following information roadmap to find implementation instructions for each level.

Level	Where to find information
Level 1	Oracle Enterprise Manager Basic Installation Guide and the Oracle Enterprise Manager Advanced Installation and Configuration Guide
Level 2	Oracle Enterprise Manager Basic Installation Guide and the Oracle Enterprise Manager Advanced Installation and Configuration Guide, plus: Configuring the Cloud Control OMS in an Active/Passive Environment for HA Failover Using Virtual Host Names; Configuring a Standby Database for the Management Repository
Level 3	Oracle Enterprise Manager Basic Installation Guide and the Oracle Enterprise Manager Advanced Installation and Configuration Guide, plus: Oracle Management Service High Availability; Configuring a Load Balancer; Configuring the Software Library; Installing Additional Management Services; Configuring a Standby Database for the Management Repository
Level 4	Oracle Enterprise Manager Basic Installation Guide and the Oracle Enterprise Manager Advanced Installation and Configuration Guide, plus: Configuring a Standby Database for the Management Repository; Management Service Disaster Recovery