33 High Availability Solutions

Highly available systems are critical to the success of virtually every business today. It is equally important that the management infrastructure monitoring these mission-critical systems is highly available. The Enterprise Manager Cloud Control architecture is engineered to be scalable and available from the ground up. It is designed so that you can concentrate on managing the assets that support your business, while it takes care of meeting your business Service Level Agreements.

When you configure Cloud Control for high availability, your aim is to protect each component of the system, as well as the flow of management data in case of performance or availability problems, such as a failure of a host or a Management Service.

Maximum Availability Architecture (MAA) provides a highly available Enterprise Manager implementation by guarding against failure at each component of Enterprise Manager.

The impacts of failure of the different Enterprise Manager components are:

Management Agent failure or failure in the communication between Management Agents and Management Services

Results in targets no longer being monitored by Enterprise Manager, although the Enterprise Manager console remains available and historical data can still be viewed from the Management Repository.

Management Service failure

Results in the unavailability of the Enterprise Manager console, as well as of almost all other Enterprise Manager services.

Management Repository failure

Results in Enterprise Manager being unable to save the data uploaded by the Management Agents, as well as the unavailability of almost all Enterprise Manager services.

Overall, failure in any component of Enterprise Manager can result in substantial service disruption. Therefore, it is essential that each component be hardened using a highly available architecture.

This chapter covers the following topics:

Latest High Availability Information
Defining High Availability
Determining Your High Availability Needs
Comparing Availability Levels
Implementing High Availability Levels

Latest High Availability Information

Because of rapidly changing technology, and because high availability implementations extend beyond the realm of Oracle Enterprise Manager, check the following resources regularly for the latest information on third-party integration with Oracle's high availability solutions (F5 or third-party clusterware, for example).

Oracle Maximum Availability Architecture Website

http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm

Support Note 330072.1: "How To Configure Grid Control Components for High Availability"

Defining High Availability

Oracle Enterprise Manager's flexible, distributed architecture permits a wide range of deployment configurations, allowing it to meet the monitoring and management needs of your business, as well as allowing for expansion as business needs dictate.

For this reason, high availability for Enterprise Manager cannot be narrowly defined as a singular implementation, but rather as a range of protection levels based on your available resources, Oracle technology, and best practices that safeguard the investment in your IT infrastructure. Depending on your Enterprise Manager deployment and business needs, you can implement the level of high availability necessary to sustain your business. High availability for Enterprise Manager can be categorized into four levels, each level building on the previous one and increasing in implementation cost and complexity, but also incrementally increasing the level of availability.

Levels of High Availability

Each high availability solution level is driven by your business requirements and available IT resources. However, it is important to note that the levels represent a subset of possible deployments that are useful in presenting the various options available. Your IT organization will likely deploy its own configuration, which need not exactly match one of the levels.

The following table summarizes the four example high availability levels for Oracle Enterprise Manager installations as well as general resource requirements.

Table 33-1 Enterprise Manager High Availability Levels

Level	Description	Minimum Number of Nodes	Recommended Number of Nodes	Load Balancer Requirements
Level 1	OMS and repository database each on their own host with no failover.	1	2	None
Level 2	OMS installed on shared storage with VIP-based failover; repository database using local Data Guard.	2	4	None
Level 3	OMS in Active/Active configuration; repository database using RAC + local Data Guard.	2	5	Local Load Balancer
Level 4	OMS in Active/Active configuration on the primary site; Data Guard RAC database at the primary site; standby RAC database (Data Guard) at the disaster recovery site; multiple standby OMSs at the remote site. Note: Level 4 is an MAA Best Practice, achieving the highest availability in the most cost-effective, simple architecture.	4	8	Required: Local Load Balancer for each site. Optional: Global Load Balancer

Determining Your High Availability Needs

As previously mentioned, the availability level you choose depends on factors such as the hardware resources available and the business needs of your organization. However, developing a high availability plan that objectively encompasses all aspects of your high availability needs (hardware, business processes, effort, cost) can be problematic. The solution is to define high availability needs in terms of Recovery Time Objective (RTO) and Recovery Point Objective (RPO).

Recovery Time Objective - The period of time within which your business process or technological resources must be restored after failure. Key Question: How fast do your business processes/resources need to be running again before the bottom line is impacted?
Recovery Point Objective - The maximum acceptable period of time between the last backup and the time of failure. Key Question: How much data are you willing to lose?

Defining your high availability needs in terms of RTO and RPO allows you to effectively meet the demands of users. Both values should be determined using worst-case scenarios.
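To make the two definitions concrete, a single worst-case scenario can be checked against both objectives. The following Python sketch is illustrative only and is not part of Enterprise Manager; the function name `meets_objectives` and all timestamps are hypothetical. It assumes, as in the definitions above, that downtime is measured from the failure to the restoration of service, and data loss from the last backup to the failure.

```python
from datetime import datetime, timedelta
from typing import Tuple


def meets_objectives(last_backup: datetime, failure: datetime, restored: datetime,
                     rto: timedelta, rpo: timedelta) -> Tuple[bool, bool]:
    """Hypothetical helper: return (rto_met, rpo_met) for one failure scenario."""
    if not (last_backup <= failure <= restored):
        raise ValueError("expected last_backup <= failure <= restored")
    downtime = restored - failure       # how long the service was unavailable
    data_loss = failure - last_backup   # work done since the last recoverable point
    return downtime <= rto, data_loss <= rpo


# Hypothetical worst case: backup at midnight, failure at 06:00, service back at 10:00.
start = datetime(2024, 1, 1, 0, 0)
rto_ok, rpo_ok = meets_objectives(
    last_backup=start,
    failure=start + timedelta(hours=6),
    restored=start + timedelta(hours=10),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
)
print(rto_ok, rpo_ok)  # True False
```

In this example the four hours of downtime satisfy a four-hour RTO, but six hours of lost work violate a one-hour RPO, which would point toward a level with a standby database rather than backups alone.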

Comparing Availability Levels

Given the broad range of factors that must be taken into consideration when implementing a highly available Enterprise Manager environment, your ultimate decision will be based on the interrelationship between RTO, RPO and the cost involved with implementing one of the availability levels. The following table shows the interrelationship between these factors.

Table 33-2 Comparison of High Availability Levels

Level	Uptime (Availability)	RPO	Build Time	Cost
Level 1	98.0%	Hours	Hours to Days	$
Level 2	98.8%	Minutes	Hours to Days	$$
Level 3	99.9%	Minutes to Seconds	Days	$$$
Level 4	99.9%	Minutes to Seconds	Days	$$$$

The table is not a prescriptive recommendation for choosing a high availability level, but instead should be used to aid your decision-making process based on your business needs. For example, if you have an uptime requirement of 95% and a desired mean time to recovery of seconds, then you should select Level 4. What is not reflected in the table are factors such as survivability and scalability. Hence, although the differences between Level 3 and Level 4 seem outwardly insignificant, there are differences.

If you need survivability in the event of a primary site loss, you need to go with a Level 4 architecture. Level 4 is also essential if you need equalized performance in the event of site loss: a Level 3 architecture with an asymmetrically sized Data Guard standby will suffer degraded performance when the standby is activated. If you need to maintain performance levels, you will need Level 4 with a symmetrically sized architecture on both sites. This is particularly true if you want to run through planned failover routines in which you actively run on the primary or secondary site for extended periods of time. For example, some financial institutions mandate this as part of their operating procedures.
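It can help to translate the percentages in Table 33-2, read as uptime (availability) figures, into the downtime they permit. The Python sketch below is simple arithmetic, not an Oracle tool; the function name `max_downtime_hours` is hypothetical, and it assumes a 365-day (8,760-hour) year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumption: leap years are ignored


def max_downtime_hours(uptime_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hypothetical helper: maximum downtime an uptime percentage allows over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100 - uptime_percent) / 100


# Uptime figures taken from Table 33-2.
for level, uptime in [("Level 1", 98.0), ("Level 2", 98.8), ("Level 3", 99.9), ("Level 4", 99.9)]:
    hours = max_downtime_hours(uptime)
    print(f"{level}: {uptime}% uptime allows about {hours:.1f} hours of downtime per year")
```

By this arithmetic, 98.0% uptime still permits roughly 175 hours of downtime a year, while 99.9% permits under 9 hours, which is why the jump from Level 2 to Level 3 matters more than the raw percentages suggest.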

The following tables compare the protection levels and recovery times for the various high-availability levels.

Table 33-3 High Availability Levels of Protection

Level	OMS Host Failure	OMS Storage Failure	Database Host Failure	Database Storage Failure	Site Failure
Level 1	No	No	No	No	No
Level 2	Yes	No	Yes	Yes	No
Level 3	Yes	Yes	Yes	Yes	No
Level 4	Yes	Yes	Yes	Yes	Yes

Table 33-4 High Availability Level Recovery Times

Level	Node Failure	Local Storage Failure	Site Failure	Cost
Level 1	Hours-Days	Hours-Days	Hours-Days	$
Level 2	Minutes	Hours-Days	Hours-Days	$$
Level 3	No Outage	Minutes	Hours-Days	$$$
Level 4	No Outage	Minutes	Minutes	$$$$
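Because the levels are listed in order of increasing cost, Table 33-4 can be read as a simple selection rule: choose the lowest level whose recovery time for every failure type is no slower than what your business can tolerate. The Python sketch below encodes Table 33-4 as data; the names `RANK`, `RECOVERY`, and `lowest_level` are hypothetical, and the helper is an illustration rather than an Oracle tool. It deliberately ignores the survivability and scalability factors discussed earlier, which the tables do not capture.

```python
from typing import Optional

# Recovery-time categories used in Table 33-4, ranked from fastest to slowest.
RANK = {"No Outage": 0, "Minutes": 1, "Hours-Days": 2}

# Table 33-4 as data: level -> (node failure, local storage failure, site failure).
RECOVERY = {
    1: ("Hours-Days", "Hours-Days", "Hours-Days"),
    2: ("Minutes", "Hours-Days", "Hours-Days"),
    3: ("No Outage", "Minutes", "Hours-Days"),
    4: ("No Outage", "Minutes", "Minutes"),
}


def lowest_level(node: str, storage: str, site: str) -> Optional[int]:
    """Hypothetical helper: return the lowest (cheapest) level whose recovery time
    for every failure type is no slower than the tolerated category, or None."""
    tolerated = (RANK[node], RANK[storage], RANK[site])
    for level in sorted(RECOVERY):  # level numbers increase with cost
        actual = tuple(RANK[category] for category in RECOVERY[level])
        if all(a <= t for a, t in zip(actual, tolerated)):
            return level
    return None


# Node failures must be recovered within minutes; storage and site failures may take longer.
print(lowest_level("Minutes", "Hours-Days", "Hours-Days"))  # 2
# Site failures must also be recovered within minutes.
print(lowest_level("No Outage", "Minutes", "Minutes"))  # 4
```

Returning `None` signals that no listed level meets the requirement, for example if local storage failures must cause no outage at all.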

Implementing High Availability Levels

Once you have determined the high availability requirements for your enterprise, you are ready to begin implementing one of the high availability levels that is suitable for your environment. Use the following information roadmap to find implementation instructions for each level.

Level	Where to find information
Level 1	Oracle Enterprise Manager Basic Installation Guide and the Oracle Enterprise Manager Advanced Installation and Configuration Guide
Level 2	Oracle Enterprise Manager Basic Installation Guide and the Oracle Enterprise Manager Advanced Installation and Configuration Guide, PLUS: Installing Multiple OMSs in Active/Active configuration; Configuring Standby Database for the Enterprise Manager Repository
Level 3	Oracle Enterprise Manager Basic Installation Guide and the Oracle Enterprise Manager Advanced Installation and Configuration Guide, PLUS: Installing Multiple OMSs in Active/Active configuration; Configuring the First Management Service for High Availability; Configuring Additional Management Services; Configuring Software Library; Configuring a Load Balancer; Reconfiguring the Oracle Management Agent; Converting the Enterprise Manager Repository from Single Instance to RAC; Configuring Standby Database for the Enterprise Manager Repository
Level 4	Oracle Enterprise Manager Basic Installation Guide and the Oracle Enterprise Manager Advanced Installation and Configuration Guide, PLUS: Installing Multiple OMSs in Active/Active configuration; Configuring the First Management Service for High Availability; Configuring Additional Management Services; Configuring Shared File System Loader; Configuring Software Library; Configuring a Load Balancer; Reconfiguring the Oracle Management Agent; Configuring Standby Management Service; Converting the Enterprise Manager Repository from Single Instance to RAC; Configuring Standby Database for the Enterprise Manager Repository