Oracle® Enterprise Manager Cloud Control Administrator's Guide 12c Release 1 (12.1.0.1) Part Number E24473-01
Highly available systems are critical to the success of virtually every business today. It is equally important that the management infrastructure monitoring these mission-critical systems is itself highly available. The Enterprise Manager Cloud Control architecture is engineered to be scalable and available from the ground up. It is designed to ensure that you concentrate on managing the assets that support your business, while it takes care of meeting your business Service Level Agreements.
When you configure Cloud Control for high availability, your aim is to protect each component of the system, as well as the flow of management data in case of performance or availability problems, such as a failure of a host or a Management Service.
Maximum Availability Architecture (MAA) provides a highly available Enterprise Manager implementation by guarding against failure at each component of Enterprise Manager.
The impacts of failure of the different Enterprise Manager components are:
Management Agent failure or failure in the communication between Management Agents and Management Services
Results in targets no longer monitored by Enterprise Manager, though the Enterprise Manager console is still available and one can view historical data from the Management Repository.
Management Service failure
Results in the unavailability of the Enterprise Manager console, as well as unavailability of almost all Enterprise Manager services.
Management Repository failure
Results in Enterprise Manager being unable to save the data uploaded by the Management Agents, as well as unavailability of almost all Enterprise Manager services.
Overall, failure in any component of Enterprise Manager can result in substantial service disruption. Therefore it is essential that each component be hardened using a highly available architecture.
This chapter covers the following topics:
Because of rapidly changing technology, and the fact that high availability implementations extend beyond the realm of Oracle Enterprise Manager, the following resources should be checked regularly for the latest information on third-party integration with Oracle's high availability solutions (F5 or third-party clusterware, for example).
Oracle Maximum Availability Architecture Website
http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm
Support Note 330072.1: "How To Configure Grid Control Components for High Availability"
Oracle Enterprise Manager's flexible, distributed architecture permits a wide range of deployment configurations, allowing it to meet the monitoring and management needs of your business, as well as allowing for expansion as business needs dictate.
For this reason, high availability for Enterprise Manager cannot be narrowly defined as a single implementation, but rather as a range of protection levels based on your available resources, Oracle technology, and best practices that safeguard the investment in your IT infrastructure. Depending on your Enterprise Manager deployment and business needs, you can implement the level of high availability necessary to sustain your business. High availability for Enterprise Manager can be categorized into four levels, each level building on the previous one and increasing in implementation cost and complexity, but also incrementally increasing the level of availability.
Each high availability solution level is driven by your business requirements and available IT resources. However, it is important to note that the levels represent a subset of possible deployments that are useful in presenting the various options available. Your IT organization will likely deploy its own configuration which need not exactly match one of the levels.
The following table summarizes the four example high availability levels for Oracle Enterprise Manager installations as well as general resource requirements.
Table 33-1 Enterprise Manager High Availability Levels
Level | Description | Minimum Number of Nodes | Recommended Number of Nodes | Load Balancer Requirements |
---|---|---|---|---|
Level 1 | OMS and repository database each on their own host with no failover. | 1 | 2 | None |
Level 2 | OMS installed on shared storage with a VIP-based failover. Database using Local Data Guard. | 2 | 4 | None |
Level 3 | OMS in Active/Active configuration. Database using RAC + Local Data Guard. | 2 | 5 | Local Load Balancer |
Level 4 | OMS in Active/Active configuration on the primary site. Standby RAC database (Data Guard) at the disaster recovery site. Multiple standby OMSs at the remote site. Data Guard RAC database at the primary site. Note: Level 4 is an MAA Best Practice, achieving highest availability in the most cost effective, simple architecture. | 4 | 8 | Required: Local Load Balancer for each site. Optional: Global Load Balancer |
As previously mentioned, the availability level you choose depends on factors such as the hardware resources available and the business need of your organization. However, developing your high availability plan in a way that objectively encompasses all aspects of your high availability needs (hardware, business processes, effort, cost) can be problematic. The solution is to define high availability needs in terms of Recovery Time Objective (RTO) and Recovery Point Objective (RPO).
Recovery Time Objective - The period of time within which your business process or technological resources must be restored after failure. Key Question: How fast do your business processes/resources need to be running again before the bottom line is impacted?
Recovery Point Objective - The period of time between the time of failure and the last backup. Key Question: How much data are you willing to lose?
Defining your high availability needs in terms of RTO and RPO allows you to effectively meet the demands of users. Both values should be determined using the worst-case scenarios.
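As a rough illustration of how worst-case values might be checked against targets, the following Python sketch assumes a hypothetical environment in which the repository is backed up at a fixed interval and recovery consists of a detection step followed by a restore step. The function names and sample durations are illustrative assumptions, not values or interfaces from Enterprise Manager.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval: timedelta) -> timedelta:
    """Worst-case data loss: the failure occurs just before the next backup."""
    return backup_interval


def worst_case_rto(detection: timedelta, restore: timedelta) -> timedelta:
    """Worst-case time to restore service: slowest detection plus slowest restore."""
    return detection + restore


def meets_objectives(rpo_target: timedelta, rto_target: timedelta,
                     backup_interval: timedelta,
                     detection: timedelta, restore: timedelta) -> bool:
    """Return True only if both worst-case values fall within their targets."""
    return (worst_case_rpo(backup_interval) <= rpo_target
            and worst_case_rto(detection, restore) <= rto_target)


# Hypothetical example: nightly backups, 15 minutes to detect, 3 hours to restore.
ok = meets_objectives(
    rpo_target=timedelta(hours=4),
    rto_target=timedelta(hours=4),
    backup_interval=timedelta(hours=24),
    detection=timedelta(minutes=15),
    restore=timedelta(hours=3),
)
print(ok)
```

In this hypothetical case the recovery time fits within the four-hour target, but a nightly backup can lose up to 24 hours of data, so the RPO target is missed and the check fails.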
Given the broad range of factors that must be taken into consideration when implementing a highly available Enterprise Manager environment, your ultimate decision will be based on the interrelationship between RTO, RPO and the cost involved with implementing one of the availability levels. The following table shows the interrelationship between these factors.
Table 33-2 Comparison of High Availability Levels
Level | RTO | RPO | Build Time | Cost |
---|---|---|---|---|
Level 1 | 98.0% | Hours | Hours to Days | $ |
Level 2 | 98.8% | Minutes | Hours to Days | $$ |
Level 3 | 99.9% | Minutes to Seconds | Days | $$$ |
Level 4 | 99.9% | Minutes to Seconds | Days | $$$$ |
The table is not a prescriptive recommendation for choosing a high availability level, but instead should be used to aid your decision-making process based on your business needs. For example, if you have an uptime requirement of 95% and a desired mean time to recovery of seconds, then you should select Level 4.
What is not reflected in the table are factors such as survivability and scalability. Hence, although the differences between Level 3 and Level 4 seem outwardly insignificant, there are differences. If you need survivability in the event of a primary site loss, you need a Level 4 architecture. If you need equalized performance in the event of site loss, Level 4 is essential. A Level 3 architecture with Data Guard that is asymmetrically scaled will suffer degraded performance when the standby is activated. If you need to maintain performance levels, you will need Level 4 with a symmetrically sized architecture on both sites. This is particularly true if you want to run through planned failover routines where you actively run on the primary or secondary site for extended periods of time. For example, some financial institutions mandate this as part of their operating procedures.
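To make the percentages in Table 33-2 more concrete, the following Python sketch converts them into approximate hours of downtime per year and lists the levels that satisfy a given uptime requirement. It assumes that the percentages are uptime figures measured over a 365-day year; the function names are illustrative only.

```python
from typing import List

HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year

# Percentages copied from the first value column of Table 33-2.
LEVEL_UPTIME_PERCENT = {1: 98.0, 2: 98.8, 3: 99.9, 4: 99.9}


def annual_downtime_hours(uptime_percent: float) -> float:
    """Hours per year that a system with the given uptime may be unavailable."""
    return HOURS_PER_YEAR * (100.0 - uptime_percent) / 100.0


def levels_meeting_uptime(required_percent: float) -> List[int]:
    """Levels whose tabulated percentage is at least the required uptime."""
    return [level for level, pct in sorted(LEVEL_UPTIME_PERCENT.items())
            if pct >= required_percent]


for level, pct in sorted(LEVEL_UPTIME_PERCENT.items()):
    hours = annual_downtime_hours(pct)
    print(f"Level {level}: {pct}% -> about {hours:.1f} hours of downtime per year")
```

Under these assumptions, 98.0% corresponds to roughly 175 hours of downtime per year and 99.9% to under 9 hours. A 95% requirement is satisfied by every level on uptime alone, so in that case recovery time, survivability, and scalability drive the choice.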
The following tables compare the protection levels and recovery times for the various high-availability levels.
Once you have determined the high availability requirements for your enterprise, you are ready to begin implementing one of the high availability levels that is suitable for your environment. Use the following information roadmap to find implementation instructions for each level.