|Oracle® High Availability Architecture and Best Practices
10g Release 1 (10.1)
Part Number B10726-01
This chapter contains the following sections:
Databases and the Internet have enabled worldwide collaboration and information sharing by extending the reach of database applications throughout organizations and communities. This reach emphasizes the importance of high availability (HA) in data management solutions. Both small businesses and global enterprises have users all over the world who require access to data 24 hours a day. Without this data access, operations can stop, and revenue is lost. Users, who have become more dependent upon their solutions, now demand service-level agreements from their Information Technology (IT) departments and solutions providers. Increasingly, availability is measured in dollars, euros, and yen, not just in time and convenience.
Enterprises have used their IT infrastructure to provide a competitive advantage, increase productivity, and empower users to make faster and more informed decisions. However, with these benefits has come an increasing dependence on that infrastructure. If a critical application becomes unavailable, then the entire business can be in jeopardy. Revenue and customers can be lost, penalties can be owed, and bad publicity can have a lasting effect on customers and a company's stock price. It is critical to examine the factors that determine how your data is protected and maximize the availability to your users.
Availability is the degree to which an application or service is available when, and with the functionality, users expect. Availability is measured by the perception of an application's end user. End users experience frustration when their data is unavailable, and they do not understand or care to differentiate between the complex components of an overall solution. Performance failures due to higher than expected usage create the same havoc as the failure of critical components in the solution.
Reliability, recoverability, and continuous operations are three characteristics of a highly available solution:
An HA architecture should have the following characteristics:
The importance of high availability varies among applications. However, the need to deliver increasing levels of availability continues to accelerate as enterprises re-engineer their solutions to gain competitive advantage. Most often these new solutions rely on immediate access to critical business data. When data is not available, the operation can cease to function. Downtime can lead to lost productivity, lost revenue, damaged customer relationships, bad publicity, and lawsuits.
If a mission-critical application becomes unavailable, then the enterprise is placed in jeopardy. It is not always easy to place a direct cost on downtime. Angry customers, idle employees, and bad publicity are all costly, but not directly measured in currency. On the other hand, lost revenue and legal penalties incurred because SLA objectives are not met can easily be quantified. The cost of downtime can quickly grow in industries that are dependent upon their solutions to provide service.
Other factors to consider in the cost of downtime are the maximum tolerable length of a single unplanned outage, and the maximum frequency of allowable incidents. If the event lasts less than 30 seconds, then it may cause very little impact and may be barely perceptible to end users. As the length of the outage grows, the effect grows from annoyance at a minor problem into a big problem, and possibly into legal action. When designing a solution, it is important to take into account these issues and to determine the true cost of downtime. An organization should then spend reasonable amounts of money to build solutions that are highly available, yet whose cost is justified. High availability solutions are effective insurance policies.
Oracle makes high availability solutions available to every customer regardless of size. Small workgroups and global enterprises alike are able to extend the reach of their critical business applications. With Oracle and the Internet, applications and their data are now reliably accessible everywhere, at any time.
One of the challenges in designing an HA solution is examining and addressing all the possible causes of downtime. It is important to consider causes of both unplanned and planned downtime. Unplanned downtime includes computer failures and data failures. Data failures can be caused by storage failure, human error, corruption, and site failure. Planned downtime includes system changes and data changes. Planned downtime can be just as disruptive to operations, especially in global enterprises that support users in multiple time zones.
Chapter 9, "Recovering from Outages" for a more detailed discussion of unplanned and planned outages
Choosing and implementing the architecture that best fits your availability requirements can be a daunting task. This architecture must encompass redundancy across all components, achieve fast client failover for all types of downtime, provide consistent high performance, and provide protection from user errors, corruptions, and site disasters, while being easy to deploy, manage, and scale.
This book describes several HA architectures and provides guidelines for choosing the one that is best for your requirements. It also describes practices that are essential to maintaining an HA environment regardless of the architecture that is chosen. It also describes types of outages and provides a blueprint for detecting and resolving the outages.
Knowledge of the Oracle Database server, Real Application Clusters and Oracle Data Guard terminology is required to understand the configuration and implementation details.
Chief technology officers and information technology architects can benefit from reading the following chapters:
Information technology architects can also find essential information in the following chapters:
Database administrators can find useful information in the following chapters:
Network administrators and application administrators can find useful information in the following chapters: