Databases and the Internet have enabled worldwide collaboration and information sharing by extending the reach of database applications throughout organizations and communities. This reach emphasizes the importance of high availability in data management solutions. Both small businesses and global enterprises have users all over the world who require access to data 24 hours a day. Without this data access, operations can stop, and revenue is lost. Users, who have become more dependent upon their solutions, now demand service-level agreements from their Information Technology (IT) departments and solution providers. Increasingly, availability is measured in dollars, euros, and yen, not just in time and convenience.
Enterprises have used their IT infrastructure to provide a competitive advantage, increase productivity, and empower users to make faster and more informed decisions. However, with these benefits has come an increasing dependence on that infrastructure. If a critical application becomes unavailable, then the business can be in jeopardy. Revenue and customers can be lost, penalties can be owed, and bad publicity can have a lasting effect on customers and a company's stock price. It is important to examine the factors that determine how your data is protected and how availability to your users is maximized.
Availability is the degree to which an application, service, or function is accessible on demand. Availability is measured by the perception of an application's end user. End users experience frustration when their data is unavailable or the computing system is not performing within certain expectations, and they do not understand or care to differentiate between the complex components of an overall solution. Performance failures due to higher than expected usage create the same havoc as the failure of critical components in the solution.
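As an illustration, availability is often expressed as the fraction of an observation window during which users could actually work. The sketch below assumes that common convention; the text above does not prescribe it. The function names and the one-year window are hypothetical. In keeping with the user-perception view above, periods of unacceptable performance would be counted as downtime alongside outright failures.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored for simplicity


def availability(uptime_minutes: float, downtime_minutes: float) -> float:
    """Fraction of the observation window in which the service was usable.

    downtime_minutes is assumed to include time when the system was up but
    too slow to be usable, since end users do not distinguish the two.
    """
    total = uptime_minutes + downtime_minutes
    if total <= 0:
        raise ValueError("observation window must be positive")
    return uptime_minutes / total


def allowed_downtime_minutes(target: float,
                             window_minutes: float = MINUTES_PER_YEAR) -> float:
    """Downtime budget implied by an availability target (0..1) over a window."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0 and 1")
    return (1.0 - target) * window_minutes
```

For example, under these assumptions a 99.9% target leaves a budget of roughly 526 minutes of downtime per year, which makes the gap between availability targets easier to compare.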
Reliability, recoverability, timely error detection, and continuous operations are primary characteristics of a highly available solution:
Reliability: Reliable hardware is one component of a high availability solution. Reliable software—including the database, Web servers, and applications—is just as critical to implementing a highly available solution. A related characteristic is resilience. For example, low-cost commodity hardware, combined with software such as Oracle RAC, can be used to implement a very reliable system, because the resilience of an Oracle RAC database allows processing to continue even though individual servers may fail.
Recoverability: Because there may be many choices for recovering from a failure, it is important to determine what types of failures may occur in your high availability environment and how to recover from those failures in a timely manner that meets your business requirements. For example, if a critical table is accidentally deleted from the database, what action should you take to recover it? Does your architecture provide the ability to recover in the time specified in a service level agreement (SLA)?
Timely error detection: If a component in your architecture fails, then fast detection is essential to recover from the unexpected failure. While you may be able to recover quickly from an outage, if it takes an additional 90 minutes to discover the problem, then you may not meet your SLA. Monitoring the health of your environment requires reliable software that can assess it quickly and notify the database administrator of a problem.
Continuous operation: Providing the ability for continuous access to your data is essential when very little or no downtime is acceptable to perform maintenance activities. Activities, such as moving a table to another location in the database or even adding CPUs to your hardware, should be transparent to the end user in a high availability architecture.
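The point about timely error detection can be made concrete with a small calculation: the outage that users experience is the detection delay plus the recovery time. The following is a minimal sketch. The 90-minute detection delay comes from the text above, while the 10-minute recovery time, the 2-minute detection time, and the 60-minute SLA limit are assumed values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outage:
    detection_minutes: float  # time from the failure until it is noticed
    recovery_minutes: float   # time from detection until service is restored

    @property
    def total_minutes(self) -> float:
        # Users experience both phases as one continuous outage.
        return self.detection_minutes + self.recovery_minutes


def meets_sla(outage: Outage, sla_limit_minutes: float) -> bool:
    """True if the whole outage, detection included, fits within the SLA limit."""
    return outage.total_minutes <= sla_limit_minutes


# Same recovery procedure, different detection delays (illustrative values).
slow_detection = Outage(detection_minutes=90, recovery_minutes=10)
fast_detection = Outage(detection_minutes=2, recovery_minutes=10)

print(meets_sla(slow_detection, sla_limit_minutes=60))
print(meets_sla(fast_detection, sla_limit_minutes=60))
```

With these assumed numbers, the identical recovery procedure misses the limit when detection is slow and meets it when detection is fast.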
More specifically, a high availability architecture should have the following traits:
Tolerate failures such that processing continues with minimal or no interruption
Be transparent to—or tolerant of—system, data, or application changes
Provide built-in preventative measures
Provide proactive monitoring and fast detection of failures
Provide fast recoverability
Automate detection and recovery operations
Protect the data so that there is minimal or no data loss
Implement the operational best practices to manage your environment
Achieve the goals set in SLAs (for example, recovery time objectives (RTOs) and recovery point objectives (RPOs)) for the lowest possible total cost of ownership
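To illustrate the last trait, the sketch below checks a single outage against a recovery time objective and a recovery point objective. It assumes that the data-loss window runs from the last point at which data was protected (for example, a completed backup or a replicated change) to the failure, and that recovery time runs from the failure to restoration. The function name, timestamps, and targets are all hypothetical.

```python
from datetime import datetime, timedelta


def evaluate_objectives(last_protected: datetime,
                        failed_at: datetime,
                        restored_at: datetime,
                        rto: timedelta,
                        rpo: timedelta) -> dict:
    """Compare one outage against assumed RTO and RPO targets."""
    data_loss_window = failed_at - last_protected  # changes made here may be lost
    recovery_time = restored_at - failed_at        # how long service was unavailable
    return {
        "recovery_time": recovery_time,
        "data_loss_window": data_loss_window,
        "rto_met": recovery_time <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


# Illustrative outage: data last protected 5 minutes before the failure,
# service restored 20 minutes after it.
result = evaluate_objectives(
    last_protected=datetime(2024, 1, 1, 11, 55),
    failed_at=datetime(2024, 1, 1, 12, 0),
    restored_at=datetime(2024, 1, 1, 12, 20),
    rto=timedelta(minutes=30),
    rpo=timedelta(minutes=1),
)
```

In this made-up case the recovery time objective is met but the recovery point objective is not, which shows why the two goals need to be evaluated separately.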
The importance of high availability varies among applications. However, the need to deliver increasing levels of availability continues to accelerate as enterprises re-engineer their solutions to gain competitive advantage. Most often, these new solutions rely on immediate access to critical business data. When data is not available, the operation can cease to function. Downtime can lead to lost productivity, lost revenue, damaged customer relationships, bad publicity, and lawsuits.
It is not always easy to place a direct cost on downtime. Angry customers, idle employees, and bad publicity are all costly, but not directly measured in currency. On the other hand, lost revenue and legal penalties incurred because SLA objectives are not met can easily be quantified. The cost of downtime can quickly grow in industries that are dependent on their solutions to provide service.
Other factors to consider in the cost of downtime are the maximum tolerable length of a single unplanned outage and the maximum frequency of allowable incidents. If the event lasts less than 30 seconds, then it may cause very little impact and may be barely perceptible to end users. As the length of the outage grows, the effect may grow exponentially and result in a negative impact on the business. Alternatively, frequent outages, even if short in duration, may similarly disrupt business operations. When designing a solution, it is important to understand the true cost of downtime and how the business can benefit from availability improvements.
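The two tolerances described above, the length of a single outage and the number of incidents, can be checked independently. The following is a minimal sketch. The function name and the example limits (30 seconds per outage, 5 incidents per period) are assumptions for illustration, not values that any SLA prescribes.

```python
from typing import List, Sequence


def outage_violations(durations_seconds: Sequence[float],
                      max_single_seconds: float,
                      max_incidents: int) -> List[str]:
    """Describe each tolerance that the outages in one period exceeded."""
    violations = []
    # Check the length of every individual outage.
    for index, duration in enumerate(durations_seconds, start=1):
        if duration > max_single_seconds:
            violations.append(
                f"outage {index} lasted {duration}s (limit {max_single_seconds}s)"
            )
    # Check how often outages occurred, regardless of their length.
    if len(durations_seconds) > max_incidents:
        violations.append(
            f"{len(durations_seconds)} incidents (limit {max_incidents})"
        )
    return violations


# Six 10-second outages: each one is short, but together they are too frequent.
print(outage_violations([10] * 6, max_single_seconds=30, max_incidents=5))
```

Keeping the two checks separate reflects the point that frequent short outages can be as disruptive as one long outage.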
Oracle provides a range of high availability solutions that fit every organization regardless of size. Small workgroups and global enterprises alike are able to extend the reach of their critical business applications. With Oracle and the Internet, applications and data are reliably accessible everywhere, at any time.
One of the challenges in designing a high availability solution is examining and addressing all of the possible causes of downtime. It is important to consider causes of both unplanned and planned downtime when designing a fault tolerant and resilient IT infrastructure. Planned downtime can be just as disruptive to operations, especially in global enterprises that support users in multiple time zones.
Table 1-1 describes unplanned outage types and provides examples of each type.
Table 1-1 Causes of Unplanned Downtime
Table 1-2 describes planned outage types and provides examples of each type.
Table 1-2 Causes of Planned Downtime
Type | Description | Examples |
---|---|---|
System and database changes | Planned system changes occur when performing routine and periodic maintenance operations and new deployments. Planned system changes include any scheduled changes to the operating environment that occur outside the organizational data structure in the database. The service level impact of a planned system change varies significantly depending on the nature and scope of the planned outage, the testing and validation efforts made before implementing the change, and the technologies and features in place to minimize the impact. | |
Data changes | Planned data changes occur when there are changes to the logical structure or physical organization of Oracle Database objects. The primary objective of these changes is to improve performance or manageability. | |
Application changes | Planned application changes may include data changes and schema and programmatic changes. The primary objective of these changes is to improve performance, manageability, and functionality. | |
Oracle offers high availability solutions to help avoid both unplanned and planned downtime, and recover from failures. Chapter 2 discusses each of these high availability solutions in detail.
Choosing and implementing the architecture that best fits your availability requirements can be a daunting task. This architecture must:
Encompass redundancy across all components
Provide protection and tolerance from computer failures, storage failures, human errors, data corruption, lost writes, system hangs or slowdown, and site disasters
Recover from outages as quickly and transparently as possible
Provide solutions to eliminate or reduce planned downtime
Provide consistent high performance
Be easy to deploy, manage, and scale
Achieve SLAs at the lowest possible total cost of ownership
To help you select the most suitable architecture for your organization, this book describes several high availability architectures and provides guidelines for choosing the one that best meets your requirements. Knowledge of the Oracle Database server, Oracle RAC, and Oracle Data Guard terminology is required to understand the configuration and implementation details.
Chief technology officers and information technology architects can benefit from reading the following chapters:
Database administrators and network administrators can find useful information in the following chapters:
See Also:
Oracle High Availability Best Practice recommendations in the: The MAA white papers that can be downloaded from