This chapter describes how Oracle high-availability best practices can increase the availability of Oracle Database and of the entire technology stack.
Choosing and implementing the architecture that best fits the availability requirements of a business can be a daunting task. This architecture must encompass appropriate redundancy, provide adequate protection from all types of outages, and ensure consistent high performance and robust security, while remaining easy to deploy, manage, and scale. Needless to say, this architecture should be driven by well-understood business requirements. Choosing and implementing a high-availability architecture is covered in Oracle Database High Availability Overview.
Before using the best practices presented in this book, your organization should have already chosen a high-availability architecture for your database as described in Oracle Database High Availability Overview. If you have not already done so, then refer to that document to learn about the high-availability solutions that Oracle offers for Oracle Database before proceeding with this book.
To build, implement, and maintain a high-availability architecture, a business needs high-availability best practices that cover both the technical and the operational aspects of its IT systems and business processes. Such a set of best practices removes the complexity of designing a high-availability architecture, maximizes availability while using minimum system resources, reduces the implementation and maintenance costs of the high-availability systems in place, and makes it easy to duplicate the high-availability architecture in other areas of the business. An enterprise with a well-articulated set of high-availability best practices that encompasses high-availability analysis frameworks, business drivers, and system capabilities enjoys improved operational resilience and enhanced business agility.
Building, implementing, and maintaining a high-availability architecture for Oracle Database using high-availability best practices is the purpose of this book. By using the Oracle Database high-availability best practices described in this book, you will be able to:
Reduce the implementation cost of an Oracle Database high-availability system by following detailed guidelines on configuring your database, storage, application failover, backup and recovery as described in Chapter 2, "Configuring for High-Availability"
Avoid potential downtime by monitoring and maintaining your database using Oracle Grid Control as described in Chapter 3, "Monitoring Using Oracle Grid Control"
Recover quickly from unscheduled outages caused by computer failure, storage failure, human error, or data corruption as described in Chapter 4, "Managing Outages"
Eliminate or reduce downtime that might occur due to scheduled maintenance such as database patches or application upgrades as described in Chapter 4, "Managing Outages"
Oracle Maximum Availability Architecture (MAA) is an Oracle best practices blueprint based on proven Oracle high-availability technologies and recommendations. The high-availability best practices described in this book make up one of several components of MAA. MAA involves high-availability best practices for all Oracle products across the entire technology stack: Oracle Database, Oracle Application Server, Oracle Applications, Oracle Collaboration Suite, and Oracle Grid Control.
Some of the key features of MAA include:
Leverages database grid servers and a storage grid with low-cost storage to provide a highly resilient, lower-cost infrastructure
Uses results from extensive performance impact studies for different configurations to ensure that the high-availability architecture is optimally configured to perform and scale to business needs
Gives the ability to control the length of time to recover from an outage and the amount of acceptable data loss from a natural disaster
Evolves with each Oracle version and is completely independent of hardware and operating system
One of the best ways to reduce downtime is incorporating operational best practices. You can often prevent problems and downtime before they occur by rigorously testing changes in your test environment, following stringent change control policies to guard your primary database from harm, and having a well-validated repair strategy for each outage type.
A monitoring infrastructure such as Grid Control is essential to quickly detect problems. Having an outage and repair decision tree as well as an automated or automatic repair facility reduces downtime by eliminating or reducing decision and repair times.
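An outage and repair decision tree can be as simple as a lookup from outage type to a pre-validated repair plan. The following Python sketch is illustrative only: the outage types are those named earlier in this chapter, while the repair actions, the `RepairPlan` type, and the `choose_repair` function are hypothetical placeholders, not MAA recommendations or part of any Oracle product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairPlan:
    """A pre-validated response to one outage type (hypothetical structure)."""
    action: str
    automatic: bool  # True if the repair can run without a human decision


# Hypothetical decision table. Replace the actions with the repair
# strategies your organization has actually validated for each outage type.
REPAIR_TREE = {
    "computer_failure": RepairPlan("restart or fail over to a surviving server", True),
    "storage_failure": RepairPlan("switch to redundant storage", True),
    "human_error": RepairPlan("undo the erroneous change", False),
    "data_corruption": RepairPlan("restore and recover the affected data", False),
}

# Fallback for outage types that nobody planned for.
ESCALATE = RepairPlan("escalate to the on-call administrator", False)


def choose_repair(outage_type: str) -> RepairPlan:
    """Return the planned repair for an outage type, or the escalation fallback."""
    return REPAIR_TREE.get(outage_type, ESCALATE)


if __name__ == "__main__":
    for outage in ("storage_failure", "human_error", "power_loss"):
        plan = choose_repair(outage)
        mode = "automatic" if plan.automatic else "manual"
        print(f"{outage}: {plan.action} ({mode})")
```

Keeping the table explicit means the decision has been made before the outage occurs, which is where the reduction in decision time comes from. Unplanned outage types fall through to a single escalation path instead of failing silently.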
The following is a list of key operational practices:
Create test environments
A good test environment accurately mimics the production system to test changes and prevent problems before they can affect your business.
Establish change control and security procedures
Change control and security procedures maintain the stability of the system and ensure that no changes are incorporated in the primary database unless they have been rigorously evaluated on your test systems.
The biggest threat to corporate data comes from employees and contractors with internal access to networks and facilities. Corporate data can be at grave risk if placed on a system or database that does not have proper security measures in place. A well-defined security policy can help protect your systems from unwanted access and protect sensitive corporate information from sabotage. Proper data protection reduces the chance of outages due to security breaches.
Leverage Grid Control or another monitoring infrastructure to detect and react to potential failures and problems before they occur
Monitor system, network, and database statistics
Monitor performance statistics
Create performance thresholds as early warning indicators that a system or application has a problem or is underperforming
Leverage MAA recommended repair strategies and create an outage and repair decision tree for crisis scenarios using the recommended MAA matrix
See Also: Chapter 4, "Managing Outages" for more information on repair strategies and practices
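The performance thresholds mentioned in the list above can be pictured as a pair of limits for each metric: a warning level that serves as the early indicator and a critical level that demands action. The following Python sketch assumes hypothetical metric names and limit values. It does not use or represent the Grid Control interface, and suitable limits depend on your own baseline measurements.

```python
from typing import NamedTuple


class Threshold(NamedTuple):
    warning: float   # early warning level
    critical: float  # level at which immediate action is expected


# Hypothetical metrics and limits, chosen only for illustration.
THRESHOLDS = {
    "cpu_utilization_pct": Threshold(warning=80.0, critical=95.0),
    "avg_response_time_ms": Threshold(warning=500.0, critical=2000.0),
}


def classify(metric: str, value: float) -> str:
    """Classify a sampled value as 'ok', 'warning', or 'critical'."""
    limits = THRESHOLDS[metric]
    if value >= limits.critical:
        return "critical"
    if value >= limits.warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    samples = {"cpu_utilization_pct": 87.5, "avg_response_time_ms": 120.0}
    for metric, value in samples.items():
        print(f"{metric} = {value}: {classify(metric, value)}")
```

The warning level is what makes a threshold an early indicator: it should sit far enough below the critical level that operators have time to react before users are affected.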