1 Introduction to High-Availability Best Practices

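This chapter describes how Oracle high-availability best practices can increase the availability of Oracle Database as well as the entire technology stack. This chapter contains the following topics:

  • Oracle Database High-Availability Architecture

  • Oracle Database High-Availability Best Practices

  • Oracle Maximum Availability Architecture

  • Operational Best Practices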

1.1 Oracle Database High-Availability Architecture

Choosing and implementing the architecture that best fits the availability requirements of a business can be a daunting task. The architecture must provide appropriate redundancy, adequate protection from all types of outages, consistently high performance, and robust security, while remaining easy to deploy, manage, and scale. Above all, it should be driven by well-understood business requirements. Choosing and implementing a high-availability architecture is covered in Oracle Database High Availability Overview.

Before using the best practices presented in this book, your organization should have already chosen a high-availability architecture for your database as described in Oracle Database High Availability Overview. If you have not already done so, then refer to that document to learn about the high-availability solutions that Oracle offers for Oracle Database before proceeding with this book.

1.2 Oracle Database High-Availability Best Practices

To build, implement, and maintain a high-availability architecture, a business needs high-availability best practices that address both the technical and the operational aspects of its IT systems and business processes. Such a set of best practices reduces the complexity of designing a high-availability architecture, maximizes availability while using minimal system resources, lowers the implementation and maintenance costs of the high-availability systems in place, and makes it easy to replicate the high-availability architecture in other areas of the business. An enterprise with a well-articulated set of high-availability best practices that encompasses high-availability analysis frameworks, business drivers, and system capabilities gains improved operational resilience and greater business agility.

The purpose of this book is to help you build, implement, and maintain a high-availability architecture for Oracle Database using high-availability best practices. By using the best practices described in this book, you will be able to:

1.3 Oracle Maximum Availability Architecture

Oracle Maximum Availability Architecture (MAA) is an Oracle best practices blueprint based on proven Oracle high-availability technologies and recommendations. The high-availability best practices described in this book are one of several components of MAA. MAA covers high-availability best practices for all Oracle products across the entire technology stack: Oracle Database, Oracle Application Server, Oracle Applications, Oracle Collaboration Suite, and Oracle Grid Control.

Some of the key features of MAA include:

  • Considers a range of business service level agreements (SLAs) to make the high-availability best practices as widely applicable as possible

  • Leverages database grid servers and storage grid with low-cost storage to provide highly resilient, lower cost infrastructure

  • Uses results from extensive performance impact studies for different configurations to ensure that the high-availability architecture is optimally configured to perform and scale to business needs

  • Provides the ability to control the time needed to recover from an outage and the amount of data loss that is acceptable after a natural disaster

  • Evolves with each Oracle version and is completely independent of hardware and operating systems

For more information on MAA and documentation on best practices for all components of MAA, visit the MAA web site at:


1.4 Operational Best Practices

One of the best ways to reduce downtime is to incorporate operational best practices. You can often prevent problems and downtime by rigorously testing changes in your test environment, following stringent change control policies to protect your primary database, and having a well-validated repair strategy for each outage type.

A monitoring infrastructure such as Grid Control is essential for detecting problems quickly. An outage and repair decision tree, combined with an automated or automatic repair facility, reduces downtime by eliminating or shortening the time spent deciding on and performing repairs.
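To make the idea of an outage and repair decision tree more concrete, the following is a minimal Python sketch. The outage categories, the `standby_available` flag, and the repair actions are hypothetical placeholders chosen for illustration; they are not the MAA outage matrix, and real repair strategies should come from your own validated procedures.

```python
from dataclasses import dataclass


# Hypothetical outage description; the fields and category names are
# illustrative assumptions, not part of MAA or any Oracle API.
@dataclass(frozen=True)
class Outage:
    scope: str               # assumed categories: "instance", "site", "data"
    standby_available: bool  # whether a standby site could take over


def choose_repair(outage: Outage) -> str:
    """Walk a small, pre-agreed decision tree and return a repair action.

    Deciding these branches in advance removes decision time during a crisis.
    The actions returned here are hypothetical placeholders.
    """
    if outage.scope == "instance":
        return "restart failed instance"
    if outage.scope == "site":
        # Prefer a prepared standby over rebuilding from backups.
        if outage.standby_available:
            return "fail over to standby site"
        return "restore from backup at alternate site"
    if outage.scope == "data":
        return "restore and recover affected data"
    # Anything the tree does not cover goes to a person immediately.
    return "escalate to on-call DBA"


if __name__ == "__main__":
    print(choose_repair(Outage(scope="site", standby_available=True)))
```

Because the tree is ordinary code, it can be reviewed under change control and exercised in the test environment like any other change.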

The following is a list of key operational practices:

  • Document and communicate service level agreements (SLAs)

  • Create test environments

    A good test environment accurately mimics the production system to test changes and prevent problems before they can affect your business.

  • Establish change control and security procedures

    Change control and security procedures maintain the stability of the system and ensure that no changes are incorporated in the primary database unless they have been rigorously evaluated on your test systems.

  • Set up and follow security best practices

    The biggest threat to corporate data comes from employees and contractors with internal access to networks and facilities. Corporate data can be at grave risk if placed on a system or database that does not have proper security measures in place. A well-defined security policy can help protect your systems from unwanted access and protect sensitive corporate information from sabotage. Proper data protection reduces the chance of outages due to security breaches.

    See Also:

  • Leverage Grid Control or another monitoring infrastructure to detect and react to potential failures and problems before they occur

    • Monitor system, network, and database statistics

    • Monitor performance statistics

    • Create performance thresholds as early warning indicators that a system or application has a problem or is underperforming

  • Leverage MAA recommended repair strategies and create an outage and repair decision tree for crisis scenarios using the recommended MAA matrix

  • Automate and optimize repair practices to minimize downtime by following MAA best practices
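The threshold practice in the list above can be sketched in Python as follows. The metric names and limit values are hypothetical examples, and this is not the Grid Control API; in practice a monitoring tool such as Grid Control evaluates thresholds and raises alerts. The sketch only illustrates how a two-level warning and critical scheme turns thresholds into early warning indicators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str      # hypothetical metric name, e.g. "cpu_util_pct"
    warning: float   # early-warning level
    critical: float  # level at which action is required


def evaluate(thresholds: list[Threshold], sample: dict[str, float]) -> dict[str, str]:
    """Classify each monitored metric in a sample as ok, warning, critical, or missing."""
    status: dict[str, str] = {}
    for t in thresholds:
        value = sample.get(t.metric)
        if value is None:
            # A metric that stops reporting can itself signal a failure.
            status[t.metric] = "missing"
        elif value >= t.critical:
            status[t.metric] = "critical"
        elif value >= t.warning:
            status[t.metric] = "warning"
        else:
            status[t.metric] = "ok"
    return status


if __name__ == "__main__":
    # Hypothetical thresholds and a single hypothetical sample.
    limits = [
        Threshold("cpu_util_pct", warning=80.0, critical=95.0),
        Threshold("avg_response_ms", warning=200.0, critical=500.0),
        Threshold("archive_dest_used_pct", warning=70.0, critical=90.0),
    ]
    print(evaluate(limits, {"cpu_util_pct": 85.0, "avg_response_ms": 120.0}))
```

Setting the warning level well below the critical level gives operators time to react before users experience an outage.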

See Also:

Chapter 4, "Managing Outages" for more information on repair strategies and practices