6 Enabling Continuous Service for Applications

Applications achieve continuous service easily when the underlying network, systems, and databases are always available.

Achieving continuous service in the face of unplanned outages and planned maintenance activities can be challenging. An MAA database architecture, together with its configuration and operational best practices, is built upon redundancy and the ability to tolerate, prevent, and in some cases automatically repair failures.

However, applications can incur downtime whenever a failure affects a database instance, a database node, or the entire cluster or data center. Similarly, some planned maintenance activities may require restarting a database instance, a database node, or an entire database server.

In all cases, if you follow a simple checklist, your applications can incur zero or very little downtime whenever the database service they are connected to can be moved to another Oracle RAC instance or to another database.

To achieve continuous application uptime during Oracle RAC switchover or failover events, follow these application configuration best practices:

  • Use non-default Oracle Clusterware-managed services to connect your application.

  • Use recommended connection strings with built-in timeouts, retries, and delays, so that incoming connections do not see errors during outages.

  • Configure your connections to use Fast Application Notification (FAN).

  • Drain and relocate services before any planned maintenance requiring an Oracle RAC instance restart.

    Software updates to Exadata Database Host or Exadata Database Guest automatically drain and relocate services. Oracle Cloud and Fleet Patching and Provisioning (FPP) drain and relocate services automatically for Oracle Database, Grid Infrastructure, and Exadata software updates.

  • Leverage Application Continuity or Transparent Application Continuity to replay in-flight uncommitted transactions transparently after failures.
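The service, connection string, FAN, and Application Continuity items in the checklist above can be sketched as a small script. This is a minimal sketch, not Oracle's procedure: it assumes srvctl syntax from Oracle Database 19c, and the database, instance, service, and host names (orcl, orcl1, orcl2, oltp_svc, rac-scan.example.com) as well as the run wrapper and DRY_RUN variable are hypothetical additions for illustration. The timeout and retry values in the alias follow the pattern of the Application Checklist for Continuous Service; confirm them against the current checklist for your release. Client-side FAN setup for JDBC or UCP pools is not shown.

```shell
#!/usr/bin/env bash
# Sketch of the application-side checklist for one Oracle RAC service.
# All names below are hypothetical placeholders. By default the srvctl
# command is only printed; set DRY_RUN=0 to execute it on a cluster node.
set -euo pipefail

# Hypothetical helper: print the command (dry run) or execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create a non-default, Clusterware-managed service.
#   -notification TRUE         publish FAN high-availability events to OCI clients
#   -failovertype TRANSACTION  Application Continuity (AUTO selects Transparent AC)
#   -commit_outcome TRUE       enable Transaction Guard, which replay relies on
#   -drain_timeout 300         seconds allowed for sessions to drain
#   -stopoption IMMEDIATE      how sessions still connected after draining are stopped
add_service() {
  local db=$1 svc=$2 preferred=$3
  run srvctl add service -db "$db" -service "$svc" -preferred "$preferred" \
    -notification TRUE -failovertype TRANSACTION -commit_outcome TRUE \
    -drain_timeout 300 -stopoption IMMEDIATE
}

# Print a tnsnames.ora alias with built-in timeouts, retries, and delays so
# that new connections wait out a service relocation instead of failing.
tns_alias() {
  local alias=$1 host=$2 svc=$3
  cat <<EOF
$alias =
  (DESCRIPTION =
    (CONNECT_TIMEOUT=90)(RETRY_COUNT=50)(RETRY_DELAY=3)(TRANSPORT_CONNECT_TIMEOUT=3)
    (ADDRESS_LIST =
      (LOAD_BALANCE=on)
      (ADDRESS = (PROTOCOL = TCP)(HOST = $host)(PORT = 1521)))
    (CONNECT_DATA = (SERVICE_NAME = $svc)))
EOF
}

add_service orcl oltp_svc orcl1,orcl2
tns_alias OLTP rac-scan.example.com oltp_svc
```

Keeping the commands in dry-run mode by default lets you review exactly what would be sent to srvctl before running it against a real cluster.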

Depending on the planned maintenance event, Oracle attempts to automatically drain and relocate application services before restarting any Oracle RAC instance. For OLTP applications, draining and relocating services works very well and results in zero application downtime.
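When no tooling drains services for you, the same drain-and-relocate step can be performed manually before restarting an instance. The following is a minimal sketch, assuming srvctl from Oracle Database 19c and a target instance that is already a preferred or available instance for the service; the drain_instance function, the run wrapper, and all database, service, and instance names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: drain and relocate a service off one instance, then stop that
# instance for maintenance. Names are hypothetical; commands are printed
# unless DRY_RUN=0.
set -euo pipefail

# Hypothetical helper: print the command (dry run) or execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: drain_instance DB SERVICE OLD_INST NEW_INST [DRAIN_SECONDS]
drain_instance() {
  local db=$1 svc=$2 old=$3 new=$4 timeout=${5:-300}
  # 1. Relocate the service to the surviving instance; sessions on the old
  #    instance are given up to $timeout seconds to drain.
  run srvctl relocate service -db "$db" -service "$svc" \
    -oldinst "$old" -newinst "$new" -drain_timeout "$timeout"
  # 2. Stop the drained instance; -failover moves any services still running there.
  run srvctl stop instance -db "$db" -instance "$old" \
    -stopoption IMMEDIATE -drain_timeout "$timeout" -failover
}

drain_instance orcl oltp_svc orcl1 orcl2 600
```

Relocating first and stopping second keeps the two phases visible: if sessions do not drain within the timeout, you can see it at the relocate step before the instance goes down.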

Some applications, such as long-running batch jobs or reports, may not be able to drain and relocate gracefully or within the maximum drain timeout. For those applications, Oracle recommends either scheduling the planned maintenance window that contains Oracle RAC rolling activities so that it does not overlap with these workloads, or stopping these workloads before the planned maintenance window begins. For example, you can schedule the planned maintenance window outside your batch windows, or stop problematic batch jobs or reports before the window starts.

The following table outlines planned maintenance events that incur an Oracle RAC rolling instance restart, and the relevant service drain timeout variables that may affect your application.

Table 6-1 Drain Timeout Variables for Planned Maintenance Events

Planned Maintenance Event | Drain Timeout Variables
Exadata Database Host (Dom0) software changes

Exadata Host handles operating system (OS) shutdown with a maximum timeout of 10 minutes.

OS shutdown calls rhphelper, which has the following drain timeout settings (in seconds):

  • DRAIN_TIMEOUT: value used for services that do not have a drain_timeout defined. Default: 180
  • MAX_DRAIN_TIMEOUT: overrides any higher drain_timeout value defined for a given service. Default: 300

Each Clusterware-managed service is also controlled by a drain_timeout attribute that can be lower than the above values.

See also: Using RHPhelper to Minimize Downtime During Planned Maintenance on Exadata (Doc ID 2385790.1)

Exadata Database Guest (DomU) software changes

Exadata patchmgr and dbnodeupdate software programs call rhphelper, which has the following drain timeout settings (in seconds):

  • DRAIN_TIMEOUT: value used for services that do not have a drain_timeout defined. Default: 180
  • MAX_DRAIN_TIMEOUT: overrides any higher drain_timeout value defined for a given service. Default: 300

Each Clusterware-managed service is also controlled by a drain_timeout attribute that can be lower than the above values.

See also: Using RHPhelper to Minimize Downtime During Planned Maintenance on Exadata (Doc ID 2385790.1)

Oracle Grid Infrastructure (GI) software changes or upgrade

The recommended steps are described in Graceful Application Switchover in RAC with No Application Interruption (Doc ID 1593712.1).

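Example (replace db_unique_name and instance_name with your own values):

srvctl stop instance -db db_unique_name -instance instance_name -stopoption immediate -drain_timeout 600 -failover -force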

Each Clusterware-managed service is also controlled by a drain_timeout attribute that can be lower than the drain timeout value shown above.

Oracle Database Software changes

The recommended steps are described in Graceful Application Switchover in RAC with No Application Interruption (Doc ID 1593712.1).

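Example (replace db_unique_name and instance_name with your own values):

srvctl stop instance -db db_unique_name -instance instance_name -stopoption immediate -drain_timeout 600 -failover -force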

Each Clusterware-managed service is also controlled by a drain_timeout attribute that can be lower than the drain timeout value shown above.
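The rhphelper rules in the Exadata rows of Table 6-1 reduce to a simple calculation per service. The function below is a hypothetical, illustrative model of those rules as the table states them; it is not rhphelper itself, and it does not read rhphelper's actual configuration. All values are in seconds.

```shell
#!/usr/bin/env bash
# Illustrative model of the rhphelper drain timeout rules from Table 6-1.
# effective_drain_timeout is a hypothetical helper, not part of rhphelper.
set -euo pipefail

# Usage: effective_drain_timeout SERVICE_DRAIN_TIMEOUT [DRAIN_TIMEOUT] [MAX_DRAIN_TIMEOUT]
# Pass an empty string for a service that has no drain_timeout defined.
effective_drain_timeout() {
  local svc=${1:-} default=${2:-180} max=${3:-300}
  if [ -z "$svc" ]; then
    echo "$default"   # no service value: DRAIN_TIMEOUT applies
  elif [ "$svc" -gt "$max" ]; then
    echo "$max"       # higher service value: overridden by MAX_DRAIN_TIMEOUT
  else
    echo "$svc"       # lower service value: used as is
  fi
}

for svc in "" 120 900; do
  printf 'service drain_timeout=%-5s -> effective %s s\n' \
    "${svc:-unset}" "$(effective_drain_timeout "$svc")"
done
```

With the default settings, no service drains for longer than 300 seconds, which fits inside the 10-minute (600-second) OS shutdown limit on the Exadata Database Host.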

For more information, see Application Checklist for Continuous Service for MAA Solutions for recommendations on achieving application-level service uptime similar to database uptime. Oracle recommends testing your application's readiness by following the recommendations in Validating Application Failover Readiness (Doc ID 2758734.1).