High Availability in OCI

11 High Availability in OCI

High availability is a cornerstone of modern application design, ensuring systems remain operational and resilient despite unexpected failures. In environments leveraging Oracle Call Interface (OCI), a robust HA strategy is essential for delivering consistent performance, minimizing downtime, and maintaining seamless access to critical data. This chapter describes high availability (HA) features in OCI that help to configure applications to quickly and automatically shift workloads during both planned maintenance and unplanned outages. Depending on the application’s high availability requirements, you can implement the level of high availability configuration that you need.

11.1 Best Practices for Building High Availability Applications

This section outlines the best programming practices for building high-availability applications using OCI, a low-level API for interacting with Oracle Database. It addresses the challenges and considerations for both new and existing applications, and provides practical guidance for developers and system architects regardless of the starting point.

This section covers:

Configuration considerations for new and existing applications.
Best programming practices for existing applications.
Best programming practices for new applications.

11.1.1 Configuration Considerations

As a starting point, no application changes are required. For both new and existing applications, follow the configuration and notifications steps described in Chapter HA Level 1 - Basic Application High Availability.

Note:

Configuration considerations are essential prerequisites for both new and existing applications.

11.1.2 Existing Applications

Enhancing the HA capabilities of existing applications requires working within the constraints of established code bases and infrastructure.

Best practices include:

Explicitly addressing the management of active and idle sessions, which can be achieved by:
- Leveraging OCI session pools or incorporating explicit OCIRequestBegin() and OCIRequestEnd() calls.
- Incrementally adding support for failover, load balancing, and session persistence, so that HA features are retrofitted without a complete system overhaul.
For HA Level 2 - Handling Planned Maintenance:
- Update connection strings to use services instead of specific host names, enabling dynamic routing during maintenance.
- Retrofit the HA features to support planned maintenance.
- Use session pooling, or bracket each application request with OCIRequestBegin() and OCIRequestEnd().
  
  Note:
  To ensure there are no orphan calls, all OCI work needs to be within OCIRequestBegin() and OCIRequestEnd().
For more information on HA Level 2 – Handling Planned Maintenance, see High Availability Overview and Best Practices Guide - Configuring Level 2: Prepare Applications for Planned Maintenance.
For HA Level 3 – Handling Unplanned Outages and Planned Failovers:
- Upgrade connection strings to include multiple database instances for automatic failover routing.
- To ensure sessions reconnect to alternate instances without impacting the users, use one of the following failover features:
  - Transparent Application Failover (TAF) support with minimal code changes to enable basic failover capabilities.
  - Application Continuity (AC) to facilitate seamless user experience by masking outages and transparently replaying database requests, maintaining transaction integrity and session state.
  - Transparent Application Continuity (TAC) to reconnect sessions to alternate instances without impacting the users.
AC and TAC have additional support to replay an active transaction following the reconnect. Configure notifications to enable seamless failover, error recovery, and efficient resource management by alerting applications to database changes. For more information, see High Availability Overview and Best Practices - Configuring Level 3: Mask Unplanned and Planned Failovers from Applications.

11.1.3 New Applications

High availability can be built into the architecture from the ground up. This allows developers to leverage the latest Oracle Database features and best practices without the constraints of legacy systems.

Best practices include:

For HA Level 2 - Handling Planned Maintenance:
- Use OCI session pooling.
- Build proactive notification handling into the application.
For more information on HA Level 2 - Handling Planned Maintenance, see High Availability Overview and Best Practices Guide -Configuring Level 2: Prepare Applications for Planned Maintenance.
For HA Level 3 – Handling Unplanned Outages and Planned Failovers:
- Integrate Transparent Application Failover (TAF), Application Continuity (AC), or Transparent Application Continuity (TAC) to reconnect sessions to alternate database instances without impacting the users.

For more information on HA Level 3 – Handling Unplanned Outages and Planned Failovers, see High Availability Overview and Best Practices - Configuring Level 3: Mask Unplanned and Planned Failovers from Applications.

11.2 High Availability Configuration Levels

OCI supports three high availability (HA) configuration levels, each described below. Depending on your application's high availability requirements, you can implement the level of HA protection that you need:

11.2.1 HA Level 1 - Basic Application High Availability

This section describes the configuration and notifications to ensure the downtime is minimized for unplanned and planned outages. Level 1 steps do not require any modification of your applications.

At a high level, the steps to implement Level 1 are:

Step 1: Configure High Availability Database Services

Ensure that the database is configured for high availability. Refer to Configure High Availability Database Services for detailed information on configuring database services.

Step 2: Configure the Connection String for High Availability

Refer to Configure the Connection String for High Availability for detailed information on configuring the connection string.

Additionally, configure the operating system TCP and Oracle Net configuration for performance and availability.

Explore the following Oracle Database Net Services sqlnet.ora file parameters:

Other Oracle Database Net Services parameters can also be useful for high availability and performance tuning:
- For example, the database’s listener.ora file has optional RATE_LIMIT and QUEUESIZE parameters that can help handle connection storms.
- With Oracle Client 18c, the EXPIRE_TIME parameter can be used in tnsnames.ora connect descriptors to prevent firewalls from terminating idle connections and to adjust keepalive timeouts (EXPIRE_TIME=n). The general recommendation for EXPIRE_TIME is to use a value that is slightly less than half of the termination period. In older versions of Oracle Client, the tnsnames.ora connect descriptor option ENABLE=BROKEN can be used instead of EXPIRE_TIME. These settings can also aid detection of a terminated remote database server.
- With Oracle Client 19c libraries, EXPIRE_TIME can be used via Easy Connect: host/service?expire_time=n. With the Oracle Client 21c libraries, it can be used in a client-side sqlnet.ora.
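
The EXPIRE_TIME guidance above can be expressed as a small calculation. The following is a minimal sketch and is not part of OCI: the helper names recommended_expire_time() and build_easy_connect() are hypothetical, the one-minute margin used for "slightly less than half" is an assumption, and the value is assumed to be expressed in minutes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns an EXPIRE_TIME value (assumed to be in
 * minutes) that is slightly less than half of the period after which a
 * firewall terminates idle connections. The one-minute margin is an
 * assumption of this sketch; the result is never less than 1. */
int recommended_expire_time(int termination_minutes)
{
    int value = termination_minutes / 2 - 1;
    return value < 1 ? 1 : value;
}

/* Hypothetical helper: formats an Easy Connect string of the form
 * host/service?expire_time=n. Returns the number of characters written,
 * or -1 if the buffer is too small. */
int build_easy_connect(char *buf, size_t buflen, const char *host,
                       const char *service, int expire_time)
{
    int n;

    assert(buf != NULL && buflen > 0);
    n = snprintf(buf, buflen, "%s/%s?expire_time=%d",
                 host, service, expire_time);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

For example, with a firewall that terminates idle connections after 60 minutes, recommended_expire_time(60) yields 29, which can then be passed to build_easy_connect() with placeholder host and service names.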

Step 3: Ensure Application Receives HA Notifications

Notifications about planned maintenance activities are sent in-band to applications using Oracle Client 19c or newer.
Optionally, for on-premises databases and ADB-D, enable Fast Application Notification (FAN). Refer to Ensure That FAN Is Used for more information on enabling FAN.
OCI FAN Requirements
- Database service configuration: Set the database service attribute -notification TRUE.
- FAN events notification configuration: OCI clients handle FAN events for all connections, including those that do not belong to a pool. The following two options are available:
  - Use OCI_EVENTS mode. Here is an example:
```
... OCIEnvCreate(&envhp, OCI_EVENTS, 0, 0, 0, 0, 0, 0);
```
  - Use oraaccess.xml, and set the events tag to TRUE. Here is an example:
```
<oraaccess 
      xmlns="http://xmlns.oracle.com/oci/oraaccess" 
      xmlns:oci="http://xmlns.oracle.com/oci/oraaccess" 
      schemaLocation="http://xmlns.oracle.com/oci/oraaccess 
                      http://xmlns.oracle.com/oci/oraaccess.xsd"> 
   <default_parameters> 
 	<events>TRUE</events> 
   </default_parameters> 
</oraaccess>
```
  See Also:
  - About Client-Side Deployment Parameters Specified in oraaccess.xml for more information about oraaccess.xml and details about the parameters

Step 4: Ensure Application Implements Reconnection Logic

Refer to Ensure Application Implements Reconnection Logic for detailed information on implementing reconnection logic.
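
As an illustration of what reconnection logic can look like, the following minimal sketch retries a connection attempt a bounded number of times. It is not an OCI API: the connect_fn and delay_fn hook types, connect_with_retry(), and simulated_connect() are hypothetical names. In a real application, the connect hook would wrap the OCI logon sequence and the delay hook would sleep.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connect hook: returns 0 on success, nonzero on failure. */
typedef int (*connect_fn)(void *ctx);

/* Hypothetical delay hook, so that the caller decides how to wait
 * between attempts (for example, by sleeping). May be NULL. */
typedef void (*delay_fn)(unsigned seconds);

/* Tries to connect up to max_attempts times, waiting delay_seconds after
 * each failed attempt except the last. Returns the 1-based number of the
 * attempt that succeeded, or 0 if every attempt failed. */
int connect_with_retry(connect_fn try_connect, void *ctx,
                       int max_attempts, unsigned delay_seconds,
                       delay_fn wait)
{
    assert(try_connect != NULL);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (try_connect(ctx) == 0)
            return attempt;
        if (attempt < max_attempts && wait != NULL)
            wait(delay_seconds);
    }
    return 0;
}

/* Simulated connect hook for demonstration only: ctx points to an int
 * holding the number of attempts that should fail before one succeeds. */
int simulated_connect(void *ctx)
{
    int *failures_left = (int *)ctx;
    if (*failures_left > 0) {
        (*failures_left)--;
        return 1;
    }
    return 0;
}
```

Bounding the number of attempts keeps the application from retrying indefinitely when the service is not restored, at which point the error can be reported to the user.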

11.2.2 HA Level 2 - Handle Planned Maintenance

Building on application HA Level 1 - Basic Application High Availability, level 2 adds high availability configurations for minimal application impact during planned maintenance. After implementing level 1, you are ready to implement a planned maintenance HA configuration.

Planned maintenance refers to scheduled maintenance activities aimed at keeping the database system healthy, with minimal downtime or disruption to application availability. Planned maintenance is typically initiated by the database administrator (DBA), cloud service provider (for managed services), or IT operations team.

To handle planned maintenance with minimal application impact on users, leverage OCI session pooling, session draining, or use explicit request boundaries. Employ the following practices to increase your application's high availability to level 2:

Use OCI session pooling
OCI session pools seamlessly manage connection retries and redirections during planned maintenance. If a database instance undergoes patching, OCI automatically redirects connections to another available instance over a controlled period. OCI session pools detect when a connection has been affected by a planned outage (a PLANNED DOWN event) and terminate the connection when it is returned to the pool. PLANNED DOWN events are generated by FAN during outages to notify session pools about the status of instances or connections.

For applications that do not use session pooling, explicitly indicate the beginning and end of an application request by calling OCIRequestBegin() and OCIRequestEnd() to mark when the connection is being used. This approach ensures that user operations are gracefully handled during maintenance, improving user experience by preventing disruptions or errors while maintaining system performance and reliability. In either case, OCI implicitly determines when replay of DML statements is safe, and applications see fewer errors following a shutdown event.
Request boundaries
Request boundaries refer to the scope or limits within which an application request is processed. In environments using session pools, request boundaries help the pool track connection usage. The application should return the connection to the OCI session pool when the database request is completed in order to mark the end of the request boundary.

Request boundaries can be explicit or implicit. Implicit request boundaries may be automatically assumed in the session at appropriate points, without changing the application. Explicit request boundaries integrate with AC and TAC; implicit request boundaries depend on using a TAC service. For more information, see the next section, HA Level 3 – Handling Unplanned Outage and Planned Failover.
Session draining
Session draining ensures active sessions are gracefully moved or closed before shutting down the instance.

See Also:

High Availability Overview and Best Practices Guide - Configuring Level 2: Prepare Applications for Planned Maintenance for more information on handling planned maintenance

By leveraging these strategies, HA environments maintain continuous service with minimal disruption, even during essential maintenance activities.

11.2.3 HA Level 3 - Handle Unplanned Outage and Planned Failover

This section adds high availability configurations for masking unplanned and planned failovers from the applications and handling timeouts and outages.

An unplanned outage is an unexpected failure or disruption in the database system or its underlying infrastructure, impacting the availability of database services accessed through the Oracle Call Interface (OCI). If not managed effectively by the application or database configuration, this can lead to lost connections or failed requests.

Database Failover Features (AC, TAC, TAF)

The following Oracle database failover features offer the best way to handle unplanned outages in OCI. These features enable rebuilding and recovering the session from a known point, and then they replay interrupted in-flight work (committed transactions are not replayed). Once the replay is complete, the results of a transaction are returned to the application as if no interruption had occurred.

Application Continuity (AC)
Application Continuity ensures a seamless user experience by masking outages and transparently replaying database requests, while maintaining transaction integrity and session state.
Transparent Application Continuity (TAC)
Transparent Application Continuity automatically ensures that session state and transaction integrity are maintained during outages, with no changes required to the application.
Transparent Application Failover (TAF)
Transparent Application Failover automatically reroutes and reconnects sessions to a standby database or another instance in a RAC cluster during failures, ensuring minimal disruption to users.

Planned Failover

Planned failover refers to a controlled and intentional transition of database operations from one database instance to another, ensuring continuity of service. This controlled approach enables you to verify that the failover processes work as expected.

A planned failover leverages Oracle Database and OCI failover features to improve the maintenance experience even when the application has not yet implemented draining or is unable to drain within the allocated period.

To prevent data loss during the transition, OCI ensures that the standby database is in a consistent state and fully synchronized with the primary database before initiating a failover. Once this is confirmed, OCI promotes the standby database to become the new primary. After the failover, the connection endpoints are updated either manually or automatically, depending on the configuration. This allows client applications to start directing traffic to the new primary database. Because the standby database is already synchronized, the failover occurs with minimal service interruption, ensuring a smooth transition. After the planned failover is complete, it is crucial to validate that all systems are functioning normally under the new configuration.

Once the original primary system is ready, a reverse switchover can be executed to revert operations to the original configuration.

See Also:

Ensuring Application Continuity in Oracle Real Application Clusters Administration and Deployment Guide, for detailed information about how AC, TAC, and TAF work
Configuring Level 3: Mask Unplanned and Planned Failovers from Applications for descriptions about AC, TAC, TAF

11.2.3.1 OCI and Application Continuity

Application Continuity (AC) reduces the incidence of High Availability (HA) related application errors by automatically replaying in-flight transactions during an outage to restore the application state seamlessly.

AC masks hardware, software, network, and storage errors, as well as timeouts, in an HA environment running Oracle RAC, Oracle RAC One, or Active Data Guard for instance or site failover. AC provides support for:

SQL*Plus
Tuxedo
WebLogic Server
JDBC Type 4 (Oracle Thin)
python-oracledb (Thick)
node-oracledb (Thick)
OCI, ODPI-C drivers and applications built on OCI and ODPI-C
Oracle Data Provider for .NET (ODP.NET) managed and unmanaged drivers.

Application Continuity is recommended for OLTP applications that use an Oracle Database session pool or provide explicit request boundaries. Application Continuity (AC) is enabled on the database service that the application uses to connect to the database. Within any one request, AC is supported up until the first transaction commits.

11.2.3.1.1 How is Application Continuity Enabled?

AC relies on Oracle RAC or Data Guard setups, which ensure that a standby database is available and fully synchronized with the primary workload. The connection end-points must be configured to support re-routing.

AC is enabled by setting specific parameters in the client’s connection descriptor or driver configuration. Additionally, the number of retries and the delay between them can be configured so that the client knows how many times to attempt reconnection before giving up. Connection pools detect lost connections, automatically establish new ones, and restore session state as needed.

11.2.3.1.2 When Is Application Continuity Most Effective?

Application Continuity in OCI is most effective when an application is able to mark the beginning and end of an application request, either explicitly (calling OCIRequestBegin() and OCIRequestEnd()) or implicitly through connection acquisition and release using an OCI session pool.

11.2.3.1.3 What Factors Disable Application Continuity in OCI

The following situations implicitly disable Application Continuity in OCI until the start of the next application request:

The database tier (server) detects a condition that is not consistent with replay. For example, if a PL/SQL anonymous block has an embedded top level COMMIT statement (autonomous transactions are not considered top level), the driver implicitly disables Application Continuity in OCI.
The application calls an OCI function that is not supported by Application Continuity in OCI. One of these functions is OCIStmtPrepare(). Use the OCIStmtPrepare2() call to support the use of Application Continuity in an HA infrastructure.
The application uses streaming binds or defines of descriptor-based types such as objects or LOB locators.

The application can explicitly disable Application Continuity in OCI by calling OCIRequestDisableReplay().
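
The rules above can be summarized as a small state model. The following sketch is a conceptual illustration only and does not reflect how OCI is implemented: the ac_request_state type and the ac_* function names are hypothetical. Replay is considered available from the start of a request until one of the disabling situations occurs, and it becomes available again when the next request begins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Conceptual model only; not an OCI data structure. */
typedef struct {
    bool in_request;       /* between request begin and request end */
    bool replay_enabled;   /* no disabling situation seen in this request */
} ac_request_state;

/* Situations that disable replay until the next request begins. */
typedef enum {
    AC_EVENT_TOP_LEVEL_COMMIT,   /* for example, COMMIT in an anonymous block */
    AC_EVENT_UNSUPPORTED_CALL,   /* for example, OCIStmtPrepare() */
    AC_EVENT_DISABLE_REPLAY      /* explicit OCIRequestDisableReplay() call */
} ac_event;

/* Models the start of a request, such as OCIRequestBegin(). */
void ac_request_begin(ac_request_state *s)
{
    assert(s != NULL);
    s->in_request = true;
    s->replay_enabled = true;
}

/* Models the end of a request, such as OCIRequestEnd(). */
void ac_request_end(ac_request_state *s)
{
    assert(s != NULL);
    s->in_request = false;
    s->replay_enabled = false;
}

/* Every listed situation has the same effect in this model. */
void ac_on_event(ac_request_state *s, ac_event ev)
{
    assert(s != NULL);
    (void)ev;
    s->replay_enabled = false;
}

bool ac_can_replay(const ac_request_state *s)
{
    return s->in_request && s->replay_enabled;
}
```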

11.2.3.1.4 Possible Side Effects of Application Continuity

Application Continuity in OCI replays the original PL/SQL and SQL statements following a recoverable error once a session is re-established and the database state is restored. If the original execution had side effects (actions outside of the session and transaction, such as sending an email or printing) and that operation is replayed, those side effects will likely be repeated. Applications need to be coded to handle these side effects and decide whether duplicate execution is acceptable. If it is not acceptable, then the application needs to take action to accommodate or mitigate the effects of replay, for example, by calling OCIRequestDisableReplay().

See Also:

Potential Side Effects of Application Continuity for more information about examples of actions that create side effects

11.2.3.1.5 Supported OCI Functions for Application Continuity

This section describes the functions that enable Application Continuity in OCI to perform failover during an outage.

Application Continuity in OCI can fail over if an outage occurs during one of the following functions:

OCILobAppend()
OCILobArrayRead()
OCILobArrayWrite()
OCILobAssign()
OCILobCharSetForm()
OCILobClose()
OCILobCopy2()
OCILobCreateTemporary()
OCILobFileClose()
OCILobGetStorageLimit()
OCILobIsOpen()
OCILobLoadFromFile()
OCILobLocatorAssign()
OCILobOpen()
OCILobRead2()
OCILobTrim2()
OCILobWriteAppend2()
OCILobWrite2()
OCIStmtExecute()
OCISessionEnd()
OCITransRollback()

OCILobFileCloseAll()
OCILobFileGetName()
OCILobFileIsOpen()
OCILobFileOpen()
OCILobFileSetName()
OCILobFreeTemporary()
OCILobGetChunkSize()
OCILobGetLength()
OCILobGetLength2()
OCILobIsEqual()
OCILobIsTemporary()
OCILobLoadFromFile2()
OCILobLocatorIsInit()
OCILobRead()
OCILobTrim()
OCILobWriteAppend()
OCILobWrite()
OCIPing()
OCIStmtFetch()
OCIStmtFetch2()
OCITransCommit()

11.2.3.2 Support for Transparent Application Continuity (TAC)

Transparent Application Continuity (TAC) is a functional mode of Application Continuity that transparently tracks and records session and transactional state so that a database session can be recovered following recoverable outages.

This is done automatically, without manual intervention, so a DBA does not need any knowledge of the application and a developer does not need to change application code. Transparency is achieved by a state-tracking infrastructure that categorizes session state usage as the application issues user calls. You can enable Transparent Application Continuity by default to protect applications during planned maintenance and unplanned outages. With Transparent Application Continuity, the application does not need to be changed because the following occur automatically:

Restoring PRESET states.
Recognizing and disabling application-level side effects when recovering a session.

The database tier and OCI application tier track transaction and session state usage. This enables OCI to detect and inject possible request boundaries.

Transparent Application Continuity (TAC) is enabled using the configuration parameter FAILOVER_TYPE=AUTO. Set this configuration parameter either on the client side in the tnsnames.ora file, or on the database server side for Oracle RAC using SRVCTL or the DBMS_SERVICE package.

See Also:

Transparent Application Continuity in Oracle Real Application Clusters Administration and Deployment Guide, for more information about Transparent Application Continuity

11.2.3.3 Transparent Application Failover (TAF) in OCI

Transparent application failover (TAF) is a client-side feature designed to minimize disruptions to end-user applications that occur when database connectivity fails because of instance or network failure.

TAF automatically reroutes application connections to a surviving database instance.

TAF can be configured to restore database sessions and optionally, to replay the open (in process of retrieving rows) queries. Replaying open queries helps read-only applications continue operating without manual intervention. Applications using DML operations should consider using AC or TAC instead.

TAF can be implemented on a variety of system configurations, including Oracle Real Application Clusters (Oracle RAC) and Oracle Data Guard physical standby databases. TAF can also be used after restarting a single instance system (for example, when repairs are made).

After a failure, every statement that the application attempts to execute engages TAF recovery: not just the statement that was running when the failure occurred, but also any statements issued afterward. Subsequent statements may succeed, or the application may receive errors corresponding to an attempted TAF recovery (such as ORA-25401). The application developer needs to determine how to handle these errors, either by implementing error-handling logic within the application or by notifying the user through an alert mechanism. The application logic should address such errors by re-executing the query from the beginning.
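
One way to structure this error handling is to map the Oracle error number (obtained, for example, from OCIErrorGet()) to an application-level action. The following is a minimal sketch under stated assumptions: the taf_action type and taf_action_for_error() are hypothetical names, ORA-25401 is the example named above, and treating ORA-25402 (transaction must roll back) as a rollback-then-retry case is an assumption of this sketch.

```c
/* Hypothetical actions an application might take when a call fails
 * during or after an attempted TAF recovery. */
typedef enum {
    TAF_ACTION_NONE,        /* not handled here; use normal error handling */
    TAF_ACTION_REEXECUTE,   /* re-execute the query from the beginning */
    TAF_ACTION_ROLLBACK     /* roll back, then retry the transaction */
} taf_action;

/* Maps an Oracle error number to an action. Only ORA-25401 comes from
 * the text above; the ORA-25402 mapping is an assumption. */
taf_action taf_action_for_error(int ora_code)
{
    switch (ora_code) {
    case 25401: return TAF_ACTION_REEXECUTE;
    case 25402: return TAF_ACTION_ROLLBACK;
    default:    return TAF_ACTION_NONE;
    }
}
```

Keeping the mapping in one function makes it easy to extend as the application encounters other recovery-related errors.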

Note:

Oracle recommends that applications register a callback so that, when failover happens, the callback can restore the session to the desired state.

Note:

TAF is not supported for:

Remote database links
A session with a transaction already in progress
LOB columns which are part of the select list

11.2.3.3.1 Configuring Transparent Application Failover

Transparent Application Failover (TAF) can be configured on both the database tier and the OCI application tier. If both are configured, database tier settings take precedence.

Configure TAF on the client side by setting the TYPE subparameter of the FAILOVER_MODE parameter to SESSION or SELECT in the CONNECT_DATA portion of a connect descriptor (for example, in the tnsnames.ora file).

Configure TAF on the server side by setting FAILOVER_MODE to SESSION or SELECT. Use srvctl modify service for services managed by Oracle Clusterware, Oracle Restart, or Oracle Global Data Services. Otherwise, use the DBMS_SERVICE.MODIFY_SERVICE packaged procedure for services that are not so managed, such as those defined in a single instance database.

Whether configured on the client or the service, TAF has two possible modes:

SESSION - Only the connection and session are re-established. Any existing cursors need to be re-executed.
SELECT - The session is restored. In addition, if the application tries to fetch from previously active cursors, then OCI attempts to re-execute and restore the cursor to its original row set location. This process occurs based on the session states active at the time of the fetch call. The fetch fails if the new row set (or its effective order) differs from the original execution.

An initial attempt at failover may not always succeed. OCI provides a mechanism for retrying failover after an unsuccessful attempt.
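
As an illustration of the client-side configuration, the following tnsnames.ora fragment is a minimal sketch. The net service name, host, and service name are placeholders, and the RETRIES and DELAY values are arbitrary choices for illustration, not recommendations. RETRIES sets how many reconnection attempts are made after a failure, and DELAY sets the wait in seconds between attempts.

```
sales =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = db-scan.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = sales.example.com)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 20)
        (DELAY = 15))))
```

With TYPE = SESSION instead, only the connection and session are re-established, and open cursors must be re-executed by the application.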

See Also:

Oracle Database Net Services Reference for more information about client-side configuration of TAF (Connect Data Section)
Oracle Database PL/SQL Packages and Types Reference for more information about the database tier configuration of TAF (DBMS_SERVICE)
High Availability Overview and Best Practices - Configuring Level 3: Mask Unplanned and Planned Failovers from Applications for more information on unplanned outages and planned failovers

11.3 Failover Support for Token-Based Authentication in OCI

This section covers failover support for token-based authentication in OCI.

See Also:

Identity and Access Management (IAM) Token-Based Authentication

When using token-based authentication, the original token may have expired by the time a failover occurs. Therefore, the application needs a way to provide an updated token during the failover process. For example, if the token is stored in an operating system file, the file must be updated with the new expiration details in the event of a failover. This can be accomplished through the failover callback capability, which is explained in the section Failover Callback.

The updated token can be provided in either of the following two ways:

11.3.1 Providing the Database Token Programmatically

If the cached token expires, the latest token and key must be specified in the failover callback by calling OCIAttrSet() with attributes OCI_ATTR_TOKEN and OCI_ATTR_IAM_PRIVKEY on the session handle. These are cached by OCI after the failover is complete.

The failover event OCI_FO_BEGIN_EXPIREDTOKEN is raised so that the callback can renew the expired token.

Example 11-1 TAF Callback Function to Accommodate Token-Based Authentication

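#include <oci.h>

#ifndef MAXBUFLEN
#define MAXBUFLEN 8192   /* assumed maximum size of the token and key */
#endif

/* Defined in Example 11-2 */
void getToken(char *token, sb4 *tokenLen, char *privateKey, sb4 *privateKeyLen);

sb4 callback_fn(dvoid *svchp, dvoid *envhp, dvoid *fo_ctx,
                ub4 fo_type, ub4 fo_event)
{
  if (fo_event == OCI_FO_BEGIN_EXPIREDTOKEN)
  {
    /* Set new token attributes */
    OCISession *usrhp = NULL;
    OCIError   *ehp = NULL;
    char token[MAXBUFLEN];
    char privateKey[MAXBUFLEN];
    sb4  tokenLen = 0;
    sb4  privateKeyLen = 0;

    OCIHandleAlloc((dvoid *)envhp, (dvoid **)&ehp,
                   OCI_HTYPE_ERROR, 0, (dvoid **)0);
    /* Get the session handle from the service context */
    OCIAttrGet((dvoid *)svchp, (ub4)OCI_HTYPE_SVCCTX,
               (dvoid *)&usrhp, (ub4 *)0, (ub4)OCI_ATTR_SESSION, ehp);
    if (usrhp)
    {
      /* Read the latest token and private key, then set them on the session */
      getToken(token, &tokenLen, privateKey, &privateKeyLen);
      OCIAttrSet((dvoid *)usrhp, (ub4)OCI_HTYPE_SESSION,
                 (dvoid *)token, (ub4)tokenLen,
                 OCI_ATTR_TOKEN, ehp);
      OCIAttrSet((dvoid *)usrhp, (ub4)OCI_HTYPE_SESSION,
                 (dvoid *)privateKey, (ub4)privateKeyLen,
                 OCI_ATTR_IAM_PRIVKEY, ehp);
    }
    OCIHandleFree((dvoid *)ehp, OCI_HTYPE_ERROR);
  }
  return 0;
}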

Example 11-2 Get Token

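#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <oci.h>

#ifndef MAXBUFLEN
#define MAXBUFLEN 8192   /* assumed size of the caller's token and key buffers */
#endif

/* Reads the token and the private key (without PEM delimiters) from files.
   getline() is a POSIX function. */
void getToken(char *token, sb4 *tokenLen, char *privateKey, sb4 *privateKeyLen)
{
  const char *token_file_loc = "tokenFile.txt";
  const char *private_key_file_loc = "privateKey.pem";
  FILE   *fp;
  char   *line = NULL;
  size_t  len = 0;
  size_t  newLen = 0;
  size_t  pvreadLen = 0;
  int     start = 0;

  token[0] = '\0';
  privateKey[0] = '\0';

  /* Read the whole token file */
  fp = fopen(token_file_loc, "r");
  if (fp != NULL) {
    newLen = fread(token, sizeof(char), MAXBUFLEN - 1, fp);
    token[newLen] = '\0';
    fclose(fp);
  }
  *tokenLen = (sb4)newLen;

  /* Read the private key, one line at a time */
  fp = fopen(private_key_file_loc, "r");
  if (fp != NULL)
  {
    while (getline(&line, &len, fp) != -1)
    {
      size_t n;
      /* skip lines containing PEM delimiters */
      if (strstr(line, "--BEGIN ") != NULL) {
        start = 1;
        continue;
      }
      else if (strstr(line, "--END ") != NULL) {
        break;
      }
      if (!start)
        continue;
      /* remove the trailing newline */
      n = strcspn(line, "\r\n");
      line[n] = '\0';
      if (pvreadLen + n >= MAXBUFLEN)   /* do not overflow the key buffer */
        break;
      memcpy(privateKey + pvreadLen, line, n + 1);
      pvreadLen += n;
    }
    free(line);
    fclose(fp);
  }
  *privateKeyLen = (sb4)pvreadLen;
}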

11.3.2 Providing the Database Token in a File

The location of the token file must be specified in the connect string when establishing the initial session and the post-failover session.

Note:

The token is always read from the file and is not cached.

The application administrator must keep the token location up-to-date with the latest token.

11.4 Advanced Programmatic Considerations

This section describes high availability configurations for handling more complex states. Most applications can handle planned maintenance and unplanned outages by following the steps described in High Availability Configuration Levels.

For finer-grained control over more complex states, consider the following sections:

11.4.1 High Availability Event Notification

Use HA event notification to provide a best-effort programmatic signal to the client if there is a database failure for high availability clients connected to an Oracle RAC database.

Suppose that a user employs a web browser to log in to an application server that accesses a back-end database server. Failure of the database instance can result in a wait that can be up to minutes in duration before the failure is known to the user. The ability to quickly detect failures of database tier instances, communicate this to the client, close connections, and clean up idle connections in session pools is provided by HA event notification.

Client applications can register a callback on the environment handle to signal interest in HA events. When a significant failure event occurs that applies to a connection made by this client, the callback is invoked with information concerning the event (the event payload) and a list of connections (server handles) that were disconnected because of the failure.

For example, consider a client application that has two connections to instance A and two connections to instance B of the same database. If instance A goes down, a notification of the event is sent to the client, which then disconnects the two connections to instance A and invokes the registered callback. Note that if another instance C of the same database goes down, the client is not notified (because it does not affect any of the client's connections).
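
The filtering behavior in this example can be modeled in a few lines. The following sketch is a conceptual illustration, not OCI code: the client_conn type, the ha_callback type, and handle_instance_down() are hypothetical names. Connections to the failed instance are marked as disconnected, and the callback is invoked only when at least one of the client's connections was affected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Conceptual model of a client connection; not an OCI type. */
typedef struct {
    const char *instance;    /* instance that the connection is attached to */
    bool        connected;
} client_conn;

/* Hypothetical callback type: receives the failed instance name and the
 * number of connections disconnected because of the failure. */
typedef void (*ha_callback)(const char *instance, int disconnected, void *ctx);

/* Processes an "instance down" event. Returns the number of connections
 * that were disconnected. The callback is skipped when no connection of
 * this client is attached to the failed instance. */
int handle_instance_down(client_conn *conns, int count, const char *instance,
                         ha_callback cb, void *ctx)
{
    int disconnected = 0;

    assert(conns != NULL && instance != NULL);
    for (int i = 0; i < count; i++) {
        if (conns[i].connected && strcmp(conns[i].instance, instance) == 0) {
            conns[i].connected = false;
            disconnected++;
        }
    }
    if (disconnected > 0 && cb != NULL)
        cb(instance, disconnected, ctx);
    return disconnected;
}
```

In this model, an event for an instance to which the client has no connections returns 0 and leaves every connection untouched.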

The HA event notification mechanism improves the response time of the application in the presence of failure. Before the mechanism was introduced in Oracle Database 10g Release 2 (10.2), a failure would result in the connection being broken only after the TCP timeout interval expired, which could take minutes. With HA event notification, the standalone and session pool connections are automatically broken and cleaned up by OCI, and the application callback is invoked within seconds of the failure event. If any of these server handles are TAF-enabled, failover is also automatically engaged by OCI.

In the current release, this functionality depends on Oracle Notification Service (ONS). It requires Oracle Clusterware to be installed and configured on the database server for the clients to receive the HA notifications through ONS. All clusterware installations (for example, Oracle Data Guard) should have the same ONS port. There is no client configuration required for ONS.

Note:

The client transparently gets the ONS server information from the database to which it connects. The application administrator can augment or override that information using the deployment configuration file oraaccess.xml.

Applications must connect to an Oracle RAC instance to enable HA event notification. Furthermore, these applications must:

Initialize the OCI Environment in OCI_EVENTS mode
Connect to a service that has notifications enabled (use the DBMS_SERVICE.MODIFY_SERVICE procedure to set AQ_HA_NOTIFICATIONS to TRUE)
Link with a thread library

Then these applications can register a callback that is invoked whenever an HA event occurs.

See Also:

About Client-Side Deployment Parameters Specified in oraaccess.xml for more information about oraaccess.xml and details about the parameters under <events>, <fan> and <ons>

11.4.1.1 OCIEvent Handle

The OCIEvent handle encapsulates the attributes from the event payload.

OCI implicitly allocates this handle before calling the event callback, which can obtain the read-only attributes of the event by calling OCIAttrGet(). Memory associated with these attributes is only valid for the duration of the event callback.

11.4.1.2 OCI Failover for Connection and Session Pools

An OCI session pool maintains a collection of connections to an Oracle Database. If Oracle RAC is deployed, a session pool may contain connections to different instances of the Oracle RAC cluster.

Upon receiving the database instance failure notification, all the connections connected to that particular instance should be cleaned up. For the connections that are in use, OCI must close the connections: transparent application failover (TAF) occurs immediately, and those connections are reestablished. The connections that are idle and in the free list of the pool must be purged, so that a bad connection is never returned to the user from the pool.

To accommodate custom connection pools, OCI provides a callback function that can be registered on the environment handle. If registered, this callback is invoked when an HA event occurs. Note that server handles from OCI session pools are not passed to the callback. Hence in some cases, the callback could be called with an empty list of connections.
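
As a rough illustration of the cleanup a custom pool might perform from its HA event callback, the following sketch drops idle connections to a failed instance and flags the in-use ones. The types pool_conn and conn_pool and the function pool_purge_instance are hypothetical stand-ins, not OCI APIs. A real pool would identify affected connections through the server handles delivered with the event rather than by comparing instance names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical custom-pool types; none of these names come from OCI. */
typedef enum { CONN_IDLE, CONN_IN_USE, CONN_BROKEN } conn_state;

typedef struct {
    const char *instance;   /* instance the connection is attached to */
    conn_state  state;
} pool_conn;

typedef struct {
    pool_conn *conns;
    size_t     count;
} conn_pool;

/*
 * Apply a DOWN event for failed_instance to the pool:
 *  - idle connections to that instance are removed, so they can never
 *    be handed out again;
 *  - in-use connections are kept but flagged, so the owner sees the
 *    failure on its next operation.
 * Returns the number of idle connections purged.
 */
size_t pool_purge_instance(conn_pool *pool, const char *failed_instance)
{
    size_t kept = 0, purged = 0;

    assert(pool != NULL && failed_instance != NULL);
    for (size_t i = 0; i < pool->count; i++) {
        pool_conn *c = &pool->conns[i];
        int affected = strcmp(c->instance, failed_instance) == 0;

        if (affected && c->state == CONN_IDLE) {
            purged++;                 /* drop from the free list */
            continue;
        }
        if (affected && c->state == CONN_IN_USE)
            c->state = CONN_BROKEN;   /* owner must fail over or reconnect */
        pool->conns[kept++] = *c;     /* compact the surviving entries */
    }
    pool->count = kept;
    return purged;
}
```

With two connections to instance A (one idle, one in use) and two to instance B, a DOWN event for A purges one entry and flags one; a DOWN event for an instance the pool has no connections to changes nothing.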

11.4.1.3 OCI Failover for Independent Connections

No special handling is required for independent connections; all such connections that are connected to failed instances are immediately disconnected.

For idle connections, TAF is engaged to reestablish the connection when the connection is used on a subsequent OCI call. Connections that are in use at the time of the failure event are broken out immediately, so that TAF can begin. Note that this applies for the "in-use" connections of connection and session pools also.

11.4.1.4 Event Callback

The event callback, of type OCIEventCallback, has the following signature:

void evtcallback_fn (void      *evtctx,
                     OCIEvent  *eventhp );

In this signature, evtctx is the client context that the application registered; OCI passes it to the callback without interpreting it. eventhp is the event handle, of type OCIEvent, that carries the attributes associated with the event.

If registered, this function is called once for each event. For Oracle RAC HA events, this callback is invoked after the affected connections have been disconnected. The following environment handle attributes are used to register an event callback and context, respectively:

OCI_ATTR_EVTCBK is of data type OCIEventCallback *. It is a write attribute, set with OCIAttrSet().
OCI_ATTR_EVTCTX is of data type void *. It is also a write attribute.

text *myctx = (text *) "dummy context"; /* dummy context passed to callback fn */
...
/* Register the callback and its context on the environment handle. */
OCIAttrSet(envhp, (ub4) OCI_HTYPE_ENV, (void *) evtcallback_fn,
           (ub4) 0, (ub4) OCI_ATTR_EVTCBK, errhp);
OCIAttrSet(envhp, (ub4) OCI_HTYPE_ENV, (void *) myctx,
           (ub4) 0, (ub4) OCI_ATTR_EVTCTX, errhp);
...

Within the OCI event callback, the list of affected database server handles is encapsulated in the OCIEvent handle. For Oracle RAC HA DOWN events, client applications can iterate over a list of database server handles that are affected by the event by using OCIAttrGet() with attribute types OCI_ATTR_HA_SRVFIRST and OCI_ATTR_HA_SRVNEXT:

OCIAttrGet(eventhp, OCI_HTYPE_EVENT, (void *)&srvhp, (ub4 *)0,
           OCI_ATTR_HA_SRVFIRST, errhp); 
/* or, */
OCIAttrGet(eventhp, OCI_HTYPE_EVENT, (void *)&srvhp, (ub4 *)0,
           OCI_ATTR_HA_SRVNEXT, errhp);

When called with attribute OCI_ATTR_HA_SRVFIRST, this function retrieves the first database server handle in the list of server handles affected. When called with attribute OCI_ATTR_HA_SRVNEXT, this function retrieves the next server handle in the list. This function returns OCI_NO_DATA and srvhp is a NULL pointer, when there are no more server handles to return.

srvhp is an output pointer to a database server handle whose connection has been closed because of an HA event. errhp is an error handle to populate.

When retrieving the list of server handles that have been affected by an HA event, be aware that the connection has already been closed and many server handle attributes are no longer valid. Instead, use the user memory segment of the server handle to store any per-connection attributes required by the event notification callback. This memory remains valid until the server handle is freed.

See Also:

OCIAttrGet()

11.4.1.5 Custom Pooling: Tagged Database Server Handles

Using custom pools, you can retrieve the database server handle’s tag information so appropriate cleanup can be performed.

The following features apply to custom pools:

You can tag a database server handle with its parent connection object if it is created on behalf of a custom pool. Use the "user memory" parameters of OCIHandleAlloc() to request that the database server handle be allocated with a user memory segment. A pointer to the "user memory" segment is returned by OCIHandleAlloc().
When an HA event occurs and an affected server handle has been retrieved, there is a means to retrieve the database server handle's tag information so appropriate cleanup can be performed. The attribute OCI_ATTR_USER_MEMORY is used to retrieve a pointer to a handle's user memory segment. OCI_ATTR_USER_MEMORY is valid for all user-allocated handles. If the handle was allocated with extra memory, this attribute returns a pointer to the user memory. A NULL pointer is returned for those handles not allocated with extra memory. This attribute is read-only and is of data type void*.

Note:

You are free to define the precise contents of the database server handle's user memory segment to facilitate cleanup activities from within the HA event callback (or for other purposes if needed) because OCI does not write or read from this memory in any way. The user memory segment is freed with the OCIHandleFree() call on the server handle.

Example 11-3 shows an example of event notification.

Example 11-3 Event Notification

sword retval;
OCIServer *srvhp;
struct myctx {
   void *parentConn_myctx;
   uword numval_myctx;
};
typedef struct myctx myctx; 
myctx  *myctxp;
/* Allocate a server handle with user memory - pre 10.2 functionality */
if ((retval = OCIHandleAlloc(envhp, (void **)&srvhp, OCI_HTYPE_SERVER,
                             (size_t)sizeof(myctx), (void **)&myctxp))
    != OCI_SUCCESS)
{
  /* handle error */
}
myctxp->parentConn_myctx = <parent connection reference>;
 
/* In an event callback function, retrieve the pointer to the user memory */
void evtcallback_fn(void *evtctx, OCIEvent *eventhp)
{ 
  myctx     *ctxp;   /* user memory of the affected server handle */
  OCIServer *srvhp;
  OCIError  *errhp;  /* must be set to a valid, previously allocated error handle */
  sword      retcode;
...
  /* The server list belongs to the event handle, so use OCI_HTYPE_EVENT */
  retcode = OCIAttrGet(eventhp, OCI_HTYPE_EVENT, &srvhp, (ub4 *)0,
                       OCI_ATTR_HA_SRVFIRST, errhp); 
  while (retcode == OCI_SUCCESS) /* OCIAttrGet returns OCI_NO_DATA if no more srvhp */ 
  {  
     OCIAttrGet((void *)srvhp, OCI_HTYPE_SERVER, (void *)&ctxp,
                (ub4 *)0, (ub4)OCI_ATTR_USER_MEMORY, errhp);
     /* Remove the server handle from the parent connection object */
...
     retcode = OCIAttrGet(eventhp, OCI_HTYPE_EVENT, &srvhp, (ub4 *)0,
                          OCI_ATTR_HA_SRVNEXT, errhp);
  }
...
}
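
The step marked "Remove the server handle from the parent connection object" depends entirely on what the application stores in the user memory segment. The following is a minimal sketch of one possible layout and cleanup routine, assuming the pool keeps a back-pointer to its own connection object. The names parent_conn, server_tag, and detach_failed_server are hypothetical and are not part of OCI.

```c
#include <stddef.h>

/* Hypothetical custom-pool connection object; not an OCI type. */
typedef struct {
    int server_attached;   /* nonzero while a live server handle is attached */
    int needs_reconnect;   /* set when the pool must open a new connection */
} parent_conn;

/* Layout chosen by the application for the server handle's user memory. */
typedef struct {
    parent_conn *parent;   /* back-pointer stored when the handle was created */
} server_tag;

/*
 * Cleanup for one affected server handle. user_memory is the pointer
 * obtained through OCI_ATTR_USER_MEMORY; it is NULL when the handle was
 * allocated without extra memory. Returns 1 if a parent was detached,
 * 0 if there was nothing to do.
 */
int detach_failed_server(void *user_memory)
{
    server_tag *tag = (server_tag *)user_memory;

    if (tag == NULL || tag->parent == NULL)
        return 0;                        /* untagged or already detached */
    tag->parent->server_attached = 0;
    tag->parent->needs_reconnect = 1;
    tag->parent = NULL;                  /* guard against a second detach */
    return 1;
}
```

Checking for NULL matters here because handles that were not allocated with extra memory return a NULL pointer for OCI_ATTR_USER_MEMORY.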

11.4.1.6 About Determining Transparent Application Failover (TAF) Capabilities

You can have the application adjust its behavior if a connection is or is not TAF or AC-enabled.

Use OCIAttrGet() as follows to determine if a server handle is TAF or AC-enabled:

boolean taf_capable;
...
/* The fourth argument is a ub4 * size pointer; it is not needed here. */
OCIAttrGet(srvhp, (ub4) OCI_HTYPE_SERVER, (void *) &taf_capable, 
           (ub4 *) 0, (ub4)OCI_ATTR_TAF_ENABLED, errhp);
...

In this example, taf_capable is a Boolean variable, which this call sets to TRUE if the database server handle is TAF-enabled, and FALSE if not; srvhp is an input target database server handle; OCI_ATTR_TAF_ENABLED is an attribute that is a pointer to a Boolean variable and is read-only; errhp is an input error handle.

11.4.2 Failover Callback

A failover callback is a user-defined function that OCI executes automatically during a failover event, allowing the application to respond appropriately to changes in the database connection.

Failover callbacks are part of the Transparent Application Failover (TAF) and AC/TAC mechanisms, enabling applications to recover gracefully when a database instance becomes unavailable.

11.4.2.1 Failover Callbacks in OCI

Because of the delay that can occur during failover, the application developer may want to inform the user that failover is in progress, and request that the user wait for notification that failover is complete.

Additionally, the session on the initial instance may have received some ALTER SESSION commands. These ALTER SESSION commands are not automatically replayed on the second instance. Consequently, the developer may want to replay them on the second instance. OCIAttrSet() calls that affect the session must also be reexecuted.

To accommodate these requirements, the application developer can register a failover callback function. If failover occurs, the callback function is invoked several times while reestablishing the user's session.

The first call to the callback function occurs when the database first detects an instance connection loss. This callback is intended to allow the application to inform the user of an upcoming delay. If failover is successful, a second call to the callback function occurs when the connection is reestablished and usable.

Once the connection has been reestablished, the client may want to replay ALTER SESSION commands and inform the user that failover has happened. If failover is unsuccessful, then the callback is called to inform the application that failover cannot occur. Additionally, the callback is called each time a user handle besides the primary handle is reauthenticated on the new connection. Because each user handle represents a database tier session, the OCI application tier may want to replay ALTER SESSION commands for that session.
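
One way to make session state replayable is to record each session-level statement as it is issued and run the recorded statements again from the failover callback once the new session is usable. The following sketch shows only that bookkeeping. The names session_log, exec_fn, session_log_record, session_log_replay, and counting_exec are hypothetical; a real exec_fn would wrap calls such as OCIStmtPrepare2() and OCIStmtExecute() against the reestablished session.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SESSION_STMTS 16

/* Hypothetical executor: runs one SQL statement on the new session and
 * returns 0 on success. */
typedef int (*exec_fn)(void *session, const char *sql);

/* Ordered log of the session-level statements issued so far. */
typedef struct {
    const char *stmts[MAX_SESSION_STMTS];
    size_t      count;
} session_log;

/* Remember a statement (for example, an ALTER SESSION) for later replay.
 * Returns 0 on success, -1 if the log is full. */
int session_log_record(session_log *log, const char *sql)
{
    assert(log != NULL && sql != NULL);
    if (log->count == MAX_SESSION_STMTS)
        return -1;
    log->stmts[log->count++] = sql;
    return 0;
}

/* Replay every recorded statement, in order, on the failed-over session.
 * Stops at the first failure and returns the number replayed. */
size_t session_log_replay(const session_log *log, void *session, exec_fn run)
{
    size_t done = 0;

    assert(log != NULL && run != NULL);
    while (done < log->count && run(session, log->stmts[done]) == 0)
        done++;
    return done;
}

/* Demo executor standing in for a real one: it only counts its calls,
 * using *session as the counter. */
int counting_exec(void *session, const char *sql)
{
    (void)sql;
    ++*(size_t *)session;
    return 0;
}
```

In a failover callback, session_log_replay would typically be called when the event is OCI_FO_END, and again for each OCI_FO_REAUTH if additional user sessions carry their own settings.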

See Also:

OCIAttrSet()
Handling OCI_FO_ERROR for more information about this scenario

11.4.2.2 Failover Callback Structure and Parameters

The basic structure of a callback function is as follows:

sb4  failovercbk_fn(void      *svchp, 
                    void      *envhp, 
                    void      *fo_ctx, 
                    ub4        fo_type, 
                    ub4        fo_event);

The parameters are described below. For an example that uses them, see Failover Callback Example.

svchp: svchp is the service context handle. It is of type void *.
envhp: envhp is the OCI environment handle. It is of type void *.
fo_ctx: fo_ctx is a client context. In this area the client can keep any necessary state or context. It is passed as a void *.
fo_type: fo_type is the failover type. This lets the callback know what type of failover the client has requested. The usual values are as follows:

OCI_FO_SESSION indicates that the user has configured TAF session failover.
OCI_FO_SELECT indicates that the user has configured TAF select failover.
OCI_FO_TXNAL indicates that the user has configured AC.
OCI_FO_AUTO indicates that the user has configured TAC.

fo_event: fo_event is the failover event. This indicates to the callback why it is being called. It has several possible values:

OCI_FO_BEGIN indicates that failover has detected a lost connection and failover is starting.
OCI_FO_END indicates successful completion of failover.
OCI_FO_ABORT indicates that failover was unsuccessful, and there is no option of retrying.
OCI_FO_ERROR also indicates that failover was unsuccessful, but it gives the application the opportunity to handle the error and retry failover.
OCI_FO_REAUTH indicates that you have multiple authentication handles and failover has occurred after the original authentication. It indicates that a user handle has been reauthenticated. To determine which one, the application checks the OCI_ATTR_SESSION attribute of the service context handle svchp.

If Application Continuity is configured, the TAF callback is called with OCI_FO_END after successfully re-connecting, re-authenticating, and determining the status of the inflight transaction. Upon completion of the TAF callback, OCI returns an error if an open transaction is present and Application Continuity for OCI is enabled.

11.4.2.3 Failover Callback Registration

For the failover callback to be used, it must be registered on the server handle. This registration is done by creating a callback definition structure and setting the OCI_ATTR_FOCBK attribute of the server handle to this structure.

The callback definition structure must be of type OCIFocbkStruct. It has two fields: callback_function, which contains the address of the function to call, and fo_ctx, which contains the address of the client context.

See Also:

Example 11-5 for an example of callback registration

11.4.2.4 Failover Callback Example

This section shows an example of a simple user-defined callback function definition (see Example 11-4), failover callback registration (see Example 11-5), and failover callback unregistration (see Example 11-6).

Example 11-4 User-Defined Failover Callback Function Definition

#include <stdio.h>
#include <oci.h>

/* User-defined failover callback: reports each failover event. */
sb4 callback_fn(void *svchp, void *envhp, void *fo_ctx,
                ub4 fo_type, ub4 fo_event)
{
   switch (fo_event) 
   {
   case OCI_FO_BEGIN:
   {
     printf(" Failing Over ... Please stand by \n");
     printf(" Failover type was found to be %s \n",
                     ((fo_type==OCI_FO_NONE) ? "NONE"
                     :(fo_type==OCI_FO_SESSION) ? "SESSION"
                     :(fo_type==OCI_FO_SELECT) ? "SELECT"
                     :(fo_type==OCI_FO_TXNAL) ? "TRANSACTION"
                     :(fo_type==OCI_FO_AUTO) ? "AUTO"
                     : "UNKNOWN!")); 
     printf(" Failover Context is :%s\n", 
                    (fo_ctx?(char *)fo_ctx:"NULL POINTER!"));
     break;
   }
   case OCI_FO_ABORT:
   {
     printf(" Failover stopped. Failover will not occur.\n");
     break;
   }
   case OCI_FO_END:
   {
     printf(" Failover ended ...resuming services\n");
     break;
   }
   case OCI_FO_REAUTH:
   {
     printf(" Failed over user. Resuming services\n");
     break;
   }
   default:
   {
     /* fo_event is unsigned, so print it with %u */
     printf("Bad Failover Event: %u.\n", (unsigned) fo_event);
     break;
   }
   }
   return 0;
}

Example 11-5 Failover Callback Registration

#include <stdlib.h>
#include <string.h>
#include <oci.h>

/* callback_fn is the function defined in Example 11-4 */
sb4 callback_fn(void *svchp, void *envhp, void *fo_ctx,
                ub4 fo_type, ub4 fo_event);

int register_callback(void *srvh,       /* the server handle */
                      OCIError *errh)   /* the error handle */
{
  OCIFocbkStruct failover;                 /*  failover callback structure */
  /* allocate memory for context */
  if (!(failover.fo_ctx = malloc(strlen("my context.")+1)))
     return(1);
  /* initialize the context. */
  strcpy((char *)failover.fo_ctx, "my context.");
  failover.callback_function = &callback_fn;
  /* do the registration */
  if (OCIAttrSet(srvh, (ub4) OCI_HTYPE_SERVER,
                (void *) &failover, (ub4) 0,
                (ub4) OCI_ATTR_FOCBK, errh)  != OCI_SUCCESS)
  {
     free(failover.fo_ctx);   /* do not leak the context if registration fails */
     return(2);
  }
  /* successful conclusion */
  return (0);
}

Example 11-6 Failover Callback Unregistration

OCIFocbkStruct failover;   /*  failover callback structure */
sword status;
 
  /* set the failover context to null */
  failover.fo_ctx = NULL; 
  /* set the failover callback to null */ 
  failover.callback_function = NULL; 
  /* unregister the callback */
  status = OCIAttrSet(srvhp, (ub4) OCI_HTYPE_SERVER,
                      (void *) &failover, (ub4) 0,
                      (ub4) OCI_ATTR_FOCBK, errhp);

11.4.2.5 Handling OCI_FO_ERROR

A failover attempt is not always successful. If the attempt fails, the callback function receives a value of OCI_FO_ABORT or OCI_FO_ERROR in the fo_event parameter.

A value of OCI_FO_ABORT indicates that failover was unsuccessful, and no further failover attempts are possible. OCI_FO_ERROR, however, provides the callback function with the opportunity to handle the error. For example, the callback may choose to wait a specified period of time and then indicate to the OCI library that it must reattempt failover.

Consider the timeline of events presented in Table 11-1.

Table 11-1 Time and Event

Time	Event
T0	Database fails (failure lasts until T5).
T1	Failover is triggered by user activity.
T2	User attempts to reconnect; attempt fails.
T3	Failover callback is invoked with OCI_FO_ERROR.
T4	Failover callback enters a predetermined sleep period.
T5	Database comes back up again.
T6	Failover callback triggers a new failover attempt; it is successful.
T7	User successfully reconnects.

The callback function triggers the new failover attempt by returning a value of OCI_FO_RETRY from the function.

Example 11-7 shows a callback function that implements a failover strategy similar to the scenario in Table 11-1. Each time it is invoked with OCI_FO_ERROR, the callback sleeps and then returns OCI_FO_RETRY, so failover is reattempted until it succeeds:

Example 11-7 Callback Function That Implements a Failover Strategy

#include <stdio.h>
#include <unistd.h>   /* sleep() */
#include <oci.h>

/*--------------------------------------------------------------------*/
/* the user-defined failover callback  */
/*--------------------------------------------------------------------*/
sb4 callback_fn(void *svchp, void *envhp, void *fo_ctx,
                ub4 fo_type, ub4 fo_event)
{
   switch (fo_event) 
   {
   case OCI_FO_BEGIN:
   {
     printf(" Failing Over ... Please stand by \n");
     printf(" Failover type was found to be %s \n",
            ((fo_type==OCI_FO_NONE) ? "NONE"
             :(fo_type==OCI_FO_SESSION) ? "SESSION" 
             :(fo_type==OCI_FO_SELECT) ? "SELECT"
             :(fo_type==OCI_FO_TXNAL) ? "TRANSACTION"
             : "UNKNOWN!")); 
     printf(" Failover Context is :%s\n", 
            (fo_ctx?(char *)fo_ctx:"NULL POINTER!"));
     break;
   }
   case OCI_FO_ABORT:
   {
     printf(" Failover aborted. Failover will not occur.\n");
     break;
   }
   case OCI_FO_END:
   { 
     printf("\n Failover ended ...resuming services\n");
     break;
   }
   case OCI_FO_REAUTH:
   { 
     printf(" Failed over user. Resuming services\n");
     break;
   }
   case OCI_FO_ERROR:
   {
     /* all invocations of this can only generate one line. The newline
      * will be put at fo_end time.
      */
     printf(" Failover error received. Sleeping...");
     sleep(3);
     printf("Retrying. ");
     /* ask OCI to reattempt failover */
     return (OCI_FO_RETRY);
   }
   default:
   {
     printf("Bad Failover Event: %u.\n", (unsigned) fo_event);
     break;
   }
   }
   return 0;
}