In the event of a node failure in an OPS environment, Oracle SQL*Net clients may be configured to reconnect to the surviving server without the use of IP failover.
In an OPS environment, multiple Oracle instances co-operate to provide access to the same shared database. The Oracle clients may access the database using any of the instances. Thus if one or more instance have failed, clients may continue to access the database by connecting to a surviving instance.
There are many ways to accomplish the task of reconnecting to a surviving instance transparently to the end user:
Design the application such that if the Oracle client loses the connection to the Oracle instance, it reestablishes the connection to an alternate instance. This implies the client is aware that it is operating in a multi-instance (OPS) environment.
However, such a solution is seldom used. Instead, most implementations use middleware such as the Tuxedo transaction monitor (TM), to implement the reconnection logic. The Oracle client connects to the TM, which in turn connects to one of the many database instances. The TM hides the failure of a particular database instance from the clients by reconnecting to alternate instances. The advantage of the TM approach is existing Oracle client applications need not be rewritten to take advantage of the multiple instances in an OPS environment. The disadvantage is the cost of integrating with a TM.
Design the application (Oracle client) such that when it loses the connection to the database instance, it retries the connection to the same server. Thus Oracle client applications designed for a non-parallel environment can be moved into an OPS environment without redesign. The "infrastructure" is then designed to ensure the connection is routed to the surviving server.
One solution used to accomplish this is to use the IP failover features of Sun Cluster in conjunction with the OPS data services. The rest of this document describes a simpler alternative through the use of high availability features of Oracle SQL*Net. IP failover is not required to implement this functionality.
From the Oracle client perspective the model is simple, when the server crashes the client sees a broken connection. The client reconnects to the server, and resubmits the transaction. Oracle SQL*Net provides features and capabilities to incorporate multiple instances running on different hosts under the same service. Hence, when the client reconnects it is automatically connected through to the surviving instance. The reconnection is not automatic. The client typically incorporates the code to reconnect broken connections (to the same service as before).
With a node or instance failure, the surviving instance(s) must first recover the failed instances state. During this recovery time clients will see a lack of response from the instance. This recovery has nothing do with the Sun Cluster framework. Recovery is totally dependent on Oracle, the transaction volume, and recovery mechanism for OPS.
Two ways to configure Oracle SQL*Net on the client (the TNSNAMES.ORA file) are shown below. The client reconnection time to the surviving instance is not influenced by the method used to configure Oracle SQL*Net.
Configure the same "connect string" for multiple instances to run on different hosts with the same ORACLE SID.
ora = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP) (HOST = erlan) (PORT = 1526) <- instance 1 ) (CONNECT_DATA= (SID=ora)) ) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP) (HOST = weibull) (PORT = 1526) <- instance 2 ) (CONNECT_DATA= (SID=ora)) ) ) |
Configure the same "connect string" for instances to run on different hosts with different ORACLE SIDs.
ora = (DESCRIPTION_LIST = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP) (HOST = erlang) (PORT = 1526)) (CONNECT_DATA = (SID = ora)(GLOBAL_NAME = ora)) ) (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP) (HOST = weibull) (PORT = 1526)) (CONNECT_DATA = (SID = ora1)(GLOBAL_NAME = ora)) ) ) |
This configuration has listeners running for each of the instances.