Sun Cluster 2.2 Software Installation Guide

14.1 General Information for Parallel Database Systems

14.1.1 Shared Disk Architecture

The shared disk configuration of Sun Cluster is used by OPS. In this configuration, a single database is shared among multiple OPS instances, which access the database concurrently. Conflicting access to the same data is controlled by a distributed lock manager, the Oracle UNIX Distributed Lock Manager (DLM). If a process or a node crashes, the DLM is reconfigured to recover from the failure.

14.1.2 Shared Nothing Architecture

The shared nothing disk configuration of Sun Cluster is used by Informix-OnLine XPS. The database server instances on each node have sole access to their own database partitions.

A database query from a client is analyzed by the servers to determine which table partitions it touches, and is forwarded across the private network to the appropriate servers. The partial results are then merged across the private network and returned to the client.

14.1.3 SMA Shared Memory Issues

Some applications (OPS, for example) sometimes require modification of the /etc/system file so that the minimum amount of shared memory that can be requested is unusually high. For example, if the field shmsys:shminfo_shmmin in the /etc/system file is set to a value greater than 200 bytes, the sm_config(1M) command cannot acquire shared memory, because it requests fewer bytes than the minimum the system will allocate. As a result, the shmget(2) system call made by sm_config(1M) fails, aborting sm_config(1M).

To work around this problem, edit the /etc/system file and set the value of shmsys:shminfo_shmmin to 200. The value of shmsys:shminfo_shmmax should be greater than 200. Then reboot the machine for the new values to take effect.
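
For example, after the workaround the relevant /etc/system entries would look like the following. The shmmax value shown here is only illustrative; size it according to the requirements of your database.

 set shmsys:shminfo_shmmin=200
 set shmsys:shminfo_shmmax=4194304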

If you encounter semsys warnings and core dumps, it could mean that the semaphore values contained in the semsys:seminfo_* fields in the /etc/system file do not match the actual physical limits of the machine.
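
The fields to check include, for example, the following (the values shown here are placeholders, not recommendations; use the settings given in your database installation documentation):

 set semsys:seminfo_semmni=100
 set semsys:seminfo_semmns=1024
 set semsys:seminfo_semmsl=100
 set semsys:seminfo_semopm=100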

14.1.4 OPS and IP Failover

In the event of a node failure in an OPS environment, Oracle SQL*Net clients may be configured to reconnect to the surviving server without the use of IP failover.

In an OPS environment, multiple Oracle instances cooperate to provide access to the same shared database. The Oracle clients may access the database using any of the instances. Thus, if one or more instances have failed, clients can continue to access the database by connecting to a surviving instance.

There are several ways to reconnect to a surviving instance transparently to the end user:

14.1.4.1 High Availability Features of Oracle SQL*Net

From the Oracle client perspective, the model is simple: when the server crashes, the client sees a broken connection, reconnects to the server, and resubmits the transaction. Oracle SQL*Net provides features and capabilities to incorporate multiple instances running on different hosts under the same service. Hence, when the client reconnects, it is automatically connected to a surviving instance. The reconnection itself is not automatic; the client typically incorporates code that reconnects broken connections (to the same service as before).
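
As an illustration only, such client code often reduces to a retry loop like the sketch below. The connect_to_service() and run_transaction() routines are hypothetical stand-ins for the application's real database calls (OCI, Pro*C, and so on), stubbed out here so that the sketch compiles; SQL*Net, not the application, decides which instance a reconnection reaches.

 #include <unistd.h>

 /*
  * Hypothetical stand-ins for the application's real database calls;
  * each returns 0 on success. Stubbed out so this sketch compiles.
  */
 static int connect_to_service(const char *service) { return (0); }
 static int run_transaction(void) { return (0); }

 int
 main(void)
 {
         int tries;

         for (tries = 0; tries < 10; tries++) {
                 if (connect_to_service("ora") != 0) {
                         /*
                          * No listener answered, or the surviving instance
                          * is still recovering the failed instance's state;
                          * back off and retry.
                          */
                         sleep(30);
                         continue;
                 }
                 if (run_transaction() == 0)
                         return (0);     /* transaction committed */
                 /*
                  * Broken connection: loop around and reconnect to the
                  * same service name as before; SQL*Net routes the new
                  * connection to a surviving instance.
                  */
         }
         return (1);     /* gave up after repeated failures */
 }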


Note -

With a node or instance failure, the surviving instance(s) must first recover the failed instance's state. During this recovery time, clients see a lack of response from the instance. This recovery has nothing to do with the Sun Cluster framework; it depends entirely on Oracle, the transaction volume, and the OPS recovery mechanism.


14.1.4.2 Configuring Oracle SQL*Net

Two ways to configure Oracle SQL*Net on the client (in the TNSNAMES.ORA file) are shown below. The client reconnection time to the surviving instance is not influenced by the method used to configure Oracle SQL*Net. In the first form, both instance addresses appear under a single DESCRIPTION:

ora =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS =
        (PROTOCOL = TCP)
        (HOST = erlang)
        (PORT = 1526))      <- instance 1
      (ADDRESS =
        (PROTOCOL = TCP)
        (HOST = weibull)
        (PORT = 1526))      <- instance 2
    )
    (CONNECT_DATA = (SID = ora))
  )

In the second form, each instance has its own DESCRIPTION within a DESCRIPTION_LIST, which allows a different SID on each node under the same GLOBAL_NAME:

ora =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS =
          (PROTOCOL = TCP)
          (HOST = erlang)
          (PORT = 1526)))
      (CONNECT_DATA = (SID = ora)(GLOBAL_NAME = ora)))
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS =
          (PROTOCOL = TCP)
          (HOST = weibull)
          (PORT = 1526)))
      (CONNECT_DATA = (SID = ora1)(GLOBAL_NAME = ora)))
  )

Both configurations assume that a SQL*Net listener is running for each of the instances.
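
Assuming the client's SQL*Net configuration points at one of the TNSNAMES.ORA files shown above, you can check that the service name resolves and that a listener answers by using the tnsping utility, for example:

 % tnsping ora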