When using I/O-intensive data services with a large number of disks configured in the cluster, the application may experience delays due to retries within the I/O subsystem during disk failures. An I/O subsystem may take several minutes to retry and recover from a disk failure. This delay can result in Sun Cluster failing over the application to another node, even though the disk may have eventually recovered on its own.
To avoid failover during these instances, consider increasing the default probe timeout of the data service. If you need more information or help with increasing data service timeouts, contact your local support engineer.
If you are running Solaris 9, include the following entries in the /etc/nsswitch.conf configuration file on each node that can be the primary for the oracle_server or oracle_listener resource, so that the data service starts and stops correctly during a network failure:
passwd: files
group: files
publickey: files
project: files
The Sun Cluster HA for Oracle data service uses the superuser command, su(1M), to start and stop the database. The network service might become unavailable when a cluster node's public network fails. Adding the above entries ensures that the su command does not refer to the NIS/NIS+ name services.
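The required entries can be verified with a small shell check. This is a sketch run against a sample file; the path /tmp/nsswitch.sample and the check itself are illustrative, and on a cluster node you would point the check at /etc/nsswitch.conf instead:

```shell
# Create a sample file standing in for /etc/nsswitch.conf (illustrative only).
cat > /tmp/nsswitch.sample <<'EOF'
passwd: files
group: files
publickey: files
project: files
EOF

# Confirm each database resolves through files only.
for db in passwd group publickey project; do
  if grep -q "^${db}:[[:space:]]*files$" /tmp/nsswitch.sample; then
    echo "${db}: ok"
  else
    echo "${db}: needs attention"
  fi
done
```

If any database reports "needs attention", edit /etc/nsswitch.conf on that node before a network failure can affect name resolution.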
The Sun Cluster HA for Siebel agent does not monitor individual Siebel components. If a Siebel component fails, only a warning message is logged in syslog.
To work around this problem, restart the Siebel server resource group in which components are offline by using the command scswitch -R -h node -g resource_group.
The message “SAP xserver is not available” might be printed during startup of the SAP xserver because the xserver is not considered available until it is fully up and running.
Ignore this message during the startup of the SAP xserver.
If the node running the Siebel gateway has a path that begins with /home and depends on network resources such as NFS or NIS, a public network failure causes the Siebel gateway probe to hang while trying to open a file on /home. The probe then times out, and the Siebel gateway resource goes offline.
To prevent the Siebel gateway probe from timing out while trying to open a file on /home, ensure the following on all cluster nodes that can host the Siebel gateway:
Ensure that the following entries in the /etc/nsswitch.conf file are set to files:

passwd: files
group: files
publickey: files
project: files
Eliminate all NFS or NIS dependencies for any path that starts with /home. Either mount the /home path locally, or rename the /home mount point to /export/home or to another name that does not start with /home.
Comment out the line containing +auto_master in the /etc/auto_master file, and change any /home entries to auto_home.
Comment out the line containing +auto_home in the /etc/auto_home file.
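The last two steps can be sketched as shell commands. The sample files below stand in for the real configuration files; on a cluster node you would edit /etc/auto_master and /etc/auto_home directly rather than copies under /tmp:

```shell
# Sample copies standing in for the real automounter maps (illustrative only).
cat > /tmp/auto_master <<'EOF'
+auto_master
/home           auto_home       -nobrowse
EOF
cat > /tmp/auto_home <<'EOF'
+auto_home
EOF

# Comment out the +auto_master line.
sed -i 's/^+auto_master/#+auto_master/' /tmp/auto_master
# Comment out the +auto_home line.
sed -i 's/^+auto_home/#+auto_home/' /tmp/auto_home
```

After editing the real files, verify that no uncommented line still pulls automounter maps from NIS/NIS+.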
If a hostname in a URI in monitor_uri_list is an unknown host, the agent logs a message stating that the connection attempt timed out. Normally, a connection timeout triggers a restart or failover of the application server. However, when the hostname is unknown, the timeout does not initiate a restart or failover.
If the agent logs a connection-timeout message but takes no action, verify that the hostnames in monitor_uri_list are correct.
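One way to verify the hostnames ahead of time is to extract the host part of each URI and check that it resolves. This is a sketch; the URIs below are placeholders, and you would substitute the actual values of the monitor_uri_list property:

```shell
# Hypothetical pre-check: pull the hostname out of each URI and confirm it
# resolves through the configured name services. Example URIs only.
for uri in "http://localhost:8080/servlet/monitor" "http://no-such-host.invalid/"; do
  host=$(printf '%s\n' "$uri" | sed -E 's|^[A-Za-z]+://([^:/]+).*|\1|')
  if getent hosts "$host" > /dev/null 2>&1; then
    echo "$host resolves"
  else
    echo "$host is unknown"
  fi
done
```

Any hostname reported as unknown should be corrected in monitor_uri_list before relying on the probe for restart or failover decisions.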
If you are running Solaris 9, include one of the following entries for the publickey database in the /etc/nsswitch.conf configuration files on each node that can be the primary for liveCache resources so that the data service starts and stops correctly during a network failure:
publickey: files
publickey: files [NOTFOUND=return] nis
publickey: files [NOTFOUND=return] nisplus
The Sun Cluster HA for SAP liveCache data service uses the dbmcli command to start and stop the liveCache. The network service might become unavailable when a cluster node's public network fails. Adding one of the above entries, in addition to the updates documented in Sun Cluster 3.1 Data Service for SAP liveCache, ensures that the su command and the dbmcli command do not refer to the NIS/NIS+ name services.
Do not configure the xserver resource as a failover resource. The Sun Cluster HA for SAP liveCache data service does not fail over properly when xserver is configured as a failover resource.
The localized message catalogs for the following agents are not included in Data Services 3.1 5/03:
Sun ONE Application Server
Sun ONE Message Queue
BEA WebLogic