The following known issues and bugs affect the operation of the Sun Cluster 3.1 8/05 release.
Problem Summary: scvxinstall creates incorrect /etc/vfstab entries when boot device is multipathed.
Workaround: Perform the following procedure.
Run scvxinstall and choose to encapsulate.
The system displays the following message:
This node will be re-booted in 20 seconds. Type Ctrl-C to abort.
Abort the reboot.
Ctrl-C
Edit the /etc/vfstab entry so /global/.devices uses the /dev/{r}dsk/cXtXdX name instead of the /dev/did/{r}dsk name.
This revised entry enables VxVM to recognize it as the root disk.
Rerun scvxinstall and choose to encapsulate.
The /etc/vfstab file has the necessary updates. Allow the reboot to occur. The encapsulation proceeds as normal.
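The vfstab edit in this procedure can be illustrated with a small sketch. The DID and physical device names below (d1s3 and c1t0d0s3) are hypothetical examples only; substitute the names from your own vfstab entry:

```shell
# Hypothetical example: rewrite a sample /global/.devices vfstab entry from
# its DID name (d1s3) to a physical device name (c1t0d0s3). Substitute the
# real names for your boot disk. Operates on a temporary copy, not on
# /etc/vfstab itself.
vfstab=$(mktemp)
cat > "$vfstab" <<'EOF'
/dev/did/dsk/d1s3 /dev/did/rdsk/d1s3 /global/.devices/node@1 ufs 2 no global
EOF
sed -e 's|/dev/did/dsk/d1s3|/dev/dsk/c1t0d0s3|' \
    -e 's|/dev/did/rdsk/d1s3|/dev/rdsk/c1t0d0s3|' \
    "$vfstab" > "$vfstab.new"
edited=$(cat "$vfstab.new")
echo "$edited"
rm -f "$vfstab" "$vfstab.new"
```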
Problem Summary: The Sun Cluster HA for SAP liveCache data service uses the dbmcli command to start and stop liveCache. If you are running Solaris 9, the network service might become unavailable when a cluster node's public network fails.
Workaround: Include one of the following entries for the publickey database in the /etc/nsswitch.conf files on each node that can be the primary for liveCache resources:
publickey: files
publickey: files [NOTFOUND=return] nis
publickey: files [NOTFOUND=return] nisplus
Adding one of the above entries, in addition to updates documented in Sun Cluster Data Service for SAP liveCache Guide for Solaris OS, ensures that the su command and the dbmcli command do not refer to the NIS/NIS+ name services. Bypassing the NIS/NIS+ name services ensures that the data service starts and stops correctly during a network failure.
Problem Summary: The requirement for the nsswitch.conf file in Preparing the Nodes and Disks in Sun Cluster Data Service for SAP liveCache Guide for Solaris OS does not apply to the entry for the passwd database. Even if the documented requirements are met, the su command might hang on each node that can master the liveCache resource when the public network is down.
Workaround: On each node that can master the liveCache resource, ensure that the entry in the /etc/nsswitch.conf file for the passwd database is as follows:
passwd: files nis [TRYAGAIN=0]
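The nsswitch.conf requirement above lends itself to a scripted check. The following sketch operates on a sample copy of the file rather than /etc/nsswitch.conf itself, so it is safe to run anywhere:

```shell
# Rewrite the passwd entry of a sample nsswitch.conf to the required form.
# The file contents here are an example; the edit itself is plain pattern
# substitution.
conf=$(mktemp)
cat > "$conf" <<'EOF'
passwd: files nis
hosts:  files dns
EOF
# Force the passwd line to the form required by the workaround.
sed 's/^passwd:.*/passwd: files nis [TRYAGAIN=0]/' "$conf" > "$conf.new"
passwd_entry=$(grep '^passwd:' "$conf.new")
hosts_entry=$(grep '^hosts:' "$conf.new")
echo "$passwd_entry"
rm -f "$conf" "$conf.new"
```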
Problem Summary: sccheck might hang if launched simultaneously from multiple nodes.
Workaround: Do not launch sccheck from any multi-console that passes commands to multiple nodes. sccheck runs can overlap, but should not be launched simultaneously.
Problem Summary: Currently, the HADB data service does not use the JAVA_HOME environment variable. Therefore, when invoked from the HADB data service, HADB takes Java binaries from /usr/bin/. The Java binaries in /usr/bin/ must be linked to the appropriate version of Java (1.4 or above) for the HADB data service to work properly.
Workaround: If you do not object to changing the default version available, perform the following procedure. As an example, this workaround assumes that the /usr/j2se directory is where you have the latest version of Java (such as 1.4 and above).
If you have a directory called java/ in the /usr/ directory, move it to a temporary location.
From the /usr/ directory, link /usr/bin/java and all other Java-related binaries to the appropriate version of Java.
# ln -s j2se java
If you do not want to change the default version available, assign the JAVA_HOME environment variable with the appropriate version of Java (J2SE 1.4 and above) in the /opt/SUNWappserver7/SUNWhadb/4/bin/hadbm script.
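The "Java 1.4 and above" requirement can be checked before relinking. The sketch below parses a sample version string; on a real host the string would be taken from the output of java -version:

```shell
# Check that a Java version string satisfies "1.4 and above" before pointing
# /usr/bin/java at it. The version value is a sample, not read from a real
# installation.
version="1.4.2"
major=${version%%.*}            # text before the first dot
rest=${version#*.}
minor=${rest%%.*}               # text between the first and second dots
if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 4 ]; }; then
  ok=yes
else
  ok=no
fi
echo "suitable: $ok"
```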
Problem Summary: When a node is added to the cluster that runs Sun Cluster Support for Oracle Real Application Clusters and uses the VxVM cluster feature, the cluster feature running on other nodes does not recognize the new node.
Workaround: A fix for this problem is expected to be made available by VERITAS in VxVM 3.5 MP4 and VxVM 4.0 MP2. The fix for VxVM 4.1 is currently available.
To correct the problem if a code fix is not yet available, restart the Oracle database and reboot the cluster nodes. This step synchronizes the Oracle UDLM and updates the VxVM cluster feature to recognize the added node.
Do not install and configure Sun Cluster Support for Oracle Real Application Clusters on the new node until after you perform this step.
From a cluster node other than the node that you just added, shut down the Oracle Real Application Clusters database.
Reboot the same node on which you shut down the Oracle database.
# scswitch -S -h thisnode
# shutdown -g0 -y -i6
Wait until the node is fully rebooted back into the cluster before you proceed to the next step.
Restart the Oracle database.
Repeat Step 1 through Step 3 on each remaining node that runs Sun Cluster Support for Oracle Real Application Clusters.
If a single node is capable of handling the Oracle database workload, you can perform these steps on multiple nodes simultaneously.
If more than one node is required to support the database workload, perform these steps on one node at a time.
Problem Summary: Due to bug 4974875, whenever autorecovery is performed, the database reinitializes itself without any spares. This bug has been fixed and integrated into HA-DB release 4.3. For HA-DB 4.2 and earlier releases, the roles of the HA-DB nodes must be changed manually.
Workaround: Complete one of the following procedures to change the roles of the HA-DB nodes:
Identify the HA-DB nodes that have their roles changed after autorecovery is successful.
On all the nodes that you identified in Step 1, and one node at a time, disable the fault monitor for the HA-DB resource in question, then change the node's role back to the role that it had before autorecovery:
# cladm noderole -db dbname -node nodeno -setrole role-before-auto_recovery
Enable the fault monitor for the HA-DB resource in question.
or
Identify the HA-DB nodes that have their roles changed after autorecovery is successful.
On all nodes that host the database, disable the fault monitor for the HA-DB resource in question.
On any one of the nodes, execute the command for each HA-DB node that needs its role changed.
# cladm noderole -db dbname -node nodeno -setrole role-before-auto_recovery
Enable the fault monitor for the HA-DB resource in question.
Problem Summary: If rolling upgrade is not completed on all the nodes, the nodes that are not yet upgraded will not be able to see the IPMP groups on the upgraded nodes.
Workaround: Finish upgrading all nodes on the cluster.
Problem Summary: The date field on the Advanced Filter panel of SunPlex Manager accepts only mm/dd/yyyy format. However, in non-English locale environments, the date format is different from mm/dd/yyyy; and the return date format from the Calendar panel is other than mm/dd/yyyy format.
Workaround: Type the date range in the Advanced Filter panel in mm/dd/yyyy format. Do not use the Set... button to display the calendar and choose the date.
Problem Summary: In the Japanese locale, the error messages from scrgadm are not displayed correctly. The messages contain junk characters.
Workaround: Run the system locale in English to display the error messages in English.
Problem Summary: SunPlex Manager uses the /usr/cluster/lib/cmass/ipmpgroupmanager.sh script to delete IPMP groups and adapters from IPMP groups. The script updates the /etc/hostname6.adaptername file correctly to remove just the group name, but then runs the following ifconfig command, which unplumbs the IPv6 interface entirely:
ifconfig adaptername inet6 unplumb
Workaround: Reboot the node to plumb up the interface. Alternatively, run the following ifconfig command on the node. This alternative workaround does not require the node to be rebooted.
ifconfig adaptername inet6 plumb up
Problem Summary: The list of adapters displayed in the IPMP group pages is not dependent on the IP version chosen by the user. The page displays a list of all adapters that do not have groups configured. The list should be updated when the IP Version radio button is selected as follows:
If IPv4 only is selected, no IPv4-and-IPv6 adapter and no IPv6-only adapter should be listed.
If IPv6 only is selected, no IPv4-and-IPv6 adapter and no IPv4-only adapter should be listed.
If IPv4 and IPv6 is selected, no IPv6-only adapter and no IPv4-only adapter should be listed.
Workaround: After selecting the IP version, choose only adapters from the list that are enabled for the selected IP version.
Problem Summary: The adapter list that is displayed in the IPMP group pages is dependent on the IP version the user chooses. The current SunPlex Manager has a bug that always displays a complete list of adapters regardless of the IP version. SunPlex Manager should not let the user move an adapter which is enabled for both IPv4 and IPv6 to IPv4 only.
Workaround: The user should not attempt to move an adapter configured for both IPv4 and IPv6 to IPv4 only.
Problem Summary: An attempt to configure the data service for Sun Java System Administration Server fails if the Sun Java System Administration Server is not installed. The attempt fails because the SUNW.mps resource type requires that the /etc/mps/admin/v5.2/cluster/SUNW.mps directory exists. This directory exists only if the SUNWasvr package is installed.
Workaround: To correct this problem, complete the following procedure.
Log in as root or assume an equivalent role on a cluster node.
Determine whether the SUNWasvr package is installed.
# pkginfo SUNWasvr
If the SUNWasvr package is not installed, install the package from the Sun Cluster CD-ROM by completing the following step:
Problem Summary: As of Solaris 10, the Sun Cluster HA for NFS data service sets the property /startd/duration to transient for the Service Management Facility (SMF) services /network/nfs/server, /network/nfs/status, and /network/nfs/nlockmgr. The intention of this property setting is to cause SMF not to restart these services in the event of any failure. A bug in SMF causes SMF to restart /network/nfs/status and /network/nfs/nlockmgr after the first failure despite this property setting.
Workaround: For Sun Cluster HA for NFS to run correctly, run the following command on all nodes after creating the first Sun Cluster HA for NFS resource and before bringing the Sun Cluster HA for NFS resource online.
# pkill -9 -x 'startd|lockd'
If you are booting Sun Cluster for the first time, run the above command on all the potential primary nodes, after creating the first Sun Cluster HA for NFS resource and before bringing the Sun Cluster HA for NFS resource online.
Problem Summary: When a node is added to a cluster, the scinstall utility checks for the presence of Network Security Services (NSS) files on the node that you are adding. These files and security keys are required by the common agent container. If the NSS files exist, the utility copies the common agent container security files from the sponsoring node to the added node. But if the sponsoring node does not have the NSS security keys installed, the copy fails and scinstall processing quits.
Workaround: Perform the following procedure to install NSS software, recreate the security keys, and restart the common agent container on the existing cluster nodes.
Perform the following procedure on all existing cluster nodes as superuser or a role that permits the appropriate access.
Have available the Sun Cluster 1 of 2 CD-ROM. The NSS packages are located at /cdrom/cdrom0/Solaris_arch/Product/shared_components/Packages/, where arch is sparc or x86.
On each node, stop the Sun Web Console agent.
# /usr/sbin/smcwebserver stop
On each node, stop the security file agent.
# /opt/SUNWcacao/bin/cacaoadm stop
On each node, determine whether NSS packages are installed and, if so, what version.
# cat /var/sadm/pkg/SUNWtls/pkginfo | grep SUNW_PRODVERS
SUNW_PRODVERS=3.9.4
If a version earlier than 3.9.4 is installed, remove the existing NSS packages.
# pkgrm packages
The following table lists the applicable packages for each hardware platform.
Hardware Platform    NSS Package Names
SPARC                SUNWtls SUNWtlsu SUNWtlsx
x86                  SUNWtls SUNWtlsu
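The version check in the steps above (remove NSS packages older than 3.9.4) can be sketched as follows. The SUNW_PRODVERS value is a sample; on a real node it would come from /var/sadm/pkg/SUNWtls/pkginfo:

```shell
# Decide whether an installed NSS version predates 3.9.4 and therefore needs
# to be removed with pkgrm. The pkginfo line below is a sample value.
sample_line="SUNW_PRODVERS=3.9.1"
installed=${sample_line#SUNW_PRODVERS=}
required="3.9.4"
# ver_lt succeeds if $1 sorts numerically before $2, field by dotted field.
ver_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" |
         sort -t. -k1,1n -k2,2n -k3,3n | head -1)" = "$1" ]
}
if ver_lt "$installed" "$required"; then
    action="remove"
else
    action="keep"
fi
echo "$action"
```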
On each node, if you removed NSS packages or none were installed, install the latest NSS packages from the Sun Cluster 1 of 2 CD-ROM.
Change to a directory that does not reside on the CD-ROM and eject the CD-ROM.
# eject cdrom
On each node, create the NSS security keys.
# /opt/SUNWcacao/bin/cacaoadm create-keys
On each node, start the security file agent.
# /opt/SUNWcacao/bin/cacaoadm start
On each node, start the Sun Web Console agent.
# /usr/sbin/smcwebserver start
On the node that you are adding to the cluster, restart the scinstall utility and follow procedures to install the new node.
Problem Summary: Deleting a public interface group which has both IPv4 and IPv6 enabled adapters sometimes fails when trying to delete the IPv6 adapter from the group. The following error message is displayed:
ifparse: Operation netmask not supported for inet6
/sbin/ifparse
/usr/cluster/lib/cmass/ipmpgroupmanager.sh[8]: /etc/hostname.adaptname.tmpnumber: cannot open
Workaround: Edit the /etc/hostname6.adaptername file to include the following lines:
plumb up -standby
Run the following command on the cluster node:
ifconfig adaptername inet6 plumb up -standby
Problem Summary: Sun Cluster software hangs when attempting to perform a rolling upgrade from Sun Cluster 3.1 9/04 software to Sun Cluster 3.1 8/05 software due to a memory problem triggered when the first upgraded node is rebooted in cluster mode.
Workaround: If you are running Sun Cluster 3.1 9/04 software or the patch equivalent (revision 09 or higher) and want to perform a Rebooting Patch procedure to upgrade to Sun Cluster 3.1 8/05 software or the patch equivalent (revision 12), you must complete the following steps before you upgrade your cluster or apply this core patch.
Choose the type of patch installation procedure that is appropriate to your availability requirements:
Rebooting Patch (Node)
Rebooting Patch (Cluster and Firmware)
These patch installation procedures are provided in Chapter 8, Patching Sun Cluster Software and Firmware, in Sun Cluster System Administration Guide for Solaris OS.
Apply one of the following patches depending on the operating system you are using:
117909-11 Sun Cluster 3.1 Core Patch for SunOS 5.9 X86
117950-11 Sun Cluster 3.1 Core Patch for Solaris 8
117949-11 Sun Cluster 3.1 Core Patch for Solaris 9
You must complete the entire patch installation procedure before upgrading to Sun Cluster 3.1 8/05 software or the patch equivalent (revision 12).
Problem Summary: Sun Cluster software installation adds exclude: lofs to /etc/system. Because lofs is critical to the function of zones, both zone install and zone boot fail.
Workaround: Before attempting to create any zones, perform the following procedure.
If you are running Sun Cluster HA for NFS, exclude from the automounter map all files that are part of the highly available local file system that is exported by the NFS server.
On each cluster node, edit the /etc/system file to remove any exclude: lofs lines.
Reboot the cluster.
Problem Summary: The Solaris 10 OS requires different recovery procedures than previous versions of the Solaris OS when a cluster file system fails to mount at boot time. Rather than present a login prompt, the mountgfsys service might fail and put the node into the maintenance state. The output messages are similar to the following:
WARNING - Unable to globally mount all filesystems. Check logs for error messages and correct the problems.
May 18 14:06:58 pkaffa1 svc.startd[8]: system/cluster/mountgfsys:default misconfigured
May 18 14:06:59 pkaffa1 Cluster.CCR: /usr/cluster/bin/scgdevs: Filesystem /global/.devices/node@1 is not available in /etc/mnttab.
Workaround: After you repair the mount problem for the cluster file system, you must manually bring the mountgfsys service back online. Run the following commands to bring the mountgfsys service online and to synchronize the global devices namespace:
# svcadm clear svc:/system/cluster/mountgfsys:default
# svcadm clear svc:/system/cluster/gdevsync:default
Boot processing will now continue.
Problem Summary: Sun Cluster 3.1 8/05 software does not support upgrade to the March 2005 release of the Solaris 10 OS. An attempt to upgrade to that release might corrupt the /etc/path_to_inst file. This file corruption would prevent the node from booting successfully. The corrupted file would appear similar to the following, in that it contains duplicate entries for some of the same device names except that the physical device name contains the prefix /node@nodeid:
…
"/node@nodeid/physical_device_name" instance_number "driver_binding_name"
…
"/physical_device_name" instance_number "driver_binding_name"
…
In addition, some key Solaris services might fail to start, including networking and file-system mounting, and messages might print on the console which state that the service is misconfigured.
Workaround: Use the following procedure.
The following procedure describes how to recover from an upgrade to Solaris 10 software that results in a corrupted /etc/path_to_inst file.
This procedure does not attempt to correct any other problem that can be associated with upgrading a Sun Cluster configuration to the March 2005 release of the Solaris 10 OS.
Perform this procedure on each node that was upgraded to the March 2005 release of the Solaris 10 OS.
If a node cannot boot, boot the node from the network or from a CD-ROM. Once the node is up, run the fsck command and mount the local file system in a partition such as /a. In Step 2, use the name of the local-file-system mount in the path to the /etc directory.
Become superuser or an equivalent role on the node.
Change to the /etc directory.
# cd /etc
Determine whether the path_to_inst file is corrupted.
The following characteristics are present if the path_to_inst file is corrupted:
The file includes a block of entries that contain /node@nodeid at the beginning of physical device names.
Some of the same entries are listed again but without the /node@nodeid prefix.
If the file is not of this format, then some other problem exists. Do not continue this procedure. Contact your Sun service representative if you need assistance.
If the path_to_inst file is corrupted as described in Step 3, run the following commands.
# cp path_to_inst path_to_inst.bak
# sed -n -e "/^#/p" -e "s,node@./,,p" path_to_inst.bak > path_to_inst
Inspect the path_to_inst file to ensure that the file is repaired.
A repaired file will reflect the following changes:
The /node@nodeid prefix is removed from all physical device names.
There are no duplicate entries for any physical device name.
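The repair command from the step above can be demonstrated on a sample corrupted file. The device entries below are hypothetical examples, not taken from any real path_to_inst:

```shell
# Demonstrate the repair sed on a sample corrupted path_to_inst, rather than
# on the real /etc/path_to_inst. The device names are invented examples.
p2i=$(mktemp)
cat > "$p2i" <<'EOF'
#path_to_inst_bootstrap_1
"/node@1/pci@1f,4000/scsi@3" 0 "glm"
"/pci@1f,4000/scsi@3" 0 "glm"
EOF
# Keep comment lines; for the rest, strip the node@N/ prefix and print only
# lines where the substitution matched (-n plus the p flags), so the
# duplicate un-prefixed entries are dropped.
sed -n -e "/^#/p" -e "s,node@./,,p" "$p2i" > "$p2i.fixed"
fixed=$(cat "$p2i.fixed")
echo "$fixed"
rm -f "$p2i" "$p2i.fixed"
```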
Ensure that the permissions of the path_to_inst file are read only.
# ls -l /etc/path_to_inst
-r--r--r--  1 root  root  2946 Aug  8  2005 path_to_inst
Perform a reconfiguration reboot into non-cluster mode.
# reboot -- -rx
After you repair all affected cluster nodes, go to How to Upgrade Dependency Software Before a Nonrolling Upgrade in Sun Cluster Software Installation Guide for Solaris OS to continue the upgrade process.
Problem Summary: On x86 clusters with ce transports, a node under heavy load could be halted by CMM as a result of a split-brain.
Workaround: For x86 clusters using the PCI Gigaswift Ethernet card on the private network, add the following to /etc/system:
set ce:ce_tx_ring_size=8192
Problem Summary: On clusters with more than two nodes, running Solaris 10, and using Hitachi storage, all of the cluster nodes might panic when a node joins or leaves the cluster.
Workaround: No current workaround exists. If you encounter this problem, contact your Sun Service provider about acquiring a patch.
Problem Summary: Application Server Enterprise Edition 8.1 cannot be installed by the Java ES 2005Q1 installer if the Configure Later option is selected. Selecting the Configure Later option installs the Platform Edition and not the Enterprise Edition.
Workaround: While installing the Application Server Enterprise Edition 8.1 using the Java ES installer, use the Configure Now option to install. Selecting the Configure Later option installs the Platform Edition only.
Problem Summary: Restart of the bind SMF service can impact Solaris Volume Manager operation. Installation of Veritas 4.1 VxVM packages causes the SMF bind service to be restarted.
Workaround: After restarting the bind SMF service or after installing VxVM 4.1 on a Solaris 10 host, restart the following SMF service:
svcadm restart svc:/network/rpc/scadmd:default
Problem Summary: This problem occurs only on systems using Solaris 10. If the user uses the Java ES installer on the Sun Cluster Agents CD-ROM to install Sun Cluster data services after the Sun Cluster core has been installed, the installer fails with the following messages:
The installer has determined that you must manually remove incompatible versions of the following components before proceeding:

[Sun Cluster 3.1 8/05, Sun Cluster 3.1 8/05, Sun Cluster 3.1 8/05]

After you remove these components, go back.

Component                      Required By ...

1. Sun Cluster 3.1 8/05        HA Sun Java System Message Queue : HA Sun Java System Message Queue
2. Sun Cluster 3.1 8/05        HA Sun Java System Application Server : HA Sun Java System Application Server
3. Sun Cluster 3.1 8/05        HA/Scalable Sun Java System Web Server : HA/Scalable Sun Java System Web Server
4. Select this option to go back to the component list.

This process might take a few moments while the installer rechecks your system for installed components.

Select a component to see the details.

Press 4 to go back the product list [4] {"<" goes back, "!" exits}
Workaround: On a system using Solaris 10, install the Sun Cluster data service manually by using pkgadd or scinstall. If the Sun Cluster data service has a dependency on shared components, install the shared components manually by using pkgadd. The following link lists the shared components for each product:
http://docs.sun.com/source/819-0062/preparing.html#wp28178
Problem Summary: During startup of Sun Web Console, the following message might be displayed.
/usr/sbin/smcwebserver:../../../../j2se/opt/javahelp/lib: does not exist
Workaround: The message is safe to ignore. You can manually add a link in /usr/j2se/opt to point to the correct Java Help 2.0 by entering the following:
# ln -s /usr/jdk/packages/javax.help-2.0 /usr/j2se/opt/javahelp |
Problem Summary: After upgrading from the Solaris 9 OS to the Solaris 10 OS on a cluster that runs Sun Cluster 3.1 4/04 software or earlier, booting the node into noncluster mode results in the node panicking.
Workaround: Install one of the following patches before you upgrade from Solaris 9 to Solaris 10 software.
SPARC based systems: 117949-09 or higher
x86 based systems: 117909-09 or higher
Problem Summary: When using SunPlex Installer to configure Sun Cluster HA for Apache and Sun Cluster HA for NFS data services as part of Sun Cluster installation, SunPlex Installer does not create the necessary device groups and resources in the resource groups.
Workaround: Do not use SunPlex Installer to install and configure data services. Instead, follow procedures in the Sun Cluster Software Installation Guide for Solaris OS and the Sun Cluster Data Service for Apache Guide for Solaris OS or Sun Cluster Data Service for NFS Guide for Solaris OS manuals to install and configure these data services.
Problem Summary: NFSv4 is not supported in Sun Cluster 3.1 8/05.
Workaround: Solaris 10 introduces a new version of the NFS protocol, NFSv4, which is the default protocol for Solaris 10 clients and servers. The Sun Cluster 3.1 8/05 release supports Solaris 10; however, it does not support use of the NFSv4 protocol with the Sun Cluster HA for NFS service on the cluster to achieve high availability for the NFS server. To make sure that no NFS client can use the NFSv4 protocol to talk to the NFS server on Sun Cluster software, edit the /etc/default/nfs file to change the line NFS_SERVER_VERSMAX=4 to NFS_SERVER_VERSMAX=3. This ensures that only the NFSv3 protocol is used by clients of the Sun Cluster HA for NFS service on the cluster.
NOTE: Use of Solaris 10 cluster nodes as NFSv4 clients is not affected by this restriction or by the workaround described above. The cluster nodes can act as NFSv4 clients.
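The /etc/default/nfs edit described above can be sketched on a sample copy of the file. The NFS_CLIENT_VERSMAX line is included only to show that the edit leaves other settings untouched:

```shell
# Cap the NFS server protocol version at 3 in a sample copy of
# /etc/default/nfs. The file contents are an example; only the
# NFS_SERVER_VERSMAX line is changed.
nfsdef=$(mktemp)
cat > "$nfsdef" <<'EOF'
NFS_SERVER_VERSMAX=4
NFS_CLIENT_VERSMAX=4
EOF
sed 's/^NFS_SERVER_VERSMAX=4/NFS_SERVER_VERSMAX=3/' "$nfsdef" > "$nfsdef.new"
server_max=$(grep '^NFS_SERVER_VERSMAX=' "$nfsdef.new")
client_max=$(grep '^NFS_CLIENT_VERSMAX=' "$nfsdef.new")
echo "$server_max"
rm -f "$nfsdef" "$nfsdef.new"
```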
Problem Summary: The metaset command fails after the rpcbind service is restarted.
Workaround: Ensure that you are not performing any configuration operations on your Sun Cluster system, then kill the rpc.metad process using the following command:
# pkill -9 rpc.metad
Problem Summary: When shutting down the cluster, some of the nodes may panic due to the order in which services are stopped. If the RPC service is stopped before the RAC framework is stopped, errors may result when the SVM resource attempts to reconfigure. The error is reported back to the RAC framework, which panics the node. This problem has been observed with Sun Cluster running the RAC framework with the SVM storage option. There should be no impact to Sun Cluster functionality.
Workaround: The panic is by design and can safely be ignored, although clean-up of the saved core files should be performed to reclaim filesystem space.
Problem Summary: In the Solaris 10 OS, the /etc/nsswitch.conf file has been modified to include NIS in the ipnodes entry.
ipnodes: files nis [NOTFOUND=return]
This causes the address resolution to hang if NIS becomes inaccessible, either due to a NIS problem or due to failure of all public network adapters. This problem eventually causes failover resources or shared address resources to fail to fail over.
Workaround: Complete the following before you create logical host or shared address resources:
Change the ipnodes entry in the /etc/nsswitch.conf file from [NOTFOUND=return] to [TRYAGAIN=0].
ipnodes: files nis [TRYAGAIN=0]
Ensure that all IP addresses for logical hosts and shared addresses are added to the /etc/inet/ipnodes file, in addition to the /etc/inet/hosts file.
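The two steps above can be sketched on sample copies of the files. The address and host name (192.168.10.5, lhost-1) are hypothetical examples:

```shell
# Fix the ipnodes lookup policy in a sample nsswitch.conf, then mirror a
# hypothetical logical-host address into sample hosts and ipnodes files.
# All file contents here are invented examples.
nss=$(mktemp); hosts=$(mktemp); ipn=$(mktemp)
echo 'ipnodes: files nis [NOTFOUND=return]' > "$nss"
# Step 1: change the lookup status action from NOTFOUND=return to TRYAGAIN=0.
sed 's/\[NOTFOUND=return\]/[TRYAGAIN=0]/' "$nss" > "$nss.new"
ipnodes_entry=$(cat "$nss.new")
# Step 2: add the logical-host address to both hosts and ipnodes files.
echo '192.168.10.5 lhost-1' >> "$hosts"
echo '192.168.10.5 lhost-1' >> "$ipn"
hosts_entry=$(cat "$hosts")
echo "$ipnodes_entry"
rm -f "$nss" "$nss.new" "$hosts" "$ipn"
```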
Problem Summary: While attempting to update the Sun Cluster Data Service for Sun Java System Application Server EE from 3.1 9/04 to 3.1 8/05, scinstall does not remove the package for j2ee and displays the following message:
Skipping "SUNWscswa" - already installed
Sun Cluster Data Service for Sun Java System Application Server EE is not upgraded.
Workaround: Manually remove and re-add the SUNWscswa package using the following commands:
# pkgrm SUNWscswa
# pkgadd [-d device] SUNWscswa
Problem Summary: The NFS file system cannot be checked for viability before a failover, or before scswitch is used to relocate the data service to the node. If a node does not have the NFS file system, a switchover or failover to that node results in a failure of the data service that requires manual intervention. A mechanism like HAStoragePlus is needed to check the viability of the file system before attempting the failover or switchover to that node.
Workaround: File systems using NAS filers (with entries in /etc/vfstab) are mounted outside Sun Cluster software control, and this means that Sun Cluster software is unaware of any problems. Should the file system become unavailable, some data services, such as Sun Cluster HA for Oracle, will fail when data service methods, such as START or STOP, are executed.
Failure of these methods may lead to several possibilities:
The data services resource, such as HA-Oracle, may go into the STOP_FAILED state, if the application (Oracle) binaries are not available.
The data service may continuously attempt to fail over to alternate nodes until it is able to start successfully or startup attempts have failed on all possible nodes.
Perform one of the following procedures to avoid the above problems:
Place the application binaries on either a failover or cluster file system. Then configure a HAStoragePlus resource to represent this file system and record a dependency of the application upon this resource. The system will not attempt to start the application when the file system is not available.
Place the application binaries on the local root file system. If the local root file system does not work, the node will not be able to join the cluster, and the system will not attempt to start the application on that node.
Problem Summary: The Sun Cluster data service does not restart the ma process when the data service is killed or exits abruptly.
Workaround: This is the expected behavior and the data service is not affected.
Problem Summary: Attempting to delete a resource during a rolling upgrade, before all nodes are running the new software, might cause one of the nodes to panic.
Workaround: During a rolling upgrade, do not delete an RGM resource until all nodes have the new software installed.
Problem Summary: The HADB database fails to restart after the cluster nodes are rebooted. The user will not be able to access the database.
Workaround: Restart one of your management data services by completing the following procedure. If the following procedure does not resolve the problem, delete the database and recreate it.
On the node to be shut down, type the following command. The -h option should not include the node name on which you want the management agent to be stopped.
scswitch -z -g hadb-resource-grp -h node1,node2...
Switch the resource group back to the original node.
scswitch -Z -g hadb-resource-grp
Check the status of the database. Wait until the database comes to the “stopped” state.
hadbm status -n database
Start the database.
hadbm start database
Problem Summary: The SUNWiimsc package in sun_cluster_agents is invalid. After adding this package, SUNW.iim in /opt/SUNWiim/cluster has size 0.
Workaround: Replace the SUNW.iim package and register again by completing the following steps.
Copy the correct SUNW.iim from the CD-ROM.
# cp 2of2_CD/Solaris_arch/Product/sun_cluster_agents/Solaris_os/Packages/SUNWiimsc/reloc/SUNWiim/cluster/SUNW.iim /opt/SUNWiim/cluster/SUNW.iim
Remove any existing SUNW.iim registration.
# rm /usr/cluster/lib/rgm/rtreg/SUNW.iim
Register the data service with Sun Cluster.
sh 2of2_CD/Solaris_arch/Product/sun_cluster_agents/Solaris_os/Packages/SUNWiimsc/install/postinstall
Problem Summary: Trying to add a new IPMP group using SunPlex Manager sometimes fails with the following message.
An error was encountered by the system. If you were performing an action when this occurred, review the current system state prior to proceeding.
Workaround: Perform one of the following procedures depending on the version of IP you are running.
For IPv4, enter the following command:
ifconfig interface inet plumb group groupname [addif address deprecated] netmask + broadcast + up -failover
If a test address has been provided, update the /etc/hostname.interface file to add the following:
group groupname addif address netmask + broadcast + deprecated -failover up
If a test address has not been provided, update the /etc/hostname.interface file to add the following:
group groupname netmask + broadcast -failover up
For IPv6, enter the following command:
ifconfig interface inet6 plumb up group groupname
Update the /etc/hostname6.interface file to add the following entries:
group groupname plumb up
If the /etc/hostname6.interface file does not already exist, create the file and add the entries mentioned above.
Problem Summary: After bringing the resource online and panicking one of the nodes in the cluster (for example, shutdown or uadmin), the resource keeps restarting on the other nodes. The user will not be able to issue any management commands.
Workaround: To prevent this problem, log onto a single node as root or a role with equivalent access privileges and increase the probe_timeout of the resource to a value of 600 seconds, using the following command:
scrgadm -c -j hadb resource -x Probe_timeout=600 |
To verify your change, shut down one of the cluster nodes and check to make sure that the resource does not go into the degraded state.
Problem Summary: The load balancing feature of Sun Cluster scalable services does not work on Solaris 10 systems when both the public networks and Sun Cluster transports use bge-driven adapters. Platforms with built-in NICs that use bge include Sun Fire V210, V240, and V250.
Failover data services are not affected by this bug.
Workaround: Do not configure public networking and cluster transports to both use bge-driven adapters.
Problem Summary: When the SunPlex Manager default locale is set to a multibyte locale, you cannot see the system log.
Workaround: Set the default locale to C, or view the syslog (/var/adm/messages) manually through a command-line shell.
Problem Summary: The instances and node agents must be configured to listen on the failover IP address/hostname. When the node agents and Sun Java System Application Server instances are created, the physical node hostname is set by default. The HTTP IP address and the client-hostname are changed in domain.xml, but the Domain Admin Server is not restarted, so the changes do not take effect. Therefore, the node agents come up only on the physical node where they were configured, and not on the other node.
Workaround: Change the client-hostname property in the Node Agent section of domain.xml to listen on the failover IP and restart the Domain Admin Server for the changes to take effect.
Problem Summary: When using SunPlex Manager in Sun Cluster 3.1 8/05 with Cacao 1.1, only JDK 1.5.0_03 is supported.
Workaround: Manually install JDK 1.5 by completing the following procedure.
Add JDK 1.5 from the JES 4 shared components directory (see the JES 4 Release Notes for instructions).
Stop cacao.
# /opt/SUNWcacao/bin/cacaoadm stop |
Start cacao.
# /opt/SUNWcacao/bin/cacaoadm start |
Problem Summary: This bug is seen on a Sun Cluster system running 3.1 (9/04) plus patches that is upgraded to Sun Cluster 3.1 (8/05) by applying patch 117949-14 on a system running Solaris 9 or patch 117950-14 on a system running Solaris 8. The following error message displays once the machine boots:
# An unexpected error has been detected by HotSpot Virtual Machine:
#
# SIGSEGV (0xb) at pc=0xfaa90a88, pid=3102, tid=1
#
# Java VM: Java HotSpot(TM) Client VM (1.5.0_01-b07 mixed mode, sharing)
# Problematic frame:
# C [libcmas_common.so+0xa88] newStringArray+0x70
#
# An error report file with more information is saved as /tmp/hs_err_pid3102.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
# |
Workaround: When upgrading from Sun Cluster 3.1 (9/04) to Sun Cluster 3.1 (8/05), install the SPM patch in addition to the core patch, as follows.
On a system running Solaris 8, run the following command after applying core patch 117950-14:
patchadd patchdir/118626-04 |
On a system running Solaris 9, run the following command after patch 117949-14 has been applied:
patchadd patchdir/118627-04 |
Problem Summary: The resource registration sometimes fails for Directory Server and Administration Server. The system will display the following message:
Registration file not found for "SUNW.mps" in /usr/cluster/lib/rgm/rtreg |
Workaround: Register the missing file from the pkg location directly by entering one of the following commands:
For Directory Server, enter the following command from the pkg location:
# scrgadm -a -t SUNW.dsldap -f /etc/ds/v5.2/cluster/SUNW.dsldap |
For Administration Server, enter the following command from the pkg location:
# scrgadm -a -t SUNW.mps -f /etc/mps/admin/v5.2/cluster/SUNW.mps |
Problem Summary: If a Sun Cluster node running Solaris 10 does not have IPv6 interfaces configured for public networking (IPv6 interfaces that are used only for the cluster interconnect do not count), it cannot access machines that have both an IPv4 and an IPv6 address mapping in a name service, such as NIS. Applications such as telnet and traceroute that choose the IPv6 address over IPv4 will see their packets sent to the cluster transport adapters and dropped.
Workaround: Use one of the following workarounds, depending on the configuration of your cluster.
If IPv6 is not required to run on the cluster, then remove the nis entry from the ipnodes line in /etc/nsswitch.conf. For example, change the ipnodes line to the following:
ipnodes: files # Workaround for CR 6306113 |
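The ipnodes change can be scripted; a minimal sketch with sed, assuming the line currently reads "ipnodes: files nis" and shown against a scratch copy rather than the live /etc/nsswitch.conf:

```shell
# Demonstrate on a scratch copy; on a cluster node the target is /etc/nsswitch.conf.
cat > /tmp/nsswitch.conf <<'EOF'
hosts:      cluster files nis
ipnodes:    files nis
EOF

# Rewrite the ipnodes line so IPv6 address lookups never consult NIS.
sed 's/^ipnodes:.*/ipnodes:    files   # Workaround for CR 6306113/' \
    /tmp/nsswitch.conf > /tmp/nsswitch.conf.new

cat /tmp/nsswitch.conf.new
```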
If IPv6 is required but no scalable service is running on the cluster, add the following line to /etc/system and reboot all nodes:
set clcomm:ifk_disable_v6=1 |
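A minimal sketch of that edit, demonstrated against a scratch file; on a real node the target is /etc/system, the edit must be made on every node, and each node must be rebooted afterward:

```shell
# Stand-in for /etc/system; edit the real file on every cluster node.
f=/tmp/system
touch "$f"
# Append the tunable only if it is not already present (idempotent).
grep -q 'clcomm:ifk_disable_v6' "$f" || \
    echo 'set clcomm:ifk_disable_v6=1' >> "$f"
cat "$f"
```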
If an IPv6 scalable service is running, make sure that all cluster nodes have an IPv6 network interface configured for public networking (non-cluster use). See ifconfig(1M) and System Administration Guide: IP Services for information about how to deploy IPv6 with Solaris.