The following known issues and bugs affect the operation of the Sun Cluster 3.1 9/04 release.
Problem Summary: scvxinstall creates incorrect vfstab entries when the boot device is multipathed.
Workaround: Run scvxinstall and choose to encapsulate. When the following message appears, type Ctrl-C to abort the reboot:
This node will be re-booted in 20 seconds. Type Ctrl-C to abort.
Edit the vfstab entry so /global/.devices uses the /dev/{r}dsk/cXtXdX name instead of the /dev/did/{r}dsk name. This revised entry enables VxVM to recognize it as the rootdisk. Rerun scvxinstall and choose to encapsulate. The vfstab file has the necessary updates. Allow the reboot to occur. The encapsulation proceeds as normal.
Problem Summary: The Sun Cluster HA for Oracle data service uses the su command to start and stop the database. If you are running Solaris 8 or Solaris 9, the network service might become unavailable when a cluster node's public network fails.
Workaround: Include the following entries in the /etc/nsswitch.conf file on each node that can be the primary for the oracle_server or oracle_listener resource:
passwd: files
group: files
publickey: files
project: files
These entries ensure that the su command does not refer to the NIS/NIS+ name services, so that the data service starts and stops correctly during a network failure.
Problem Summary: Clusters that use ce adapters on the private interconnect observe path timeouts and subsequent node panics if one or more cluster nodes have more than 4 CPUs.
Workaround: Set the ce_taskq_disable parameter in the ce driver by adding the following line to /etc/system file on all cluster nodes.
set ce:ce_taskq_disable=1
Then, reboot the cluster nodes. Consider quorum when you reboot cluster nodes. Setting this parameter ensures that heartbeats (and other packets) are always delivered in the interrupt context, thereby eliminating the path timeouts and the subsequent panics.
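The /etc/system edit above can be scripted so that rerunning it does not add a duplicate line. The helper below is a hypothetical sketch: the function name is invented, and it takes the file path as an argument so that you can try it against a copy before touching the real /etc/system.

```shell
# Hypothetical helper: append the ce_taskq_disable setting to an
# /etc/system-style file only if it is not already present (idempotent).
add_ce_taskq_disable() {
    sysfile=$1   # path to /etc/system, or a test copy of it
    if ! grep -q '^set ce:ce_taskq_disable=1' "$sysfile"; then
        echo 'set ce:ce_taskq_disable=1' >> "$sysfile"
    fi
}
```

Running the helper twice against the same file leaves a single entry, so it is safe to include in a node-preparation script.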
Problem Summary: The Sun Cluster HA for SAP liveCache data service uses the dbmcli command to start and stop liveCache. If you are running Solaris 9, the network service might become unavailable when a cluster node's public network fails.
Workaround: Include one of the following entries for the publickey database in the /etc/nsswitch.conf files on each node that can be the primary for liveCache resources:
publickey:
publickey: files
publickey: files [NOTFOUND=return] nis
publickey: files [NOTFOUND=return] nisplus
Adding one of the above entries, in addition to updates documented in Sun Cluster Data Service for SAP liveCache Guide for Solaris OS, ensures that the su command and the dbmcli command do not refer to the NIS/NIS+ name services. Bypassing the NIS/NIS+ name services ensures that the data service starts and stops correctly during a network failure.
Problem Summary: Due to an internal error, some Sun-supplied cluster agents write messages to the system log (see syslog(3C)) using the LOG_USER facility instead of using LOG_DAEMON. On a cluster that is configured with the default syslog settings (see syslog.conf(4)), messages with a severity of LOG_WARNING or LOG_NOTICE, which would normally be written to the system log, are not being output. This problem occurs only for agent code written as shell scripts.
Workaround:
The following workaround is for agent developers writing shell scripts:
In shell scripts, pass the facility explicitly to scds_syslog:
facility=`scha_cluster_get -O SYSLOG_FACILITY`
scds_syslog -p ${facility}.error -m "error message"
The following workaround is for cluster administrators:
Add the following entry near the front of the /etc/syslog.conf file on all cluster nodes. Separate the two fields with a tab character:
user.warning	/var/adm/messages
This entry causes user.warning messages to be logged. You can add a similar entry for user.notice messages, but this is not necessary and might cause the logs to fill too quickly, depending on the mix of applications that are running.
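The administrator's syslog.conf edit can also be sketched as a small idempotent helper. The function name below is hypothetical; it takes the file path as an argument so you can exercise it on a copy, and it uses a literal tab because syslog.conf(4) requires a tab between the selector and the action.

```shell
# Hypothetical helper: prepend the user.warning entry to a syslog.conf-style
# file if it is not already present. syslog.conf(4) requires a TAB between
# the selector (user.warning) and the action (/var/adm/messages).
add_user_warning() {
    conf=$1   # path to /etc/syslog.conf, or a test copy of it
    if ! grep -q '^user\.warning' "$conf"; then
        printf 'user.warning\t/var/adm/messages\n' > "$conf.new"
        cat "$conf" >> "$conf.new"
        mv "$conf.new" "$conf"
    fi
}
```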
Problem Summary: The requirement for the nsswitch.conf file in "Preparing the Nodes and Disks" in Sun Cluster Data Service for SAP liveCache Guide for Solaris OS does not apply to the entry for the passwd database. If you apply that requirement to the passwd database, the su command might hang on each node that can master the liveCache resource when the public network is down.
Workaround: On each node that can master the liveCache resource, ensure that the entry in the /etc/nsswitch.conf file for the passwd database is as follows:
passwd: files nis [TRYAGAIN=0]
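A quick way to verify the entry on each node is to grep for the exact form. The helper name below is hypothetical; on a cluster node you would point it at /etc/nsswitch.conf.

```shell
# Hypothetical check: return 0 when the passwd entry in an nsswitch.conf-style
# file matches the required form "passwd: files nis [TRYAGAIN=0]".
check_passwd_entry() {
    grep -q '^passwd:[[:space:]]*files nis \[TRYAGAIN=0\]$' "$1"
}
```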
Problem Summary: sccheck might hang if launched simultaneously from multiple nodes.
Workaround: Do not launch sccheck from any multi-console that passes commands to multiple nodes. sccheck runs can overlap, but should not be launched simultaneously.
Problem Summary: Currently, the HA-DB data service does not use the JAVA_HOME environment variable. Therefore, when HA-DB is invoked from the HA-DB data service, it takes the Java binaries from /usr/bin/. The Java binaries in /usr/bin/ must be linked to Java 1.4 or later for the HA-DB data service to work properly.
Workaround: If you do not object to changing the default Java version, perform the following procedure. This example assumes that the /usr/j2se directory contains the latest version of Java (1.4 or later).
If a directory named java/ already exists in /usr/, move it to a temporary location.
From the /usr/ directory, link /usr/bin/java and all other Java-related binaries to the appropriate version of Java.
# ln -s j2se java
If you do not want to change the default version available, assign the JAVA_HOME environment variable with the appropriate version of Java (J2SE 1.4 and above) in the /opt/SUNWappserver7/SUNWhadb/4/bin/hadbm script.
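The JAVA_HOME assignment can be sketched as a helper that inserts the variable just after the interpreter line of a copy of the hadbm script. The function name is hypothetical, and /usr/j2se is only an example path; substitute the J2SE 1.4 (or later) location on your system.

```shell
# Hypothetical sketch: insert a JAVA_HOME assignment after the interpreter
# line of a shell script (for example, a copy of the hadbm script).
set_java_home() {
    script=$1   # path to the script to modify
    jhome=$2    # Java installation directory, e.g. /usr/j2se (assumed path)
    { head -1 "$script"                                   # keep the #! line
      printf 'JAVA_HOME=%s\nexport JAVA_HOME\n' "$jhome"  # insert assignment
      tail -n +2 "$script"                                # rest of the script
    } > "$script.new" && mv "$script.new" "$script"
}
```

Back up the original hadbm script before editing it, because a patch or upgrade of the HA-DB packages may replace the script and undo the change.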
Problem Summary: Due to bug 4974875, whenever autorecovery is performed, the database reinitializes itself without any spares. This bug has been fixed and integrated into HA-DB release 4.3. For HA-DB 4.2 and earlier releases, follow one of the procedures below to change the roles of the HA-DB nodes.
Workaround:
Identify the HA-DB nodes whose roles changed after the autorecovery completed successfully.
On each node that you identified in Step 1, one node at a time, disable the fault monitor for the HA-DB resource in question, then change the node's role back:
# cladm noderole -db dbname -node nodeno -setrole role-before-auto_recovery
Enable the fault monitor for the HA-DB resource in question.
Alternatively, perform the following procedure:
Identify the HA-DB nodes whose roles changed after the autorecovery completed successfully.
On all nodes that host the database, disable the fault monitor for the HA-DB resource in question.
On any one of the nodes, execute the command for each HA-DB node that needs its role changed.
# cladm noderole -db dbname -node nodeno -setrole role-before-auto_recovery
Problem Summary: During a rolling upgrade, if the scstat -i command is run on a cluster node that has not yet been upgraded, the scstat output does not show the status of the IPMP groups hosted on the nodes that have already been upgraded.
Workaround: Use the scstat -i output from the upgraded nodes.
Problem Summary: A LogicalHostname resource cannot be added to the cluster if it needs to use an IPMP group with a failed adapter.
Workaround: Either remove the failed adapter from the IPMP group, or correct the failure before attempting to use the IPMP group in a LogicalHostname resource.
Problem Summary: The two fields, Status and Type, on the resource group status page display values in the first locale that was used to view the page.
Workaround: To see values in a different locale, restart the web server.
Problem Summary: After encapsulating the root disk, if you unencapsulate and then reencapsulate the root disk, you might see that a volume called uservol is used for the /global/.devices/node@nodeID file system. This might cause problems, because the volume name for each node's global devices file system must be unique.
Workaround: After following the documented steps for unencapsulation, kill the vxconfigd daemon before you run scvxinstall again to reencapsulate the root disk.
Problem Summary: When logging in to Sun Web Console, if the Login or Enter button is pressed repeatedly, the multiple login requests can result in various failures, thereby preventing access to SunPlex Manager.
Workaround: Become superuser on the cluster node and restart Sun Web Console.
# /usr/sbin/smcwebserver restart
Problem Summary: The Resource_dependencies_restart resource property does not behave as expected when a resource declares an any node inter-resource-group restart dependency upon a scalable mode resource. Most data services are unaffected.
Background on inter-resource-group dependencies and restart dependencies:
With the inter-resource-group dependencies feature in Sun Cluster 3.1 9/04, Sun Cluster software supports resource dependencies that can cross resource group boundaries. Sun Cluster software also supports a new type of resource dependency, the restart dependency. If the dependent resource is online, the restart dependency causes the dependent resource to be restarted automatically when the depended-on resource starts.
Background on local node vs. any node dependencies:
If resource r1 in group RG1 has a dependency on r2 in RG2, and if RG1 has a positive affinity for RG2, and if both RG1 and RG2 are starting or stopping simultaneously on the same node, then the dependency of r1 on r2 is a local node dependency. For example, while starting RG1 and RG2 on the same node, r1 waits for r2 to start on that node before r1 starts on that same node. The state of r2 on other nodes does not influence when r1 starts.
However, if RG1 does not declare a positive affinity for RG2, or if there is a weak positive affinity, but the resource groups start on different nodes, then the dependency of r1 on r2 is an any node dependency. This dependency means that r1 starts as soon as r2 has started on any node.
Description of Problem:
The problem arises when resource group RG2 is a scalable mode (that is, multi-mastered) resource group, and the dependency of r1 on r2 is an any node restart dependency. In the current implementation, r1 is restarted every time that any instance of r2 starts. r1 should be restarted only when the first instance of r2 starts.
Workaround: When this bug is fixed, the current behavior of restart dependencies will change to the expected behavior described above. Do not develop code or administrative procedures that depend upon the current incorrect behavior.
Problem Summary: If you have a Sun Enterprise 15000 server and you run the sccheck command, the check fails and reports an error that indicates that the Sun Enterprise 15000 server is not supported. This statement is not correct.
Workaround: No workaround is necessary. Sun Cluster software supports your Sun Enterprise 15000 server. The error that the sccheck command reports states that the check might be out of date. In this case, sccheck is out of date.
Problem Summary: French (fr) is not available as a language selection for data-service agents that are not part of the Sun Java Enterprise System. However, the GUI installer for those packages suggests otherwise.
Workaround: Ignore the inaccuracy of the GUI installer. French (fr) is not available.
Problem Summary: During upgrade to Sun Cluster 3.1 9/04 software, the scinstall command installs the new common agent container packages, SUNWcacao and SUNWcacaocfg, but does not distribute identical security keys to all cluster nodes.
Workaround: Perform the following steps to ensure that the common agent container security files are identical on all cluster nodes and that the copied files retain the correct file permissions. These files are required by Sun Cluster software.
On one cluster node, change to the /etc/opt/SUNWcacao/ directory.
phys-schost-1# cd /etc/opt/SUNWcacao/
Create a tar file of the /etc/opt/SUNWcacao/security/ directory.
phys-schost-1# tar cf /tmp/SECURITY.tar security
Copy the /tmp/SECURITY.tar file to each of the other cluster nodes.
On each node to which you copied the /tmp/SECURITY.tar file, extract the security files.
Any security files that already exist in the /etc/opt/SUNWcacao/ directory are overwritten.
phys-schost-2# cd /etc/opt/SUNWcacao/
phys-schost-2# tar xf /tmp/SECURITY.tar
Delete the /tmp/SECURITY.tar file from each node in the cluster.
You must delete each copy of the tar file to avoid security risks.
phys-schost-1# rm /tmp/SECURITY.tar
phys-schost-2# rm /tmp/SECURITY.tar
On each node, restart the security file agent.
# /opt/SUNWcacao/bin/cacaoadm start
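Steps 1 through 5 above can be exercised locally with a sketch like the following. The function name is hypothetical, and for illustration it copies between two local directories; on a real cluster, the intermediate SECURITY.tar archive would be transferred to each node (for example, with scp or rcp) before extraction.

```shell
# Hypothetical sketch of the archive/extract/cleanup sequence: tar up the
# security/ directory under one SUNWcacao configuration directory, extract
# it under another, then remove the archive to avoid exposing the keys.
sync_security_files() {
    src=$1   # directory containing security/, e.g. /etc/opt/SUNWcacao
    dst=$2   # the same path on the destination (here, a local directory)
    ( cd "$src" && tar cf /tmp/SECURITY.tar security )
    ( cd "$dst" && tar xf /tmp/SECURITY.tar )
    rm /tmp/SECURITY.tar   # delete the archive to avoid security risks
}
```

After extracting on each node, remember to restart the security file agent with cacaoadm as shown above.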
Problem Summary: The date field on the Advanced Filter panel of SunPlex Manager accepts only the mm/dd/yyyy format. However, non-English locale environments use a different date format, and the Calendar panel returns the date in a format other than mm/dd/yyyy.
Workaround: Type the date range in the Advanced Filter panel in mm/dd/yyyy format. Do not use the Set button to display the calendar and choose the date.
Problem Summary: When you remove a resource group by using SunPlex Manager on Solaris 8, you might receive error messages that are not readable. This problem occurs in Japanese, Korean, Traditional Chinese, and Simplified Chinese.
Workaround: Set the system locale to English to display the error messages in English.
Problem Summary: In the resource type registration (RTR) file SUNW.sapscs, descriptions for two extension properties are incorrect.
Workaround: The description for Scs_Startup_Script should read "Startup script for the SCS. Defaults to /usr/sap/SAP_SID/SYS/exe/run/startsap." The description for Scs_Shutdown_Script should read "Shutdown script for the SCS. Defaults to /usr/sap/SAP_SID/SYS/exe/run/stopsap."
Problem Summary: After installing Sun Cluster software by using the JumpStart method, Sun Web Console cannot launch SunPlex Manager. JumpStart postinstallation processing fails to register SunPlex Manager with Sun Web Console.
Workaround: Run the following script on each cluster node, after JumpStart installation of Sun Cluster software is finished on all nodes.
# /var/sadm/pkg/SUNWscspmu/install/postinstall
This script registers SunPlex Manager with Sun Web Console.
Problem Summary: The installer program on the Sun Cluster 3.1 9/04 data services CD-ROM for x86 cannot be used to install HA Oracle. The installer issues the following message:
Could not find child archive ....
Workaround: Use scinstall to install Sun Cluster Data Service for HA Oracle.
Problem Summary: The data services for the following applications cannot be upgraded by using the scinstall utility:
Apache Tomcat
DHCP
mySQL
Oracle E-Business Suite
Samba
SWIFTAlliance Access
WebLogic Server
WebSphere MQ
WebSphere MQ Integrator
Workaround: If you plan to upgrade a data service for an application in the preceding list, replace the step for upgrading data services in Upgrading to Sun Cluster 3.1 9/04 Software (Rolling) in Sun Cluster Software Installation Guide for Solaris OS with the steps that follow. Perform these steps for each node where the data service is installed.
Remove the software package for the data service that you are upgrading.
# pkgrm pkg-inst
pkg-inst specifies the software package name for the data service that you are upgrading as listed in the following table.
Application                          Data Service Software Package
Apache Tomcat                        SUNWsctomcat
DHCP                                 SUNWscdhc
mySQL                                SUNWscmys
Oracle E-Business Suite              SUNWscebs
Samba                                SUNWscsmb
SWIFTAlliance Access                 SUNWscsaa
WebLogic Server (English locale)     SUNWscwls
WebLogic Server (French locale)      SUNWfscwls
WebLogic Server (Japanese locale)    SUNWjscwls
WebSphere MQ                         SUNWscmqs
WebSphere MQ Integrator              SUNWscmqi
Install the software package for the version of the data service to which you are upgrading.
To install the software package, follow the instructions in the Sun Cluster documentation for the data service that you are upgrading. This documentation is available at http://docs.sun.com/.
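Before running the remove-and-reinstall sequence as root on each node, it can help to preview the exact commands. The helper below is a hypothetical dry run: the function name is invented, and the pkgadd -d form shown is one plausible install invocation; the authoritative install step for each data service is in its own documentation.

```shell
# Hypothetical dry run: print the remove and install commands for one
# data service package so that they can be reviewed before execution.
print_upgrade_cmds() {
    pkg=$1      # package name from the table above, e.g. SUNWsctomcat
    pkgdir=$2   # directory that holds the new package (assumed location)
    echo "pkgrm $pkg"
    echo "pkgadd -d $pkgdir $pkg"
}
```

For example, `print_upgrade_cmds SUNWsctomcat /cdrom/cdrom0` prints the pair of commands for the Apache Tomcat data service without running either one.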