The following known issues and bugs affect the operation of the Sun Cluster 3.2 release. Bugs and issues are grouped into the following categories:
Problem Summary: The clnode remove --force command should remove nodes from the metasets. The Sun Cluster System Administration Guide for Solaris OS provides procedures for removing a node from the cluster. These procedures instruct the user to run the metaset command for Solaris Volume Manager disk set removal before running clnode remove.
Workaround: If the procedures were not followed, stale node data might remain in the CCR. To clear it, from an active cluster node use the metaset command to remove the node from the Solaris Volume Manager disk sets. Then run clnode clear --force obsolete_nodename.
Problem Summary: On a cluster installed with the Solaris 10 End User software group, SUNWCuser, running the scsnapshot command might fail with the following error:
# scsnapshot -o …
/usr/cluster/bin/scsnapshot[228]: /usr/perl5/5.6.1/bin/perl: not found
Workaround: Do either of the following:
Install the Solaris Entire Distribution software group.
Install the following Perl packages: SUNWpl5u, SUNWpl5v, SUNWpl5p.
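The failure comes down to the required Perl packages being absent. As a rough sketch of checking for them (the installed-package list below is simulated; on a real node it would come from pkginfo output), you can compute which of the three packages still need to be added:

```shell
# Perl packages that scsnapshot needs on the End User software group.
required="SUNWpl5u SUNWpl5v SUNWpl5p"
# Simulated installed list; on a real node, derive this from pkginfo.
installed="SUNWpl5u"

missing=""
for p in $required; do
    case " $installed " in
        *" $p "*) ;;                      # already installed
        *) missing="$missing $p" ;;       # needs to be added
    esac
done
echo "missing:$missing"
```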
Problem Summary: The Auxnodelist property of the shared-address resource cannot be used during shared-address resource creation. Using it causes validation errors and a SEGV when a scalable resource that depends on this shared-address network resource is created. The scalable resource's validate error message has the following format:
Method methodname (scalable svc) on resource resourcename stopped or terminated due to receipt of signal 11
A core file is also generated by ssm_wrapper. Users cannot set the Auxnodelist property and thus cannot identify the cluster nodes that can host the shared address but never serve as primary.
Workaround: On one node, re-create the shared-address resource without specifying the Auxnodelist property. Then rerun the scalable-resource creation command and use the shared-address resource that you re-created as the network resource.
Problem Summary: The Quorum Server command clquorumserver does not set the state for the startup mechanism correctly for the next reboot.
Workaround: Perform the following tasks to start or stop Quorum Server software.
Display the status of the quorumserver service.
# svcs -a | grep quorumserver
If the service is disabled, output appears similar to the following:
disabled 3:33:45 svc:/system/cluster/quorumserver:default
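The branch in the next step (svcadm for a disabled service, clquorumserver for an online one) can be expressed as a small shell sketch; the state line below is sample svcs output, not live data:

```shell
# Sample line in the format printed by `svcs -a | grep quorumserver`.
state_line='disabled       3:33:45 svc:/system/cluster/quorumserver:default'

# The first column is the service state.
state=$(printf '%s\n' "$state_line" | awk '{print $1}')

case "$state" in
    disabled) start_cmd='svcadm enable svc:/system/cluster/quorumserver:default' ;;
    online)   start_cmd='clquorumserver start +' ;;
    *)        start_cmd='' ;;   # unexpected state: investigate before starting
esac
echo "$start_cmd"
```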
Start Quorum Server software.
If the quorumserver service is disabled, use the svcadm enable command.
# svcadm enable svc:/system/cluster/quorumserver:default
If the quorumserver service is online, use the clquorumserver command.
# clquorumserver start +
To stop Quorum Server software on the Solaris 10 OS, disable the quorumserver service.
# svcadm disable svc:/system/cluster/quorumserver:default
On the Solaris 9 OS, start Quorum Server software.
# clquorumserver start +
Rename the /etc/rc2.d/.S99quorumserver file as /etc/rc2.d/S99quorumserver.
# mv /etc/rc2.d/.S99quorumserver /etc/rc2.d/S99quorumserver
On the Solaris 9 OS, stop Quorum Server software.
# clquorumserver stop +
Rename the /etc/rc2.d/S99quorumserver file as /etc/rc2.d/.S99quorumserver.
# mv /etc/rc2.d/S99quorumserver /etc/rc2.d/.S99quorumserver
Problem Summary: When creating the node agent (NA) resource in Sun Cluster HA for Application Server, the resource gets created even if there is no dependency set on the DAS resource. The command should error out if the dependency is not set, because a DAS resource must be online in order to start the NA resource.
Workaround: While creating the NA resource, make sure you set a resource dependency on the DAS resource.
Problem Summary: The HA MySQL patch adds a new variable called MYSQL_DATADIR to the mysql_config file. This variable must point to the directory where the MySQL configuration file, my.cnf, is stored. If this variable is not configured correctly, database preparation with mysql_register will fail.
Workaround: Point the MYSQL_DATADIR variable to the directory where the MySQL configuration file, my.cnf, is stored.
Problem Summary: If InfiniBand is used as the cluster transport and there are two adapters on each node with two ports per adapter and a total of two switches, the scinstall utility's adapter autodiscovery could suggest two transport paths that use the same adapter.
Workaround: Manually specify the transport adapters on each node.
Problem Summary: IPv6 plumbing on the interconnects, which is required for forwarding IPv6 scalable service packets, is no longer enabled by default. The IPv6 interfaces, as seen with the ifconfig command, are no longer plumbed on the interconnect adapters by default.
Workaround: Manually enable IPv6 scalable service support.
Ensure that you have prepared all cluster nodes to run IPv6 services. These tasks include proper configuration of network interfaces, server/client application software, name services, and routing infrastructure. Failure to do so might result in unexpected failures of network applications. For more information, see your Solaris system-administration documentation for IPv6 services.
On each node, add the following entry to the /etc/system file.
set cl_comm:ifk_disable_v6=0
On each node, enable IPv6 plumbing on the interconnect adapters.
# /usr/cluster/lib/sc/config_ipv6
The config_ipv6 utility brings up an IPv6 interface on all cluster interconnect adapters that have a link-local address. The utility enables proper forwarding of IPv6 scalable service packets over the interconnects.
Alternately, you can reboot each cluster node to activate the configuration change.
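The /etc/system edit in the steps above is a one-line append, but it is easy to duplicate the entry on repeated runs. A minimal sketch, using a temporary file in place of the real /etc/system, that adds the tunable only if it is not already present:

```shell
# A temp file stands in for /etc/system in this sketch.
etc_system=$(mktemp)
entry='set cl_comm:ifk_disable_v6=0'

add_entry() {
    # Append the tunable only if no ifk_disable_v6 setting exists yet.
    grep -q 'cl_comm:ifk_disable_v6' "$1" || printf '%s\n' "$entry" >> "$1"
}

add_entry "$etc_system"
add_entry "$etc_system"    # second call makes no change

count=$(grep -c 'ifk_disable_v6' "$etc_system")
rm -f "$etc_system"
echo "$count"
```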
Problem Summary: If the clnode add command is run by using an XML file and the cluster uses direct-connect transport, the command misinterprets the cable information and adds the wrong configuration information. As a result, the joining node cannot join the cluster.
Workaround: Use the scinstall command to add a node to the cluster when the cluster transport is directly connected.
Problem Summary: The scinstall command updates the /etc/nsswitch.conf file to add the cluster entry to the hosts and netmasks databases. This change updates the /etc/nsswitch.conf file for the global zone. But when a non-global zone is created and installed, the non-global zone receives its own copy of the /etc/nsswitch.conf file, without the cluster entry for the hosts and netmasks databases. Any attempt to resolve cluster-specific private hostnames and IP addresses from within a non-global zone by using getXbyY queries will fail.
Workaround: Manually update the /etc/nsswitch.conf file for non-global zones with the cluster entry for the hosts and netmasks database. This ensures that the cluster-specific private-hostname and IP-address resolutions are available within non-global zones.
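The manual edit can be scripted. This sketch operates on a temporary copy rather than a zone's real /etc/nsswitch.conf, and assumes the cluster source simply needs to be prepended to the hosts and netmasks entries (rerunning it would prepend again, so check for an existing entry first on a real system):

```shell
# A temp file stands in for the non-global zone's /etc/nsswitch.conf.
conf=$(mktemp)
printf 'hosts:      files dns\nnetmasks:   files\n' > "$conf"

# Prepend the "cluster" source to the hosts and netmasks databases.
sed -e 's/^hosts:[[:space:]]*/hosts: cluster /' \
    -e 's/^netmasks:[[:space:]]*/netmasks: cluster /' \
    "$conf" > "$conf.new" && mv "$conf.new" "$conf"

result=$(cat "$conf")
rm -f "$conf"
echo "$result"
```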
Problem Summary: Translated messages for the Quorum Server administration programs, such as clquorumserver, are delivered as part of the core translation packages. As a result, Quorum Server messages appear only in English. The Quorum server translation packages must be separated from the core translation packages and installed on the quorum server system.
Workaround: Install the following packages on the host where Quorum Server software is installed:
SUNWcsc (Simplified Chinese)
SUNWdsc (German)
SUNWesc (Spanish)
SUNWfsc (French)
SUNWhsc (Traditional Chinese)
SUNWjsc (Japanese)
SUNWksc (Korean)
If the Japanese man page is needed on the quorum server, install the SUNWjscman (Japanese man page) package.
Problem Summary: When installing the Simplified Chinese version of the Sun Cluster 3.2 software, the installer displays a warning about insufficient swap space and reports an incorrect swap size of 0.0 KB on the system-requirements check screen.
Workaround: If the swap size is larger than the system requirement, you can safely ignore this warning. Alternatively, run the Sun Cluster 3.2 installer in the C or English locale, which checks the swap size correctly.
Problem Summary: The cleanipc utility fails if the runtime linking environment does not contain the /sapmnt/SAPSID/exe path.
Workaround: As the Solaris root user, add the /sapmnt/SAPSID/exe path to the default library path in the ld.config file.
To configure the runtime linking environment default library path for 32–bit applications, enter the following command:
# crle -u -l /sapmnt/SAPSID/exe
To configure the runtime linking environment default library path for 64–bit applications, enter the following command:
# crle -64 -u -l /sapmnt/SAPSID/exe
Problem Summary: When a cluster shutdown is performed, the UCMMD can go into a reconfiguration on one or more nodes if one node leaves the cluster slightly ahead of the UCMMD. When this occurs, the shutdown stops the rpc.mdcommd command on the node while the UCMMD is trying to perform the return step. In the return step, the metaclust command gets an RPC timeout and exits the step with an error, due to the missing rpc.mdcommd process. This error causes the UCMMD to abort the node, which might cause the node to panic.
Workaround: You can safely ignore this problem. When the node boots back up, Sun Cluster software detects this condition and allows the UCMMD to start, despite the fact that an error occurred in the previous reconfiguration.
Problem Summary: Sun Cluster resource validation does not accept the hostname for IPMP groups for the netiflist property during logical-hostname or shared-address resource creation.
Workaround: Use the node ID instead of the node name when you specify the IPMP group names during logical-hostname and shared-address resource creation.
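As an illustration of the workaround, a netiflist written with node names can be rewritten with node IDs. The group name sc_ipmp0 and the name-to-ID mapping below are hypothetical stand-ins for your own configuration:

```shell
# Hypothetical netiflist that uses node names (rejected by validation).
netiflist='sc_ipmp0@phys-host-1,sc_ipmp0@phys-host-2'

# Rewrite the node names to their node IDs (assumed mapping: 1 and 2).
fixed=$(printf '%s\n' "$netiflist" | \
    sed -e 's/@phys-host-1/@1/' -e 's/@phys-host-2/@2/')
echo "$fixed"
```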
Problem Summary: This problem is seen when the original disk is root encapsulated and a live upgrade is attempted from VxVM 3.5 on Solaris 9 8/03 OS to VxVM 5.0 on Solaris 10 6/06 OS. The vxlufinish script fails with the following error.
# ./vxlufinish -u 5.10
VERITAS Volume Manager VxVM 5.0
Live Upgrade finish on the Solairs release <5.10>
Enter the name of the alternate root diskgroup: altrootdg
ld.so.1: vxparms: fatal: libvxscsi.so: open failed: No such file or directory
ld.so.1: vxparms: fatal: libvxscsi.so: open failed: No such file or directory
Killed
ld.so.1: ugettxt: fatal: libvxscsi.so: open failed: No such file or directory
ERROR:vxlufinish Failed:
/altroot.5.10/usr/lib/vxvm/bin/vxencap -d -C 10176 -c -p 5555 -g -g altrootdg rootdisk=c0t1d0s2
Please install, if 5.0 or higher version of VxVM is not installed on alternate bootdisk.
Workaround: Use the standard upgrade or dual-partition upgrade method instead.
Contact Sun support or your Sun representative to learn whether Sun Cluster 3.2 Live Upgrade support for VxVM 5.0 becomes available at a later date.
Problem Summary: During live upgrade, the lucreate and luupgrade commands fail to change the DID names in the alternate boot environment that corresponds to the /global/.devices/node@N entry.
Workaround: Before you start the live upgrade, perform the following steps on each cluster node.
Become superuser.
Back up the /etc/vfstab file.
# cp /etc/vfstab /etc/vfstab.old
Open the /etc/vfstab file for editing.
Locate the line that corresponds to /global/.devices/node@N.
Edit the global device entry.
Change the DID names to the physical names.
Change /dev/did/{r}dsk/dYsZ to /dev/{r}dsk/cNtXdYsZ.
Remove global from the entry.
The following example shows the DID device d3s3, which corresponds to /global/.devices/node@2, changed to its physical device name with the global entry removed:
Original: /dev/did/dsk/d3s3 /dev/did/rdsk/d3s3 /global/.devices/node@2 ufs 2 no global
Changed: /dev/dsk/c0t0d0s3 /dev/rdsk/c0t0d0s3 /global/.devices/node@2 ufs 2 no -
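This edit can also be done with sed. The DID-to-physical mapping here (d3s3 to c0t0d0s3) is the one from the example and must be replaced with your own device names:

```shell
# vfstab entry from the example, using DID names.
orig='/dev/did/dsk/d3s3 /dev/did/rdsk/d3s3 /global/.devices/node@2 ufs 2 no global'

# Substitute the physical device names and drop the "global" mount option.
changed=$(printf '%s\n' "$orig" | sed \
    -e 's|/dev/did/dsk/d3s3|/dev/dsk/c0t0d0s3|' \
    -e 's|/dev/did/rdsk/d3s3|/dev/rdsk/c0t0d0s3|' \
    -e 's| global$| -|')
echo "$changed"
```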
When the /etc/vfstab file is modified on all cluster nodes, perform live upgrade of the cluster, but stop before you reboot from the upgraded alternate boot environment.
On each node, in the current (unupgraded) boot environment, restore the original /etc/vfstab file.
# cp /etc/vfstab.old /etc/vfstab
In the alternate boot environment, open the /etc/vfstab file for editing.
Locate the line that corresponds to /global/.devices/node@N and replace the dash (-) at the end of the entry with the word global.
/dev/dsk/cNtXdYsZ /dev/rdsk/cNtXdYsZ /global/.devices/node@N ufs 2 no global
Reboot the node from the upgraded alternate boot environment.
The DID names are substituted in the /etc/vfstab file automatically.
Problem Summary: This problem is seen when upgrading VERITAS Volume Manager (VxVM) during a Sun Cluster live upgrade. The vxlustart script is used to upgrade the Solaris OS and VxVM from the previous version. The script fails with error messages similar to the following:
# ./vxlustart -u 5.10 -d c0t1d0 -s OSimage
VERITAS Volume Manager VxVM 5.0.
Live Upgrade is now upgrading from 5.9 to <5.10>
…
ERROR: Unable to copy file systems from boot environment <sorce.8876> to BE <dest.8876>.
ERROR: Unable to populate file systems on boot environment <dest.8876>.
ERROR: Cannot make file systems for boot environment <dest.8876>.
ERROR: vxlustart: Failed: lucreate -c sorce.8876 -C /dev/dsk/c0t0d0s2
-m -:/dev/dsk/c0t1d0s1:swap -m /:/dev/dsk/c0t1d0s0:ufs
-m /globaldevices:/dev/dsk/c0t1d0s3:ufs -m /mc_metadb:/dev/dsk/c0t1d0s7:ufs
-m /space:/dev/dsk/c0t1d0s4:ufs -n dest.8876
Workaround: Use the standard upgrade or dual-partition upgrade method if you are upgrading the cluster to VxVM 5.0.
Contact Sun support or your Sun representative to learn whether Sun Cluster 3.2 Live Upgrade support for VxVM 5.0 becomes available at a later date.
Problem Summary: For clusters that run VERITAS Volume Manager (VxVM), a standard upgrade or dual-partition upgrade of any of the following software fails if the root disk is encapsulated:
Upgrading the Solaris OS to a different version
Upgrading VxVM
Upgrading Sun Cluster software
The cluster node panics and fails to boot after upgrade. This is due to the major-number or minor-number changes made by VxVM during the upgrade.
Workaround: Unencapsulate the root disk before you begin the upgrade.
If this procedure is not followed correctly, you might experience serious unexpected problems on all nodes being upgraded. Also, unencapsulation and re-encapsulation of the root disk each cause an additional automatic reboot of the node, increasing the number of required reboots during the upgrade.
Problem Summary: Following a live upgrade from Sun Cluster 3.1 on Solaris 9 to Sun Cluster 3.2 on Solaris 10, zones cannot be used properly with the cluster software. The problem is that the pspool data is not created for the Sun Cluster packages, so packages that must be propagated to the non-global zones, such as SUNWsczu, are not propagated correctly.
Workaround: After the Sun Cluster packages have been upgraded by using the scinstall -R command but before the cluster has booted into cluster mode, run the following script twice:
Once for the Sun Cluster framework packages
Once for the Sun Cluster data-service packages
Prepare and run this script in one of the following ways:
Set up the variables for the Sun Cluster framework packages and run the script. Then modify the PATHNAME variable for the data service packages and rerun the script.
Create two scripts, one with variables set in the script for the framework packages and one with variables set for the data service packages. Then run both scripts.
Become superuser.
Create a script with the following content.
#!/bin/ksh

typeset PLATFORM=${PLATFORM:-`uname -p`}
typeset PATHNAME=${PATHNAME:-/cdrom/cdrom0/Solaris_${PLATFORM}/Product/sun_cluster/Solaris_10/Packages}
typeset BASEDIR=${BASEDIR:-/}

cd $PATHNAME
for i in *
do
    if pkginfo -R ${BASEDIR} $i >/dev/null 2>&1
    then
        mkdir -p ${BASEDIR}/var/sadm/pkg/$i/save/pspool
        pkgadd -d . -R ${BASEDIR} -s ${BASEDIR}/var/sadm/pkg/$i/save/pspool $i
    fi
done
Set the variables PLATFORM, PATHNAME, and BASEDIR.
Either set these variables as environment variables or modify the values in the script directly.
PLATFORM
The name of the platform, for example sparc or x86. By default, the PLATFORM variable is set to the output of the uname -p command.
PATHNAME
A path to the device from which the Sun Cluster framework or data-service packages can be installed. This value corresponds to the -d option of the pkgadd command.
As an example, for Sun Cluster framework packages, this value would be of the following form:
/cdrom/cdrom0/Solaris_${PLATFORM}/Product/sun_cluster/Solaris_10/Packages
For the data services packages, this value would be of the following form:
/cdrom/cdrom0/Solaris_${PLATFORM}/Product/sun_cluster_agents/Solaris_10/Packages
BASEDIR
The full path name of a directory to use as the root path, which corresponds to the -R option of the pkgadd command. For live upgrade, set this value to the root path that is used with the -R option of the scinstall command. By default, the BASEDIR variable is set to the root (/) file system.
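For instance, the framework-package run can be prepared by exporting the three variables before invoking the script; the media path is the example form given in this procedure:

```shell
# Set the script's inputs as environment variables (framework packages).
PLATFORM=${PLATFORM:-$(uname -p)}
PATHNAME="/cdrom/cdrom0/Solaris_${PLATFORM}/Product/sun_cluster/Solaris_10/Packages"
BASEDIR=/
export PLATFORM PATHNAME BASEDIR
echo "$PATHNAME"
```

For the second run, change PATHNAME to the sun_cluster_agents path and invoke the script again.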
Run the script, once for the Sun Cluster framework packages and once for the data-service packages.
After the script is run, you should see the following message at the command prompt for each package:
Transferring pkgname package instance
If the pspool directory already exists for a package or if the script is run twice for the same set of packages, the following error is displayed at the command prompt:
Transferring pkgname package instance
pkgadd: ERROR: unable to complete package transfer - identical version of pkgname already exists on destination device
This message is harmless and can be safely ignored.
After you run the script for both framework packages and data-service packages, boot your nodes into cluster mode.
Problem Summary: Adding a new node to the cluster without ensuring that the node has the same patches as the existing cluster nodes might cause the cluster nodes to panic.
Workaround: Before adding a node to the cluster, patch the new node to the same level as the existing cluster nodes.