Sun Cluster 3.0 12/01 Release Notes

Known Problems

The following known problems affect the operation of the Sun Cluster 3.0 12/01 release. For the most current information, see the online Sun Cluster 3.0 12/01 Release Notes Supplement at http://docs.sun.com.

BugId 4419214

Problem Summary: The /etc/mnttab file does not show the most current largefile status of a globally mounted VxFS file system.

Workaround: Use the fsadm command, rather than the /etc/mnttab entry, to verify the largefile status of the file system.
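As a sketch, the check could look like the following, assuming a hypothetical globally mounted VxFS file system at /global/fs1 (the exact fsadm output depends on your VxFS version):

```shell
MNT=/global/fs1   # hypothetical mount point; substitute your own

# On VxFS, running fsadm against the mount point reports "largefiles" or
# "nolargefiles". Guarded so this sketch is a no-op where the mount is absent.
if command -v fsadm >/dev/null 2>&1 && [ -d "$MNT" ]; then
    fsadm "$MNT"
else
    echo "fsadm or $MNT not available; run this on a cluster node"
fi
```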

BugId 4449437

Problem Summary: Global VxFS appears to allocate more disk blocks than local VxFS for the same file size. You can observe this with the ls -ls command.

Workaround: Unmount and remount the file system. This eliminates the extra disk blocks reported as allocated.
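For a cluster file system this can be done from any node; a minimal sketch, assuming the hypothetical mount point /global/fs1 with an existing /etc/vfstab entry:

```shell
MNT=/global/fs1   # hypothetical globally mounted VxFS file system

# Unmounting and remounting clears the spurious extra-block accounting.
# Guarded so this sketch is a no-op on systems without the mount.
if [ -d "$MNT" ] && umount "$MNT" 2>/dev/null; then
    mount "$MNT"    # remount using the existing /etc/vfstab entry
else
    echo "$MNT is not mounted here; run this on a cluster node"
fi
```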

BugId 4490386

Problem Summary: When using Sun Enterprise 10000 servers in a cluster, panics have been observed in these servers when a certain configuration of I/O cards is used.

Workaround: Do not install UDWIS I/O cards in slot 0 of an SBus I/O board in Sun Enterprise 10000 servers in a cluster.

BugId 4492010

Problem Summary: In an N-node cluster configured with N Interaction Managers, if you bring down or halt the cluster node running the Interaction Manager (IM) that serves a client, the client loses its session. Subsequent retries by the same client to reconnect to a different IM take a long time. This is an issue with the BroadVision product, and BroadVision engineers are working to resolve it. BroadVision does not support IM session failover.

Workaround: From a Netscape browser, click the Stop/Reload button, and then click the Start Broadway Application button. The connection to the BroadVision server should respond immediately. After you halt the IM node, this workaround succeeds most of the time for new connections. It is less likely to succeed if you perform it before halting the IM node. If this workaround does not work, clear the disk cache and memory cache in Netscape.

BugId 4493025

Problem Summary: In a two-node cluster, if you switch oracle-rg from Node 1 to Node 2, BroadVision One-To-One tries three times before it successfully registers a new user. The first try displays the message Fail to create new user. The second try displays copyright information. The third try succeeds. This problem occurs in any N-node cluster that runs a failover Oracle database, either within the cluster or outside the cluster, and in a two-node cluster where Node 1 is the primary for http, oracle, roothost, backend, and backend2 and where the Interaction Manager (IM) runs as a scalable service.

The problem is that the new user's name is not displayed on the welcome page after login. This is a known issue with BroadVision One-To-One; bug BVNqa20753 is filed against BroadVision One-To-One to fix this problem.

Workaround: There is no workaround. The user will be created after three attempts.

BugId 4494165

Problem Summary: VERITAS File System patch 110435-05 changes the default logging option for mount_vxfs from the log option to the delaylog option. Logging is necessary for VxFS support on Sun Cluster.

Workaround: Manually add the log option to the VxFS options list in the vfstab file.
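For example, a hypothetical /etc/vfstab entry for a globally mounted VxFS file system would include log in its mount options (the device names and mount point below are placeholders):

```
#device to mount      device to fsck         mount point  FS type  fsck pass  mount at boot  mount options
/dev/vx/dsk/dg1/vol1  /dev/vx/rdsk/dg1/vol1  /global/fs1  vxfs     -          yes            global,log
```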

BugId 4499573

Problem Summary: Data services that are I/O intensive and that are configured on a large number of disks in the cluster may time out because of retries within the I/O subsystem during disk failures.

Workaround: Increase the value of your data service's Probe_timeout resource extension property. If you need help determining the timeout value, contact your service representative.


# scrgadm -c -j resource -x Probe_timeout=timeout_value

BugId 4501655

Problem Summary: Record locking does not work correctly when the device to be locked is a global device (/dev/global/rdsk/d4s0). Record locking does work correctly when the program runs multiple times in the background on a single specified node: after the first copy of the program locks a portion of the device, the other copies block, waiting for the device to be unlocked. However, when the program runs from a node other than the specified node, it locks the device again instead of blocking until the device is unlocked.

Workaround: There is no workaround.

BugId 4504311

Problem Summary: When a Sun Cluster configuration is upgraded to Solaris 8 10/01 software (required for Sun Cluster 3.0 12/01 upgrade), the Apache start and stop scripts are restored. If an Apache data service is already present on the cluster and configured in its default configuration (the /etc/apache/httpd.conf file exists and the /etc/rc3.d/S50apache file does not exist), Apache starts on its own. This prevents the Apache data service from starting because Apache is already running.

Workaround: Do the following for each node.

  1. Before you shut down a node to upgrade it, determine whether the following links already exist, and if so, whether the file names contain an uppercase K or S.


    /etc/rc0.d/K16apache
    /etc/rc1.d/K16apache
    /etc/rc2.d/K16apache
    /etc/rc3.d/S50apache
    /etc/rcS.d/K16apache

    If these links already exist and contain an uppercase K or S in the file name, no further action is necessary. Otherwise, perform the action in the next step after you upgrade the node to Solaris 8 10/01 software.

  2. After the node is upgraded to Solaris 8 10/01 software, but before you reboot the node, move aside the restored Apache links by renaming the files with a lowercase k or s.


    # mv /a/etc/rc0.d/K16apache /a/etc/rc0.d/k16apache
    # mv /a/etc/rc1.d/K16apache /a/etc/rc1.d/k16apache
    # mv /a/etc/rc2.d/K16apache /a/etc/rc2.d/k16apache
    # mv /a/etc/rc3.d/S50apache /a/etc/rc3.d/s50apache
    # mv /a/etc/rcS.d/K16apache /a/etc/rcS.d/k16apache
    

BugId 4504385

Problem Summary: If you use interactive scinstall(1M), which provides the cluster transport autodiscovery features, you might see the following error message during the probe:


scrconf:  /dev/clone: No such file or directory

This error message might cause the probe to abort and autodiscovery to fail. The device might not be a network adapter; for example, it might be /dev/llc20. If you encounter this problem, ask your service representative to update the bug report with additional information that might help reproduce the problem.

Workaround: Reboot the node, and then retry scinstall. If this does not solve the problem, select the non-autodiscovery options of scinstall.

BugId 4505391

Problem Summary: When you upgrade the Sun Cluster software from Sun Cluster 2.2 to Sun Cluster 3.0 12/01 using the scinstall -u begin -F command, the scinstall command fails to remove patches with dependencies and aborts with the following messages:


scinstall:  Failed to remove patch-id.rev
scinstall:  scinstall did NOT complete successfully!

A patch dependency is the cause of this failure.

Workaround: Manually back out the patch dependencies, then restart the upgrade process. Use the log file to identify the patch dependencies that caused the script to fail. You can also use the showrev command to identify patch dependencies.


showrev -p | grep patch-id

BugId 4509832

Problem Summary: If a Cluster Configuration Repository (CCR) table is invalid in a cluster, it is neither readable nor writable. Running the ccradm -r -f command on the invalid CCR table should make it readable and writable again. However, after you run ccradm -r -f, the CCR table is still not writable.

Workaround: Reboot the entire cluster.
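One way to reboot the entire cluster is to halt all nodes with scshutdown(1M) and then boot each node; a sketch, assuming the command is run as root on one cluster node:

```shell
# scshutdown halts every node in the cluster: -g0 sets a zero-second grace
# period and -y answers the confirmation prompt non-interactively. Guarded
# so this sketch is a no-op on machines without Sun Cluster installed.
if command -v scshutdown >/dev/null 2>&1; then
    scshutdown -g0 -y
    # After all nodes halt, boot each node (for example, type "boot" at the
    # OpenBoot PROM ok prompt) to bring the cluster back up.
    reboot_state=attempted
else
    echo "scshutdown not found; run this as root on a cluster node"
    reboot_state=skipped
fi
```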

BugId 4511478

Problem Summary: When interactive scinstall(1M) is run a second time against the same JumpStart directory to set up a JumpStart server for installing a cluster, the cluster name and the JumpStart directory name might disappear. Both names are missing from the scinstall command line that this process generates.

Workaround: From your JumpStart directory, remove the .interactive.log.3 file, and then rerun scinstall.
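The cleanup step can be sketched as follows; /export/jumpstart is a placeholder for your actual JumpStart directory:

```shell
# JUMPSTART_DIR is a placeholder; substitute your JumpStart directory.
JUMPSTART_DIR=${JUMPSTART_DIR:-/export/jumpstart}

# Remove the stale interactive-session log left by the previous run
# (rm -f succeeds quietly if the file is already gone).
rm -f "$JUMPSTART_DIR/.interactive.log.3"

# Then rerun scinstall from that directory.
```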

BugId 4515780

Problem Summary: NLS files for Oracle 9.0.1 are not backward compatible for Oracle 8.1.6 and 8.1.7 software. Patch 110651-04 has been declared bad.

Workaround: Back out Patch 110651-04 and replace it with 110651-02.
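A sketch of the patch swap, assuming root access on each node and a downloaded copy of the earlier patch revision (the patch spool path is a placeholder):

```shell
patch_bad=110651-04     # revision declared bad
patch_good=110651-02    # earlier revision to restore

# Guarded so this sketch is a no-op on systems without the Solaris patch
# tools; run as root on each cluster node.
if command -v patchrm >/dev/null 2>&1; then
    patchrm "$patch_bad"
    # patchadd takes the path to the unpacked patch directory;
    # /var/spool/patch is a placeholder location.
    patchadd "/var/spool/patch/$patch_good"
else
    echo "patchrm/patchadd not found; run this on a Solaris cluster node"
fi
```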

BugId 4517304

Problem Summary: If syslogd dies and cannot be restarted on a cluster node (for example, as a result of BugId 4477565), rgmd can hang on one or more nodes. This in turn causes other commands, such as scstat(1M) -g, scswitch(1M) -g, scrgadm(1M), and scha_*_get(1HA,3HA), to hang, and prevents resource group failovers from succeeding.

Workaround: Edit the /etc/init.d/syslog script to insert a line that removes the symbolic link /etc/.syslog_door before the command that starts /usr/sbin/syslogd. Inserted line:


rm -f /etc/.syslog_door

BugId 4517875

Problem Summary: After the installation of the RSM (Remote Shared Memory) packages and the SUNWscrif package (the RSMAPI Path Manager package), some of the paths that RSMAPI uses fail to come up to the RSM_CONNECTION_ACTIVE state. You can see the state of each path by dumping the topology structure with the rsm_get_interconnect_topology(3RSM) interface, which is declared in rsmapi.h.


Caution -

Perform the following workaround on each path one at a time so that you do not isolate the node from the cluster.


Workaround: Run the following commands on any node of the cluster to bring up the paths that are in a state other than RSM_CONNECTION_ACTIVE (3).


# scconf -c -m endpoint=node:adpname,state=disabled
# scconf -c -m endpoint=node:adpname,state=enabled

node:adpname
        An endpoint on the path that is experiencing this problem.

BugId 4522648

Problem Summary: As of the VxVM 3.1.1 release, the man-page path has changed to /opt/VRTS/man. In previous releases the man-page path was /opt/VRTSvxvm/man. This new path is not documented in the Sun Cluster 3.0 12/01 Software Installation Guide.

Workaround: For VxVM 3.1.1 and later, add /opt/VRTS/man to the MANPATH on each node of the cluster.
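In Bourne or Korn shell syntax, the change could look like the following; add the equivalent lines to each node's shell startup file so the setting persists across logins:

```shell
# Append the new VxVM 3.1.1 man-page directory to MANPATH. The parameter
# expansion keeps any existing MANPATH value and avoids a leading colon
# when MANPATH was previously unset.
MANPATH=${MANPATH:+$MANPATH:}/opt/VRTS/man
export MANPATH
```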