Sun Cluster 3.1 8/05 Release Notes for Solaris OS

Known Issues and Bugs

The following known issues and bugs affect the operation of the Sun Cluster 3.1 8/05 release.

scvxinstall Creates Incorrect vfstab Entries When Boot Device is Multipathed (4639243)

Problem Summary: scvxinstall creates incorrect /etc/vfstab entries when boot device is multipathed.

Workaround: Run scvxinstall and choose to encapsulate. When the following message appears, type Ctrl-C to abort the reboot:


This node will be re-booted in 20 seconds. Type Ctrl-C to abort.

Edit the vfstab entry so /global/.devices uses the /dev/{r}dsk/cXtXdX name instead of the /dev/did/{r}dsk name. This revised entry enables VxVM to recognize it as the root disk. Rerun scvxinstall and choose to encapsulate. The vfstab file has the necessary updates. Allow the reboot to occur. The encapsulation proceeds as normal.
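The following is a minimal before-and-after sketch of such a vfstab entry. The device names are placeholders; substitute the actual cXtXdXsX name of your boot disk.


/dev/did/dsk/d1s3   /dev/did/rdsk/d1s3   /global/.devices/node@1   ufs   2   no   global
/dev/dsk/c0t0d0s3   /dev/rdsk/c0t0d0s3   /global/.devices/node@1   ufs   2   no   global

The first line shows the incorrect /dev/did/{r}dsk form that scvxinstall creates. The second line shows the corrected /dev/{r}dsk/cXtXdX form that VxVM can recognize as the root disk.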

Procedure: How to Correct /etc/vfstab Errors For a Multipathed Boot Device

Steps
  1. Run scvxinstall and choose to encapsulate.

    The system displays the following message:


    This node will be re-booted in 20 seconds.  Type Ctrl-C to abort.
  2. Abort the reboot.


    Ctrl-C
  3. Edit the /etc/vfstab entry so /global/.devices uses the /dev/{r}dsk/cXtXdX name instead of the /dev/did/{r}dsk name.

    This revised entry enables VxVM to recognize it as the root disk.

  4. Rerun scvxinstall and choose to encapsulate.

    The /etc/vfstab file has the necessary updates. Allow the reboot to occur. The encapsulation proceeds as normal.

SAP liveCache Stop Method Times Out (4836272)

Problem Summary: The Sun Cluster HA for SAP liveCache data service uses the dbmcli command to start and stop liveCache. If you are running Solaris 9, the network service might become unavailable when a cluster node's public network fails.

Workaround: Include one of the following entries for the publickey database in the /etc/nsswitch.conf files on each node that can be the primary for liveCache resources:

publickey: 
publickey:  files
publickey:  files [NOTFOUND=return] nis 
publickey:  files [NOTFOUND=return] nisplus

Adding one of the above entries, in addition to updates documented in Sun Cluster Data Service for SAP liveCache Guide for Solaris OS, ensures that the su command and the dbmcli command do not refer to the NIS/NIS+ name services. Bypassing the NIS/NIS+ name services ensures that the data service starts and stops correctly during a network failure.

nsswitch.conf Requirement Should Not Apply to passwd Database (4904975)

Problem Summary: The requirement for the nsswitch.conf file in Preparing the Nodes and Disks in Sun Cluster Data Service for SAP liveCache Guide for Solaris OS does not apply to the entry for the passwd database. If that requirement is applied to the passwd entry, the su command might hang on each node that can master the liveCache resource when the public network is down.

Workaround: On each node that can master the liveCache resource, ensure that the entry in the /etc/nsswitch.conf file for the passwd database is as follows:

passwd: files nis [TRYAGAIN=0]

sccheck Hangs (4944192)

Problem Summary: sccheck might hang if launched simultaneously from multiple nodes.

Workaround: Do not launch sccheck from any multi-console that passes commands to multiple nodes. sccheck runs can overlap, but should not be launched simultaneously.

Java Binaries Linked to Incorrect Java Version Cause HADB Agent to Malfunction (4968899)

Problem Summary: Currently, the HADB data service does not use the JAVA_HOME environment variable. Therefore, HADB, when invoked from the HADB data service, takes the Java binaries from /usr/bin/. The Java binaries in /usr/bin/ must be linked to Java 1.4 or later for the HADB data service to work properly.

Workaround: If you do not object to changing the default version available, perform the following procedure. As an example, this workaround assumes that the latest version of Java (1.4 or later) is installed in the /usr/j2se directory.

  1. If you have a directory called java/ in the /usr/ directory, move it to a temporary location.

  2. From the /usr/ directory, link /usr/bin/java and all other Java-related binaries to the appropriate version of Java.


    # ln -s j2se java
    

If you do not want to change the default version available, assign the JAVA_HOME environment variable with the appropriate version of Java (J2SE 1.4 and above) in the /opt/SUNWappserver7/SUNWhadb/4/bin/hadbm script.
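The following is a minimal sketch of that alternative, assuming that /usr/j2se is where your J2SE 1.4 or later installation resides; add an assignment of this form near the top of the hadbm script:


JAVA_HOME=/usr/j2se    # assumption: path to a J2SE 1.4 or later installation
export JAVA_HOME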

Adding a New Cluster Node Requires Cluster Reboot (4971299)

Problem Summary: When a node is added to the cluster that runs Sun Cluster Support for Oracle Real Application Clusters and uses the VxVM cluster feature, the cluster feature running on other nodes does not recognize the new node.

Workaround: A fix for this problem is expected to be made available by VERITAS in VxVM 3.5 MP4 and VxVM 4.0 MP2. The fix for VxVM 4.1 is currently available.

To correct the problem if a code fix is not yet available, restart the Oracle database and reboot the cluster nodes. This step synchronizes the Oracle UDLM and updates the VxVM cluster feature to recognize the added node.


Note –

Do not install and configure Sun Cluster Support for Oracle Real Application Clusters on the new node until after you perform this step.


  1. From a cluster node other than the node that you just added, shut down the Oracle Real Application Clusters database.

  2. Reboot the same node on which you shut down the Oracle database.


    # scswitch -S -h thisnode
    # shutdown -g0 -y -i6
    

    Wait until the node is fully rebooted back into the cluster before you proceed to the next step.

  3. Restart the Oracle database.

  4. Repeat Step 1 through Step 3 on each remaining node that runs Sun Cluster Support for Oracle Real Application Clusters.

    • If a single node is capable of handling the Oracle database workload, you can perform these steps on multiple nodes simultaneously.

    • If more than one node is required to support the database workload, perform these steps on one node at a time.

HA-DB Reinitializes Without Spares (4973982)

Problem Summary: Due to bug 4974875, whenever autorecovery is performed, the database reinitializes itself without any spares. That bug has been fixed and the fix is integrated into HA-DB release 4.3. For HA-DB release 4.2 and earlier, follow one of the procedures below to change the roles of the HA-DB nodes.

Workaround: Complete one of the following procedures to change the roles of the HA-DB nodes. A sketch of the scswitch commands for disabling and re-enabling the fault monitor follows the procedures.

  1. Identify the HA-DB nodes that have their roles changed after autorecovery is successful.

  2. On the nodes that you identified in Step 1, one node at a time, disable the fault monitor for the HA-DB resource in question and reset the node's role:


    # cladm noderole -db dbname -node nodeno -setrole role-before-auto_recovery
    
  3. Enable the fault monitor for the HA-DB resource in question.

    or

  1. Identify the HA-DB nodes that have their roles changed after autorecovery is successful.

  2. On all nodes that host the database, disable the fault monitor for the HA-DB resource in question.

  3. On any one of the nodes, execute the following command for each HA-DB node that needs its role changed:


    # cladm noderole -db dbname -node nodeno -setrole role-before-auto_recovery
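
The procedures above do not show the commands for disabling and re-enabling the fault monitor. The following is a minimal sketch that uses scswitch; hadb-resource is a placeholder for the name of your HA-DB resource.


# scswitch -n -M -j hadb-resource
# scswitch -e -M -j hadb-resource

The -n -M form disables only the fault monitor of the resource, and the -e -M form re-enables it, without taking the resource itself offline.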
    

pnmd Not Accessible by the Other Node During Rolling Upgrade (4997693)

Problem Summary: If a rolling upgrade is not yet completed on all the nodes, the nodes that have not been upgraded cannot see the IPMP groups on the upgraded nodes.

Workaround: Finish upgrading all nodes on the cluster.

Date Field on Advanced Filter Panel Accepts Only mm/dd/yyyy Format (5075018)

Problem Summary: The date field on the Advanced Filter panel of SunPlex Manager accepts only the mm/dd/yyyy format. However, in non-English locale environments the date format differs from mm/dd/yyyy, and the date returned from the Calendar panel is not in mm/dd/yyyy format.

Workaround: Type the date range in the Advanced Filter panel in mm/dd/yyyy format. Do not use the Set... button to display the calendar and choose the date.

In the Japanese Locale, Error Messages From scrgadm Contain Junk Characters (5083147)

Problem Summary: In the Japanese locale, the error messages from scrgadm are not displayed correctly. The messages contain junk characters.

Workaround: Set the system locale to English to display the error messages in English.

The /usr/cluster/lib/cmass/ipmpgroupmanager.sh Script Unplumbs the IPv6 Interface (6174170)

Problem Summary: SunPlex Manager uses the /usr/cluster/lib/cmass/ipmpgroupmanager.sh script to delete IPMP groups and to delete adapters from IPMP groups. The script correctly updates the /etc/hostname6.adaptername file to remove just the group name, but it then runs the following ifconfig command, which unplumbs the IPv6 interface:


ifconfig adaptername inet6 unplumb

Workaround: Reboot the node to plumb up the interface. Alternatively, run the following ifconfig command on the node. This alternative workaround does not require the node to be rebooted.


ifconfig adaptername inet6 plumb up

The IPMP Group Page Should Populate the Adapter List Based on the IP Version Chosen by the User (6174805)

Problem Summary: The list of adapters displayed in the IPMP group pages does not depend on the IP version that the user chooses. The page displays a list of all adapters that do not have groups configured. The list should instead be updated when the IP Version radio button is selected.

Workaround: After selecting the IP version, choose from the list only adapters that are enabled for the selected IP version.

When Moving an Adapter from IPv4 and IPv6 to IPv4 Only, the IPv6 Version Is Not Removed (6179721)

Problem Summary: The adapter list that is displayed in the IPMP group pages should depend on the IP version that the user chooses, but SunPlex Manager currently has a bug that always displays the complete list of adapters regardless of the IP version. SunPlex Manager should not let the user move an adapter that is enabled for both IPv4 and IPv6 to IPv4 only.

Workaround: The user should not attempt to move an adapter configured for both IPv4 and IPv6 to IPv4 only.

Configuration of Sun Java System Administration Server Fails if SUNWasvr Package is Not Installed (6196005)

Problem Summary: An attempt to configure the data service for Sun Java System Administration Server fails if the Sun Java System Administration Server is not installed. The attempt fails because the SUNW.mps resource type requires that the /etc/mps/admin/v5.2/cluster/SUNW.mps directory exists. This directory exists only if the SUNWasvr package is installed.

Workaround: To correct this problem, complete the following procedure.

Procedure: How to Install the SUNWasvr Package

Steps
  1. Log in as root or assume an equivalent role on a cluster node.

  2. Determine whether the SUNWasvr package is installed.


    # pkginfo SUNWasvr
    
  3. If the SUNWasvr package is not installed, install the package from the Sun Cluster CD-ROM by completing the following steps:

    1. Insert the Sun Cluster 2 of 2 CD-ROM into the appropriate drive.

    2. Go to the directory that contains the SUNWasvr package.


      # cd /cdrom/cdrom0/Solaris_sparc/Product/administration_svr/Packages
      
    3. Type the command to install the package.


      # pkgadd -d . SUNWasvr
      
    4. Remove the CD-ROM from the drive.

Change to startd/duration Does Not Become Effective Immediately (6196325)

Problem Summary: As of Solaris 10, the Sun Cluster HA for NFS data service sets the startd/duration property to transient for the Service Management Facility (SMF) services /network/nfs/server, /network/nfs/status, and /network/nfs/nlockmgr. The intention of this property setting is to prevent SMF from restarting these services after any failure. A bug in SMF causes SMF to restart /network/nfs/status and /network/nfs/nlockmgr after the first failure despite this property setting.

Workaround: For Sun Cluster HA for NFS to run correctly, run the following command on all nodes after creating the first Sun Cluster HA for NFS resource and before bringing the Sun Cluster HA for NFS resource online.


# pkill -9 -x 'startd|lockd'

If you are booting Sun Cluster for the first time, run the above command on all the potential primary nodes, after creating the first Sun Cluster HA for NFS resource and before bringing the Sun Cluster HA for NFS resource online.
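If you want to confirm the property setting that the data service applies, svcprop can display it. The following sketch checks one of the three services; repeat it for the others as needed.


# svcprop -p startd/duration svc:/network/nfs/status:default

When the data service has set the property as described above, the command prints transient.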

scinstall Does Not Copy All Common Agent Container Security Files (6203133)

Problem Summary: When a node is added to a cluster, the scinstall utility checks for the presence of Network Security Services (NSS) files on the node that you are adding. These files and security keys are required by the common agent container. If the NSS files exist, the utility copies the common agent container security files from the sponsoring node to the added node. But if the sponsoring node does not have the NSS security keys installed, the copy fails and scinstall processing quits.

Workaround: Perform the following procedure to install NSS software, recreate the security keys, and restart the common agent container on the existing cluster nodes.

Procedure: How to Install NSS Software When Adding a Node to a Cluster

Perform the following procedure on all existing cluster nodes as superuser or a role that permits the appropriate access.

Before You Begin

Have available the Sun Cluster 1 of 2 CD-ROM. The NSS packages are located at /cdrom/cdrom0/Solaris_arch/Product/shared_components/Packages/, where arch is sparc or x86 and where ver is 8 for Solaris 8, 9 for Solaris 9, or 10 for Solaris 10.

Steps
  1. On each node, stop the Sun Web Console agent.


    # /usr/sbin/smcwebserver stop
    
  2. On each node, stop the security file agent.


    # /opt/SUNWcacao/bin/cacaoadm stop
    
  3. On each node, determine whether NSS packages are installed and, if so, what version.


    # cat /var/sadm/pkg/SUNWtls/pkginfo | grep SUNW_PRODVERS
    SUNW_PRODVERS=3.9.4
  4. If a version earlier than 3.9.4 is installed, remove the existing NSS packages.


    # pkgrm packages
    

    The following table lists the applicable packages for each hardware platform.

    Hardware Platform    NSS Package Names
    SPARC                SUNWtls SUNWtlsu SUNWtlsx
    x86                  SUNWtls SUNWtlsu

  5. On each node, if you removed NSS packages or none were installed, install the latest NSS packages from the Sun Cluster 1 of 2 CD-ROM.

    • For the Solaris 8 or Solaris 9 OS, use the following command:


      # pkgadd -d . packages
      
    • For the Solaris 10 OS, use the following command:


      # pkgadd -G -d . packages
      
  6. Change to a directory that does not reside on the CD-ROM and eject the CD-ROM.


    # eject cdrom
    
  7. On each node, create the NSS security keys.


    # /opt/SUNWcacao/bin/cacaoadm create-keys
    
  8. On each node, start the security file agent.


    # /opt/SUNWcacao/bin/cacaoadm start
    
  9. On each node, start the Sun Web Console agent.


    # /usr/sbin/smcwebserver start
    
  10. On the node that you are adding to the cluster, restart the scinstall utility and follow procedures to install the new node.

Deleting a Public Interface Group Which has IPv4 and IPv6 Adapters Sometimes Fails From SunPlex Manager (6209229)

Problem Summary: Deleting a public interface group that has both IPv4 and IPv6 enabled adapters sometimes fails when the IPv6 adapter is deleted from the group. The following error message is displayed:


ifparse: Operation netmask not supported for inet6
/sbin/ifparse
/usr/cluster/lib/cmass/ipmpgroupmanager.sh[8]:
/etc/hostname.adaptname.tmpnumber: cannot open

Workaround: Edit the /etc/hostname6.adaptername file to include the following lines:


plumb
up
-standby

Then run the following command on the cluster node:


ifconfig adaptername inet6 plumb up -standby

Memory Leak During Rebooting Patch (Node) Procedure (6210440)

Problem Summary: Sun Cluster software hangs when attempting to perform a rolling upgrade from Sun Cluster 3.1 9/04 software to Sun Cluster 3.1 8/05 software due to a memory problem triggered when the first upgraded node is rebooted in cluster mode.

Workaround: If you are running Sun Cluster 3.1 9/04 software or the patch equivalent (revision 09 or higher) and want to perform a Rebooting Patch procedure to upgrade to Sun Cluster 3.1 8/05 software or the patch equivalent (revision 12), you must complete the following steps before you upgrade your cluster or apply this core patch.

Procedure: How to Prepare for an Upgrade to Sun Cluster 3.1 8/05 Software

Steps
  1. Choose the type of patch installation procedure that is appropriate to your availability requirements:

    • Rebooting Patch (Node)

    • Rebooting Patch (Cluster and Firmware)

    These patch installation procedures are provided in Chapter 8, Patching Sun Cluster Software and Firmware, in Sun Cluster System Administration Guide for Solaris OS.

  2. Apply one of the following patches depending on the operating system you are using:

    • 117909-11 Sun Cluster 3.1 Core Patch for SunOS 5.9 X86

    • 117950-11 Sun Cluster 3.1 Core Patch for Solaris 8

    • 117949-11 Sun Cluster 3.1 Core Patch for Solaris 9

    You must complete the entire patch installation procedure before upgrading to Sun Cluster 3.1 8/05 software or the patch equivalent (revision 12).

Zone Install and Zone Boot Does Not Work After Sun Cluster Install (6211453)

Problem Summary: Sun Cluster software installation adds exclude: lofs to /etc/system. Because lofs is critical to the function of zones, both zone install and zone boot fail.

Workaround: Before attempting to create any zones, perform the following procedure.

Procedure: How to Run Zone Install and Zone Boot After a Sun Cluster Installation

Steps
  1. If you are running Sun Cluster HA for NFS, exclude from the automounter map all files that are part of the highly available local file system that is exported by the NFS server.

  2. On each cluster node, edit the /etc/system file to remove any exclude: lofs lines.

  3. Reboot the cluster.
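
Before you reboot in Step 3, you can confirm on each node that no exclude: lofs line remains. The following check is a minimal sketch:


# grep lofs /etc/system

If the command prints no exclude: lofs line, the edit in Step 2 is complete.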

Solaris 10 Requires Additional Steps to Recover From the Failure of a Cluster File System to Mount at Boot Time (6211485)

Problem Summary: The Solaris 10 OS requires different recovery procedures than previous versions of the Solaris OS when a cluster file system fails to mount at boot time. Rather than presenting a login prompt, the mountgfsys service might fail and put the node into the maintenance state. The output messages are similar to the following:


WARNING - Unable to globally mount all filesystems.
Check logs for error messages and correct the problems.
 
May 18 14:06:58 pkaffa1 svc.startd[8]: system/cluster/mountgfsys:default misconfigured
 
May 18 14:06:59 pkaffa1 Cluster.CCR: /usr/cluster/bin/scgdevs: 
Filesystem /global/.devices/node@1 is not available in /etc/mnttab.

Workaround: After you repair the mount problem for the cluster file system, you must manually bring the mountgfsys service back online. Run the following commands to bring the mountgfsys service online and to synchronize the global devices namespace:


# svcadm clear svc:/system/cluster/mountgfsys:default
# svcadm clear svc:/system/cluster/gdevsync:default

After these commands run, boot processing continues.
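
To confirm that the services have left the maintenance state, you can check their status. This is a sketch that uses the service FMRIs shown above:


# svcs svc:/system/cluster/mountgfsys:default svc:/system/cluster/gdevsync:default

Both services should report the online state.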

Unsupported Upgrade to the Solaris 10 OS Corrupts the /etc/path_to_inst File (6216447)

Problem Summary: Sun Cluster 3.1 8/05 software does not support upgrade to the March 2005 release of the Solaris 10 OS. An attempt to upgrade to that release might corrupt the /etc/path_to_inst file. This file corruption would prevent the node from booting successfully. The corrupted file would appear similar to the following, in that it contains duplicate entries for some of the same device names except that the physical device name contains the prefix /node@nodeid:


…
"/node@nodeid/physical_device_name" instance_number "driver_binding_name"
…
"/physical_device_name" instance_number "driver_binding_name"

In addition, some key Solaris services might fail to start, including networking and file-system mounting, and messages might print on the console which state that the service is misconfigured.

Workaround: Use the following procedure.

Procedure: How to Recover From a Corrupted /etc/path_to_inst File

The following procedure describes how to recover from an upgrade to Solaris 10 software that results in a corrupted /etc/path_to_inst file.


Note –

This procedure does not attempt to correct any other problem that can be associated with upgrading a Sun Cluster configuration to the March 2005 release of the Solaris 10 OS.


Perform this procedure on each node that was upgraded to the March 2005 release of the Solaris 10 OS.

Before You Begin

If a node cannot boot, boot the node from the network or from a CD-ROM. Once the node is up, run the fsck command and mount the local file system in a partition such as /a. In Step 2, use the name of the local-file-system mount in the path to the /etc directory.

Steps
  1. Become superuser or an equivalent role on the node.

  2. Change to the /etc directory.


    # cd /etc
    
  3. Determine whether the path_to_inst file is corrupted.

    The following characteristics are present if the path_to_inst file is corrupted:

    • The file includes a block of entries that contain /node@nodeid at the beginning of physical device names.

    • Some of the same entries are listed again but without the /node@nodeid prefix.

    If the file is not of this format, then some other problem exists. Do not continue this procedure. Contact your Sun service representative if you need assistance.

  4. If the path_to_inst file is corrupted as described in Step 3, run the following commands.


    # cp path_to_inst path_to_inst.bak
    # sed -n -e "/^#/p" -e "s,node@./,,p" path_to_inst.bak > path_to_inst
    
  5. Inspect the path_to_inst file to ensure that the file is repaired.

    A repaired file will reflect the following changes:

    • The /node@nodeid prefix is removed from all physical device names.

    • There are no duplicate entries for any physical device name.

  6. Ensure that the permissions of the path_to_inst file are read only.


    # ls -l /etc/path_to_inst
    -r--r--r--   1 root     root        2946 Aug  8  2005 path_to_inst
  7. Perform a reconfiguration reboot into non-cluster mode.


    # reboot -- -rx
    
  8. After you repair all affected cluster nodes, go to How to Upgrade Dependency Software Before a Nonrolling Upgrade in Sun Cluster Software Installation Guide for Solaris OS to continue the upgrade process.

CMM Reconfiguration Callback Timed Out; Node Aborting (6217017)

Problem Summary: On x86 clusters with ce transports, a node under heavy load could be halted by CMM as a result of a split-brain.

Workaround: For x86 clusters using the PCI Gigaswift Ethernet card on the private network, add the following to /etc/system:


set ce:ce_tx_ring_size=8192

Nodes Might Panic When a Node Joins or Leaves a Cluster With More Than Two Nodes, Running Solaris 10, and Using Hitachi Storage (6227074)

Problem Summary: On clusters with more than two nodes, running Solaris 10, and using Hitachi storage, all of the cluster nodes might panic when a node joins or leaves the cluster.

Workaround: No current workaround exists. If you encounter this problem, contact your Sun Service provider about acquiring a patch.

Java ES 2005Q1 installer Does Not Install Application Server 8.1 EE Completely (6229510)

Problem Summary: Application Server Enterprise Edition 8.1 cannot be installed by the Java ES 2005Q1 installer if the Configure Later option is selected. Selecting the Configure Later option installs the Platform Edition and not the Enterprise Edition.

Workaround: While installing the Application Server Enterprise Edition 8.1 using the Java ES installer, use the Configure Now option to install. Selecting the Configure Later option installs the Platform Edition only.

scvxinstall Causes rpcbind to Restart (6237044)

Problem Summary: A restart of the bind SMF service can impact Solaris Volume Manager operation. Installation of the VERITAS VxVM 4.1 packages causes the bind SMF service to be restarted.

Workaround: Restart the Solaris Volume Manager service after either restarting the bind SMF service or installing VxVM 4.1 on a Solaris 10 host:


# svcadm restart svc:/network/rpc/scadmd:default

On a System Using Solaris 10, Sun Cluster Data Services Cannot be Installed After Sun Cluster is Installed Using the Java ES installer (6237159)

Problem Summary: This problem occurs only on systems using Solaris 10. If the user uses the Java ES installer on the Sun Cluster Agents CD-ROM to install Sun Cluster data services after the Sun Cluster core has been installed, the installer fails with the following messages:


The installer has determined that you must manually remove incompatible versions 
of the following components before proceeding: 

[Sun Cluster 3.1 8/05, Sun Cluster 3.1 8/05, Sun Cluster 3.1 8/05]

After you remove these components, go back. 
Component                       Required By ...

1. Sun Cluster 3.1 8/05     HA Sun Java System Message Queue : HA Sun Java 
                            System Message Queue 
2. Sun Cluster 3.1 8/05     HA Sun Java System Application Server : HA Sun Java 
                            System Application Server 
3. Sun Cluster 3.1 8/05     HA/Scalable Sun Java System Web Server : HA/Scalable 
                            Sun Java System Web Server 
4. Select this option to go back to the component list. This process might take
   a few moments while the installer rechecks your system for installed
   components.

Select a component to see the details. Press 4 to go back the product list
[4] {"<" goes back, "!" exits}

Workaround: On a system using Solaris 10, install the Sun Cluster data service manually by using pkgadd or scinstall. If the Sun Cluster data service has a dependency on shared components, install the shared components manually by using pkgadd. The following link lists the shared components for each product:

http://docs.sun.com/source/819-0062/preparing.html#wp28178

/usr/sbin/smcwebserver: ... j2se/opt/javahelp/lib: does not exist Error Message (6238302)

Problem Summary: During startup of Sun Web Console, the following message might be displayed.


/usr/sbin/smcwebserver:../../../../j2se/opt/javahelp/lib: does not exist

Workaround: The message is safe to ignore. You can manually add a link in /usr/j2se/opt to point to the correct Java Help 2.0 by entering the following:


# ln -s /usr/jdk/packages/javax.help-2.0 /usr/j2se/opt/javahelp

Node Panic After OS Upgrade to Solaris 10 From Sun Cluster 3.1 4/04 on Solaris 9 (6245238)

Problem Summary: After upgrading from the Solaris 9 OS to the Solaris 10 OS on a cluster that runs Sun Cluster 3.1 4/04 software or earlier, booting the node into noncluster mode results in the node panicking.

Workaround: Install one of the following patches before you upgrade from Solaris 9 to Solaris 10 software.

SunPlex Installer is Not Creating Resources in Resource Groups (6250327)

Problem Summary: When using SunPlex Installer to configure Sun Cluster HA for Apache and Sun Cluster HA for NFS data services as part of Sun Cluster installation, SunPlex Installer does not create the necessary device groups and resources in the resource groups.

Workaround: Do not use SunPlex Installer to install and configure data services. Instead, follow procedures in the Sun Cluster Software Installation Guide for Solaris OS and the Sun Cluster Data Service for Apache Guide for Solaris OS or Sun Cluster Data Service for NFS Guide for Solaris OS manuals to install and configure these data services.

HA-NFS Changes to Support NFSv4 Fix for 6244819 (6251676)

Problem Summary: NFSv4 is not supported in Sun Cluster 3.1 8/05.

Workaround: Solaris 10 introduces a new version of the NFS protocol, NFSv4, which is the default protocol for Solaris 10 clients and servers. The Sun Cluster 3.1 8/05 release supports Solaris 10; however, it does not support use of the NFSv4 protocol with the Sun Cluster HA for NFS service to achieve high availability for the NFS server. To ensure that no NFS client can use the NFSv4 protocol to communicate with the NFS server on Sun Cluster software, edit the /etc/default/nfs file to change the line NFS_SERVER_VERSMAX=4 to NFS_SERVER_VERSMAX=3. This change ensures that only the NFSv3 protocol is used by clients of the Sun Cluster HA for NFS service on the cluster.
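
The following is a minimal sketch of that edit to the /etc/default/nfs file on each cluster node; only this line changes:


NFS_SERVER_VERSMAX=3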

Note –

Use of Solaris 10 cluster nodes as NFSv4 clients is not affected by this restriction or by the above workaround. The cluster nodes can act as NFSv4 clients.

metaset Command Fails After the rpcbind Service is Restarted (6252216)

Problem Summary: The metaset command fails after the rpcbind service is restarted.

Workaround: Ensure that you are not performing any configuration operations on your Sun Cluster system, then kill the rpc.metad process using the following command:


# pkill -9 rpc.metad

Node Panic Due to metaclust Return Step Error: RPC: Program not Registered (6256220)

Problem Summary: When the cluster is shut down, some of the nodes might panic due to the order in which services are stopped on the nodes. If the RPC service is stopped before the RAC framework is stopped, errors can result when the SVM resource attempts to reconfigure. The error is reported back to the RAC framework, which causes a node panic. This problem has been observed when Sun Cluster runs the RAC framework with the SVM storage option. There should be no impact to Sun Cluster functionality.

Workaround: The panic is by design and can safely be ignored. However, clean up the saved core files to reclaim file system space.
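
Assuming the default savecore location, the saved crash dump files can be listed and removed as in the following sketch. The unix.0 and vmcore.0 file names are examples; confirm that the dumps are no longer needed before you delete them.


# ls /var/crash/`uname -n`
# rm /var/crash/`uname -n`/unix.0 /var/crash/`uname -n`/vmcore.0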

NIS Address Resolution Hangs and Causes Failure to Fail Over (6257112)

Problem Summary: In the Solaris 10 OS, the /etc/nsswitch.conf file has been modified to include NIS in the ipnodes entry.


ipnodes:    files nis [NOTFOUND=return]

This causes the address resolution to hang if NIS becomes inaccessible, either due to a NIS problem or due to failure of all public network adapters. This problem eventually causes failover resources or shared address resources to fail to fail over.

Workaround: Complete the following before you create logical host or shared address resources:

  1. Change the ipnodes entry in the /etc/nsswitch.conf file from [NOTFOUND=return] to [TRYAGAIN=0].


    ipnodes:    files nis [TRYAGAIN=0]
  2. Ensure that all IP addresses for logical hosts and shared addresses are added to the /etc/inet/ipnodes file, in addition to the /etc/inet/hosts file.
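
The following sketch shows the kind of entries that Step 2 calls for. The host names and addresses are placeholders for your own logical host and shared address resources; add the same entries to both files.


192.168.10.50   oracle-lh
192.168.10.60   web-sa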

scinstall Fails to Upgrade the Sun Cluster Data Service for Sun Java System Application Server EE (6263451)

Problem Summary: While attempting to update the Sun Cluster Data Service for Sun Java System Application Server EE from 3.1 9/04 to 3.1 8/05, scinstall does not remove the package for j2ee and displays the following message:


Skipping "SUNWscswa" - already installed

Sun Cluster Data Service for Sun Java System Application Server EE is not upgraded.

Workaround: Manually remove and re-add the sap_j2ee package by using the following commands:


# pkgrm SUNWscswa
# pkgadd [-d device] SUNWscswa

scnas: NAS Filesystem did not get Mounted During Bootup (6268260)

Problem Summary: The NFS file system cannot be checked for viability before a failover or before scswitch is used to relocate the data service to a node. If a node does not have the NFS file system mounted, a switchover or failover to that node results in a failure of the data service that requires manual intervention. A mechanism like HAStoragePlus is needed to check the viability of the file system before attempting the failover or switchover to that node.

Workaround: File systems that use NAS filers (with entries in /etc/vfstab) are mounted outside of Sun Cluster software control, which means that Sun Cluster software is unaware of any problems. If the file system becomes unavailable, some data services, such as Sun Cluster HA for Oracle, fail when data service methods, such as START or STOP, are executed.

Failure of these methods may lead to several possibilities:

Perform one of the following procedures to avoid the above problems:

HADB Fault Monitor Will Not Restart the ma Process (6269813)

Problem Summary: The Sun Cluster HADB data service does not restart the ma (management agent) process when that process is killed or exits abruptly.

Workaround: This is the expected behavior, and the data service is not affected.

rgmd Dumps Core During Rolling Upgrade (6271037)

Problem Summary: Deleting a resource during a rolling upgrade, before all nodes are running the new software, might cause one of the nodes to panic.

Workaround: During a rolling upgrade, do not delete an RGM resource until all nodes have the new software installed.

HADB Database Fails to Restart After Shut Down and Boot of Cluster (6276868)

Problem Summary: The HADB database fails to restart after the cluster nodes are rebooted. The user will not be able to access the database.

Workaround: Restart one of your management data services by completing the following procedure. If the following procedure does not resolve the problem, delete the database and recreate it.

Procedure: Restarting a Management Data Service

Steps
  1. On the node to be shut down, type the following command. Do not include, in the -h option, the name of the node on which you want the management agent to be stopped.


    scswitch -z -g hadb-resource-grp -h node1,node2...
    
  2. Switch the resource group back to the original node.


    scswitch -Z -g hadb-resource-grp
    
  3. Check the status of the database. Wait until the database comes to the “stopped” state.


    hadbm status -n database
    
  4. Start the database.


    hadbm start database
    

SUNW.iim Has Size 0 After Adding SUNWiimsc Package (6277593)

Problem Summary: The SUNWiimsc package in sun_cluster_agents is invalid. After adding this package, SUNW.iim in /opt/SUNWiim/cluster has size 0.

Workaround: Replace the SUNW.iim package and register again by completing the following steps.

Procedure: How to Install the Correct SUNW.iim Package

Steps
  1. Copy the correct SUNW.iim from the CD-ROM.


    # cp 2of2_CD/Solaris_arch/Product/sun_cluster_agents/Solaris_os/Packages/SUNWiimsc/reloc/SUNWiim/cluster/SUNW.iim \
      /opt/SUNWiim/cluster/SUNW.iim
    
  2. Remove any existing SUNW.iim registration.


    # rm /usr/cluster/lib/rgm/rtreg/SUNW.iim
    
  3. Register the data service with Sun Cluster:


    # sh 2of2_CD/Solaris_arch/Product/sun_cluster_agents/Solaris_os/Packages/SUNWiimsc/install/postinstall

Adding a New IPMP Group Through SunPlex Manager Sometimes Fails (6278059)

Problem Summary: Trying to add a new IPMP group by using SunPlex Manager sometimes fails with the following message.


An error was encountered by the system. If you were performing an action 
when this occurred, review the current system state prior to proceeding.

Workaround: Perform one of the following procedures depending on the version of IP you are running.

Procedure: Adding a New IPMP Group Through SunPlex Manager When You Are Using IPv4

Steps
  1. Enter the following command:


    ifconfig interface inet plumb group groupname [addif address deprecated] 
    netmask + broadcast + up -failover
    
  2. If a test address has been provided, update the /etc/hostname.interface file to add the following:


    group groupname addif address netmask + broadcast + deprecated -failover up
  3. If a test address has not been provided, update the /etc/hostname.interface file to add the following:


    group groupname netmask + broadcast -failover up

Procedure: Adding a New IPMP Group Through SunPlex Manager When You Are Using IPv6

Steps
  1. Enter the following command:


    ifconfig interface inet6 plumb up group groupname
    
  2. Update the /etc/hostname6.interface file to add the following entries:


    group groupname plumb up
  3. If the /etc/hostname6.interface file does not already exist, create the file and add the entries mentioned above.

HADB Resource Keeps Restarting After Panicking One of the Cluster Nodes (6278435)

Problem Summary: After bringing the resource online and panicking one of the nodes in the cluster (for example, shutdown or uadmin), the resource keeps restarting on the other nodes. The user will not be able to issue any management commands.

Workaround: To prevent this problem, log in to a single node as root or as a role with equivalent access privileges and increase the Probe_timeout property of the resource to 600 seconds by using the following command:


scrgadm -c -j hadb-resource -x Probe_timeout=600

To verify your change, shut down one of the cluster nodes and check that the resource does not go into a degraded state.
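
To confirm that the new value took effect before you test, you can list the resource properties. In the following sketch, hadb-resource is a placeholder for the name of your HA-DB resource:


# scrgadm -pvv -j hadb-resource | grep -i probe_timeout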

On Solaris 10, Scalable Services do not Work When Both the Public Networks and Sun Cluster Transports use bge(7D)-driven Adapters (6278520)

Problem Summary: The load balancing feature of Sun Cluster scalable services does not work on Solaris 10 systems when both the public networks and Sun Cluster transports use bge-driven adapters. Platforms with built-in NICs that use bge include Sun Fire V210, V240, and V250.

Failover data services are not affected by this bug.

Workaround: Do not configure public networking and cluster transports to both use bge-driven adapters.

Cannot See the System Log from SunPlex Manager When the Default Locale is set to Multibyte Locale (6281445)

Problem Summary: When the SunPlex Manager default locale is set to a multibyte locale, you cannot see the system log.

Workaround: Set the default locale to C, or view the system log (/var/adm/messages) manually from a command-line shell.

Cannot Bring Node Agent Online Using scswitch on Node1 (6283646)

Problem Summary: The instances and node agents must be configured to listen on the failover IP address or host name. When the node agents and Sun Java System Application Server instances are created, the physical node host name is set by default. The HTTP IP address and the client-hostname are changed in domain.xml, but the Domain Admin Server is not restarted, so the changes do not take effect. Therefore, the node agents come up only on the physical node where they were configured, and not on the other node.

Workaround: Change the client-hostname property in the Node Agent section of domain.xml to listen on the failover IP address, and restart the Domain Admin Server for the changes to take effect.

SunPlex Manager and Cacao 1.1 Only Support JDK 1.5.0_03 (6288183)

Problem Summary: When using SunPlex Manager in Sun Cluster 3.1 8/05 with Cacao 1.1, only JDK 1.5.0_03 is supported.

Workaround: Manually install JDK 1.5 by completing the following procedure.

Procedure: How to Manually Install JDK 1.5

Steps
  1. Add JDK 1.5 from the Java ES 4 shared components directory (see the Java ES 4 Release Notes for instructions).

  2. Stop cacao.


    # /opt/SUNWcacao/bin/cacaoadm stop
    
  3. Start cacao.


    # /opt/SUNWcacao/bin/cacaoadm start
    

After Installing SC3.1 (8/05) Patch 117949-14 on Solaris 9 or Patch 117950-14 on Solaris 8, Java VM Errors Occur During Boot (6291206)

Problem Summary: This bug is seen on a Sun Cluster system running 3.1 (9/04) plus patches that is upgraded to Sun Cluster 3.1 (8/05) by applying patch 117949-14 on a system running Solaris 9 or patch 117950-14 on a system running Solaris 8. The following error message is displayed after the machine boots:


# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGSEGV (0xb) at pc=0xfaa90a88, pid=3102, tid=1
#
# Java VM: Java HotSpot(TM) Client VM (1.5.0_01-b07 mixed mode, sharing)
# Problematic frame:
# C  [libcmas_common.so+0xa88]  newStringArray+0x70
#
# An error report file with more information is saved as /tmp/hs_err_pid3102.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

Workaround: When upgrading from Sun Cluster 3.1 (9/04) to Sun Cluster 3.1 (8/05), install the SPM patch in addition to the core patch by entering the appropriate command below.

On a system running Solaris 8, run the following command after applying core patch 117950-14:


# patchadd patchdir/118626-04

On a system running Solaris 9, run the following command after patch 117949-14 has been applied:


# patchadd patchdir/118627-04

Directory Server and Administration Server Resource Registration Sometimes Fails (6298187)

Problem Summary: The resource registration sometimes fails for Directory Server and Administration Server. The system will display the following message:


Registration file not found for "SUNW.mps" in /usr/cluster/lib/rgm/rtreg

Workaround: Register the missing file from the pkg location directly by entering one of the following commands:

Solaris 10 Cluster Nodes May Fail to Communicate With Machines That Have Both IPv4 and IPv6 Address Mappings (6306113)

Problem Summary: If a Sun Cluster node running Solaris 10 does not have IPv6 interfaces configured for public networking (IPv6 configured on the cluster interconnects does not count as public networking), the node cannot access machines that have both an IPv4 and an IPv6 address mapping in a name service such as NIS. Applications such as telnet and traceroute that choose the IPv6 address over the IPv4 address see their packets sent to the cluster transport adapters and dropped.

Workaround: Use one of the following workarounds, depending on the configuration of your cluster.