Sun Cluster 3.2 Release Notes for Solaris OS

Known Issues and Bugs

The following known issues and bugs affect the operation of the Sun Cluster 3.2 release. Bugs and issues are grouped into the following categories:

Administration

The clnode remove -f Option Fails to Remove the Node with the Solaris Volume Manager Device Group (6471834)

Problem Summary: The clnode remove --force command should remove nodes from the metasets. The Sun Cluster System Administration Guide for Solaris OS provides procedures for removing a node from the cluster. These procedures instruct the user to run the metaset command for the Solaris Volume Manager disk set removal prior to running clnode remove.

Workaround: If the procedures were not followed, it might be necessary to clear the stale node data from the CCR: From an active cluster node, use the metaset command to clear the node from the Solaris Volume Manager disk sets. Then run clnode clear --force obsolete_nodename.
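
For illustration, a minimal sketch of this cleanup, where datasetname and obsolete_nodename are placeholders for your Solaris Volume Manager disk set and the node being removed:


# metaset -s datasetname -d -f -h obsolete_nodename
# clnode clear --force obsolete_nodename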

scsnapshot is Nonfunctional With Solaris 10 SUNWCluster Meta Cluster (6477905)

Problem Summary: On a cluster installed with the Solaris 10 End User software group, SUNWCuser, running the scsnapshot command might fail with the following error:


# scsnapshot -o
…
/usr/cluster/bin/scsnapshot[228]: /usr/perl5/5.6.1/bin/perl:  not found

Workaround: Do either of the following:

Entries in the Auxnodelist Property Cause SEGV During Scalable Resource Creation (6494243)

Problem Summary: The Auxnodelist property of the shared-address resource cannot be used during shared-address resource creation. This causes validation errors and a SEGV when the scalable resource that depends on this shared-address network resource is created. The scalable resource's validate error message is in the following format:


Method methodname (scalable svc) on resource resourcename stopped or terminated 
due to receipt of signal 11

Also, a core file is generated by ssm_wrapper. Users cannot set the Auxnodelist property and thus cannot identify the cluster nodes that can host the shared address but never serve as primary.

Workaround: On one node, re-create the shared-address resource without specifying the Auxnodelist property. Then rerun the scalable-resource creation command and use the shared-address resource that you re-created as the network resource.
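
For illustration, a hedged sketch of this sequence, where sa-rg, sa-rs, scal-rg, scal-rs, shared-host, and resource_type are placeholders, and where any type-specific properties from your original scalable-resource creation command must also be supplied:


# clressharedaddress create -g sa-rg -h shared-host sa-rs
# clresource create -g scal-rg -t resource_type \
-p Resource_dependencies=sa-rs scal-rs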

clquorumserver Start and Stop Commands Should Set the Startup State Properly for Next Boot (6496008)

Problem Summary: The Quorum Server command clquorumserver does not set the state for the startup mechanism correctly for the next reboot.

Workaround: Perform the following tasks to start or stop Quorum Server software.

How to Start Quorum Server Software on the Solaris 10 OS

  1. Display the status of the quorumserver service.


    # svcs -a | grep quorumserver
    

    If the service is disabled, output appears similar to the following:


    disabled        3:33:45 svc:/system/cluster/quorumserver:default
  2. Start Quorum Server software.

    • If the quorumserver service is disabled, use the svcadm enable command.


      # svcadm enable svc:/system/cluster/quorumserver:default
      
    • If the quorumserver service is online, use the clquorumserver command.


      # clquorumserver start +
      

How to Stop Quorum Server Software on the Solaris 10 OS

    Disable the quorumserver service.


    # svcadm disable svc:/system/cluster/quorumserver:default
    

How to Start Quorum Server Software on the Solaris 9 OS

  1. Start Quorum Server software.


    # clquorumserver start +
    
  2. Rename the /etc/rc2.d/.S99quorumserver file as /etc/rc2.d/S99quorumserver.


    # mv /etc/rc2.d/.S99quorumserver /etc/rc2.d/S99quorumserver
    

How to Stop Quorum Server Software on the Solaris 9 OS

  1. Stop Quorum Server software.


    # clquorumserver stop +
    
  2. Rename the /etc/rc2.d/S99quorumserver file as /etc/rc2.d/.S99quorumserver.


    # mv /etc/rc2.d/S99quorumserver /etc/rc2.d/.S99quorumserver
    

Data Services

Creation of Node Agent Resource for Sun Cluster HA for Sun Java System Application Server Succeeds Even if Resource Dependency is Not Set on Domain Administration Server (DAS) Resource (6262459)

Problem Summary: When creating the node agent (NA) resource in Sun Cluster HA for Sun Java System Application Server, the resource is created even if no dependency is set on the DAS resource. The command should fail if the dependency is not set, because a DAS resource must be online before the NA resource can start.

Workaround: While creating the NA resource, make sure you set a resource dependency on the DAS resource.
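
For illustration only, a hedged sketch where na-rg, na-rs, das-rs, and node_agent_resource_type are placeholders for your resource group, NA resource, DAS resource, and the node agent resource type that is registered on your cluster:


# clresource create -g na-rg -t node_agent_resource_type \
-p Resource_dependencies=das-rs na-rs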

New Variable in HA MySQL Patch Must be Configured for All New Instances (6516322)

Problem Summary: The HA MySQL patch adds a new variable called MYSQL_DATADIR to the mysql_config file. This new variable must point to the directory where the MySQL configuration file, my.cnf, is stored. If this variable is not configured correctly, database preparation with mysql_register fails.

Workaround: Point the MYSQL_DATADIR variable to the directory where the MySQL configuration file, my.cnf, is stored.
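
For example, a minimal sketch of the relevant line in the mysql_config file, assuming the my.cnf file is stored in /global/mysql-data (a placeholder path):


MYSQL_DATADIR=/global/mysql-data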

Installation

Autodiscovery With InfiniBand Configurations Can Sometimes Suggest Two Paths Using the Same Adapter (6299097)

Problem Summary: If InfiniBand is used as the cluster transport and there are two adapters on each node with two ports per adapter and a total of two switches, the scinstall utility's adapter autodiscovery could suggest two transport paths that use the same adapter.

Workaround: Manually specify the transport adapters on each node.

IPv6 Scalable Service Support is Not Enabled by Default (6332656)

Problem Summary: IPv6 plumbing on the interconnects, which is required for forwarding of IPv6 scalable service packets, will no longer be enabled by default. The IPv6 interfaces, as seen when using the ifconfig command, will no longer be plumbed on the interconnect adapters by default.

Workaround: Manually enable IPv6 scalable service support.

How to Manually Enable IPv6 Scalable Service Support

Before You Begin

Ensure that you have prepared all cluster nodes to run IPv6 services. These tasks include proper configuration of network interfaces, server/client application software, name services, and routing infrastructure. Failure to do so might result in unexpected failures of network applications. For more information, see your Solaris system-administration documentation for IPv6 services.

  1. On each node, add the following entry to the /etc/system file.


    set cl_comm:ifk_disable_v6=0
    
  2. On each node, enable IPv6 plumbing on the interconnect adapters.


    # /usr/cluster/lib/sc/config_ipv6
    

    The config_ipv6 utility brings up an IPv6 interface on all cluster interconnect adapters that have a link-local address. The utility enables proper forwarding of IPv6 scalable service packets over the interconnects.

    Alternatively, you can reboot each cluster node to activate the configuration change.

clnode add Fails to Add a Node from an XML File if the File Contains Direct-Connect Transport Information (6485249)

Problem Summary: If the clnode add command is run with an XML file that uses direct-connect transport, the command misinterprets the cable information and adds the wrong configuration information. As a result, the joining node is unable to join the cluster.

Workaround: Use the scinstall command to add a node to the cluster when the cluster transport is directly connected.

The /etc/nsswitch.conf File is Not Updated with hosts and netmasks Database Information During Non-Global Zone Installation (6345227)

Problem Summary: The scinstall command updates the /etc/nsswitch.conf file to add the cluster entry for the hosts and netmasks databases. This change updates the /etc/nsswitch.conf file for the global zone. But when a non-global zone is created and installed, the non-global zone receives its own copy of the /etc/nsswitch.conf file. The /etc/nsswitch.conf files in the non-global zones do not have the cluster entry for the hosts and netmasks databases. Any attempt to resolve cluster-specific private hostnames and IP addresses from within a non-global zone by using getXbyY queries fails.

Workaround: Manually update the /etc/nsswitch.conf file for non-global zones with the cluster entry for the hosts and netmasks databases. This ensures that cluster-specific private-hostname and IP-address resolution is available within non-global zones.
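
For reference, after the update the cluster entries in a non-global zone's /etc/nsswitch.conf file should look similar to the following; the lookup sources that follow the cluster keyword depend on your existing configuration:


hosts:      cluster files dns
netmasks:   cluster files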

Localization

Translated Messages for Quorum Server are Delivered as Part of the Core Translation Packages (6482813)

Problem Summary: Translated messages for the Quorum Server administration programs, such as clquorumserver, are delivered as part of the core translation packages. As a result, Quorum Server messages appear only in English. The Quorum Server translation packages must be separated from the core translation packages and installed on the quorum server system.

Workaround: Install the following packages on the host where Quorum Server software is installed:

If the Japanese man page is needed on the quorum server, install the SUNWjscman (Japanese man page) package.
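
A hedged sketch of installing that package, assuming the Sun Cluster media is mounted at /cdrom/cdrom0 on a SPARC system and the package directory matches your release layout:


# cd /cdrom/cdrom0/Solaris_sparc/Product/sun_cluster/Solaris_10/Packages
# pkgadd -d . SUNWjscman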

Installer Displays Incorrect Swap Size for the Sun Cluster 3.2 Simplified Chinese Version (6495984)

Problem Summary: When you install the Simplified Chinese version of the Sun Cluster 3.2 software, the installer displays a warning message about insufficient swap space. The installer reports an incorrect swap size of 0.0 KB on the system-requirements check screen.

Workaround: If the swap size is larger than the system requirement, you can safely ignore this problem. Alternatively, run the Sun Cluster 3.2 installer in the C or English locale; that version checks the swap size correctly.
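
One way to run the installer in the C locale, assuming you launch the Java ES installer program from the media (its exact location depends on your media layout), is to override the locale on the command line:


# LC_ALL=C ./installer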

Runtime

SAP cleanipc Binary Needs User_env Parameter for LD_LIBRARY_PATH (4996643)

Problem Summary: The cleanipc binary fails if the runtime linking environment does not contain the /sapmnt/SAPSID/exe path.

Workaround: As the Solaris root user, add the /sapmnt/SAPSID/exe path to the default library path in the ld.config file.

To configure the runtime linking environment default library path for 32–bit applications, enter the following command:


# crle -u -l /sapmnt/SAPSID/exe

To configure the runtime linking environment default library path for 64–bit applications, enter the following command:


# crle -64 -u -l /sapmnt/SAPSID/exe
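
To verify the change, run crle with no other options (or with only -64 for the 64-bit environment) and confirm that /sapmnt/SAPSID/exe appears in the reported default library path:


# crle
# crle -64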

Node Panics Due to a metaclust Return Step Error: RPC: Program Not Registered (6256220)

Problem Summary: When a cluster shutdown is performed, the UCMMD can go into a reconfiguration on one or more of the nodes if one of the nodes leaves the cluster slightly ahead of the UCMMD. When this occurs, the shutdown stops the rpc.mdcommd process on the node while the UCMMD is trying to perform the return step. In the return step, the metaclust command gets an RPC timeout and exits the step with an error, due to the missing rpc.mdcommd process. This error causes the UCMMD to abort the node, which might cause the node to panic.

Workaround: You can safely ignore this problem. When the node boots back up, Sun Cluster software detects this condition and allows the UCMMD to start, despite the fact that an error occurred in the previous reconfiguration.

Sun Cluster Resource Validation Does Not Accept the Hostname for IPMP Groups for the netiflist Property (6383994)

Problem Summary: Sun Cluster resource validation does not accept the hostname for IPMP groups for the netiflist property during logical-hostname or shared-address resource creation.

Workaround: Use the node ID instead of the node name when you specify the IPMP group names during logical-hostname and shared-address resource creation.
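
For illustration, a hedged sketch with placeholder names, where sc_ipmp0 is the IPMP group name and 1 and 2 are the node IDs of the nodes that can host the logical hostname:


# clreslogicalhostname create -g lh-rg -h loghost-1 \
-N sc_ipmp0@1,sc_ipmp0@2 lh-rs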

Upgrade

The vxlufinish Script Returns an Error When the Root Disk is Encapsulated (6448341)

Problem Summary: This problem is seen when the root disk is encapsulated and a live upgrade is attempted from VxVM 3.5 on the Solaris 9 8/03 OS to VxVM 5.0 on the Solaris 10 6/06 OS. The vxlufinish script fails with the following error.


# ./vxlufinish -u 5.10

    VERITAS Volume Manager VxVM 5.0
    Live Upgrade finish on the Solairs release <5.10>

    Enter the name of the alternate root diskgroup: altrootdg
ld.so.1: vxparms: fatal: libvxscsi.so: open failed: No such file or directory
ld.so.1: vxparms: fatal: libvxscsi.so: open failed: No such file or directory
Killed
ld.so.1: ugettxt: fatal: libvxscsi.so: open failed: No such file or directory
ERROR:vxlufinish Failed: /altroot.5.10/usr/lib/vxvm/bin/vxencap -d -C 10176
-c -p 5555 -g
    -g altrootdg rootdisk=c0t1d0s2
    Please install, if 5.0 or higher version of VxVM is not installed
    on alternate bootdisk.

Workaround: Use the standard upgrade or dual-partition upgrade method instead.

Contact Sun support or your Sun representative to learn whether Sun Cluster 3.2 Live Upgrade support for VxVM 5.0 becomes available at a later date.

Live Upgrade Should Support Mounting Global Devices From Boot Disk (6433728)

Problem Summary: During live upgrade, the lucreate and luupgrade commands fail to change the DID names in the alternate boot environment that corresponds to the /global/.devices/node@N entry.

Workaround: Before you start the live upgrade, perform the following steps on each cluster node.

  1. Become superuser.

  2. Back up the /etc/vfstab file.


    # cp /etc/vfstab /etc/vfstab.old
    
  3. Open the /etc/vfstab file for editing.

  4. Locate the line that corresponds to /global/.devices/node@N.

  5. Edit the global device entry.

    • Change the DID names to the physical names.

      Change /dev/did/{r}dsk/dYsZ to /dev/{r}dsk/cNtXdYsZ.

    • Remove global from the entry.

    The following example shows the DID device d3s3, which corresponds to /global/.devices/node@2, changed to its physical device names with the global mount option removed:


    Original:
    /dev/did/dsk/d3s3    /dev/did/rdsk/d3s3    /global/.devices/node@2   ufs   2   no   global
    
    Changed:
    /dev/dsk/c0t0d0s3    /dev/rdsk/c0t0d0s3    /global/.devices/node@2   ufs   2   no   -
  6. After the /etc/vfstab file has been modified on all cluster nodes, perform the live upgrade of the cluster, but stop before you reboot from the upgraded alternate boot environment.

  7. On each node, on the current, unupgraded, boot environment, restore the original /etc/vfstab file.


    # cp /etc/vfstab.old /etc/vfstab
    
  8. In the alternate boot environment, open the /etc/vfstab file for editing.

  9. Locate the line that corresponds to /global/.devices/node@N and replace the dash (-) at the end of the entry with the word global.


    /dev/dsk/cNtXdYsZ    /dev/rdsk/cNtXdYsZ    /global/.devices/node@N   ufs   2   no   global
    
  10. Reboot the node from the upgraded alternate boot environment.

    The DID names are substituted in the /etc/vfstab file automatically.

The vxlustart Script Fails to Create the Alternate Boot Environment During a Live Upgrade (6445430)

Problem Summary: This problem is seen when upgrading VERITAS Volume Manager (VxVM) during a Sun Cluster live upgrade. The vxlustart script is used to upgrade the Solaris OS and VxVM from the previous version. The script fails with error messages similar to the following:


# ./vxlustart -u 5.10 -d c0t1d0 -s OSimage

   VERITAS Volume Manager VxVM 5.0.
   Live Upgrade is now upgrading from 5.9 to <5.10>
…
ERROR: Unable to copy file systems from boot environment <sorce.8876> to BE <dest.8876>.
ERROR: Unable to populate file systems on boot environment <dest.8876>.
ERROR: Cannot make file systems for boot environment <dest.8876>.
ERROR: vxlustart: Failed: lucreate -c sorce.8876 -C /dev/dsk/c0t0d0s2 
-m -:/dev/dsk/c0t1d0s1:swap -m /:/dev/dsk/c0t1d0s0:ufs 
-m /globaldevices:/dev/dsk/c0t1d0s3:ufs -m /mc_metadb:/dev/dsk/c0t1d0s7:ufs 
-m /space:/dev/dsk/c0t1d0s4:ufs -n dest.8876

Workaround: Use the standard upgrade or dual-partition upgrade method if you are upgrading the cluster to VxVM 5.0.

Contact Sun support or your Sun representative to learn whether Sun Cluster 3.2 Live Upgrade support for VxVM 5.0 becomes available at a later date.

vxio Major Numbers Different Across the Nodes When the Root Disk is Encapsulated (6445917)

Problem Summary: For clusters that run VERITAS Volume Manager (VxVM), a standard upgrade or dual-partition upgrade of any of the following software fails if the root disk is encapsulated:

The cluster node panics and fails to boot after upgrade. This is due to the major-number or minor-number changes made by VxVM during the upgrade.

Workaround: Unencapsulate the root disk before you begin the upgrade.
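
As a minimal sketch, assuming the VxVM vxunroot utility is used to convert the encapsulated root volumes back to disk slices (see the VxVM documentation for its prerequisites, such as removing any non-root volumes from the root disk first):


# /etc/vx/bin/vxunroot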


Caution –

If the above procedure is not followed correctly, you might experience serious unexpected problems on all nodes being upgraded. Also, unencapsulating and encapsulating the root disk each cause an additional automatic reboot of the node, increasing the number of required reboots during the upgrade.


Cannot Use Zones Following Live Upgrade From Sun Cluster Version 3.1 on Solaris 9 to Version 3.2 on Solaris 10 (6509958)

Problem Summary: Following a live upgrade from Sun Cluster version 3.1 on Solaris 9 to version 3.2 on Solaris 10, zones cannot be used properly with the cluster software. The problem is that the pspool data is not created for the Sun Cluster packages. So those packages that must be propagated to the non-global zones, such as SUNWsczu, are not propagated correctly.

Workaround: After the Sun Cluster packages have been upgraded by using the scinstall -R command but before the cluster has booted into cluster mode, run the following script twice:

Instructions for Using the Script

Before You Begin

Prepare and run this script in one of the following ways:

  1. Become superuser.

  2. Create a script with the following content.

    #!/bin/ksh
    
    # Platform name, path to the Sun Cluster packages on the media, and the
    # root path of the boot environment being repaired. Each variable can be
    # overridden from the environment before the script is run.
    typeset PLATFORM=${PLATFORM:-`uname -p`}
    typeset PATHNAME=${PATHNAME:-/cdrom/cdrom0/Solaris_${PLATFORM}/Product/sun_cluster/Solaris_10/Packages}
    typeset BASEDIR=${BASEDIR:-/}
    
    # For every package that is already installed under BASEDIR, create its
    # pspool directory and spool a copy of the package into it so that the
    # package can later be propagated to non-global zones.
    cd $PATHNAME
    for i in *
    do
    	if pkginfo -R ${BASEDIR} $i >/dev/null 2>&1
    	then
    		mkdir -p ${BASEDIR}/var/sadm/pkg/$i/save/pspool
    		pkgadd -d . -R ${BASEDIR} -s ${BASEDIR}/var/sadm/pkg/$i/save/pspool $i
    	fi
    done
  3. Set the variables PLATFORM, PATHNAME, and BASEDIR.

    Either set these variables as environment variables or modify the values in the script directly.

    PLATFORM

    The name of the platform. For example, it could be sparc or x86. By default, the PLATFORM variable is set to the output of the uname -p command.

    PATHNAME

    A path to the device from where the Sun Cluster framework or data-service packages can be installed. This value corresponds to the -d option in the pkgadd command.

    As an example, for Sun Cluster framework packages, this value would be of the following form:


    /cdrom/cdrom0/Solaris_${PLATFORM}/Product/sun_cluster/Solaris_10/Packages

    For the data services packages, this value would be of the following form:


    /cdrom/cdrom0/Solaris_${PLATFORM}/Product/sun_cluster_agents/Solaris_10/Packages

    BASEDIR

    The full path name of a directory to use as the root path, which corresponds to the -R option of the pkgadd command. For live upgrade, set this value to the root path that is used with the -R option of the scinstall command. By default, the BASEDIR variable is set to the root (/) file system.

  4. Run the script, once for the Sun Cluster framework packages and once for the data-service packages.

    After the script is run, you should see the following message at the command prompt for each package:


    Transferring pkgname package instance

    Note –

    If the pspool directory already exists for a package or if the script is run twice for the same set of packages, the following error is displayed at the command prompt:


    Transferring pkgname package instance
    pkgadd: ERROR: unable to complete package transfer
        - identical version of pkgname already exists on destination device

    This is a harmless message and can be safely ignored.


  5. After you run the script for both framework packages and data-service packages, boot your nodes into cluster mode.

Can't Add Node to an Existing Sun Cluster 3.2–Patched Cluster Without Adding the Sun Cluster 3.2 Core Patch to the Node (6554107)

Problem Summary: Adding a new cluster node without ensuring that the node has the same patches as the existing cluster nodes might cause the cluster nodes to panic.

Workaround: Before you add a node to the cluster, ensure that the new node is patched to the same level as the existing cluster nodes. Otherwise, the cluster nodes might panic.
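
As a sketch, one way to compare patch levels is to list the patches that are installed on an existing cluster node and on the new node, then reconcile any differences before you run the add-node procedure:


# showrev -p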