Sun Cluster 3.1 10/03 Release Notes

Sun Cluster 3.1 10/03 Release Notes

This document provides the following information for Sun™ Cluster 3.1 10/03 software.


Note –

For information about Sun Cluster 3.1 10/03 data services, refer to the Sun Cluster 3.1 Data Services 10/03 Release Notes.


What's New in Sun Cluster 3.1 10/03 Software

This section provides information related to new features, functionality, and supported products in Sun Cluster 3.1 10/03 software.

New Features and Functionality

Cluster Reconfiguration Notification Protocol

The Cluster Reconfiguration Notification Protocol (CRNP) provides a mechanism for applications to register for, and subsequently receive asynchronous notification of, Sun Cluster reconfiguration events. Both data services running on the cluster and applications running outside the cluster can register for event notification. Notifications include changes in cluster membership, resource group state, and resource state.

Disk-Path Monitoring

Disk-Path Monitoring (DPM) informs system administrators of disk-path failures on both primary and secondary paths. The disk-path failure detection mechanism generates an event through the Cluster Event Framework and allows manual intervention.

Application Traffic Striping Over the Sun Cluster Private Network

This feature stripes IP traffic sent to the per-node logical IP addresses across all private interconnects. TCP traffic is striped on a per-connection basis. UDP traffic is striped on a per-packet basis.

Detecting Vulnerable Configurations

The integration of Sun's eRAS knowledge engine with the sccheck(1M) utility greatly increases the power of sccheck to detect “vulnerable” configurations by leveraging many existing eRAS checks. Vulnerability reports are produced for individual nodes as well as for the cluster as a whole.

Role-Based Access Control

This feature enables the use of Role Based Access Control (RBAC) for cluster administration and operation.

Single Node Cluster

This feature extends Sun Cluster functionality to support single-node clusters.

Agent Builder Integrated with Sun ONE Studio

This feature enables developers to use Sun ONE Studio's development environment to create agents.

Centralized Install

This feature enhances scinstall(1M) to install all nodes of a new cluster from a single point of control. Additionally, it provides compatibility with the Solaris Web Start installation tool.

Localized Versions

Localized Sun Cluster components are now available in five languages and can be installed using the Web Start program. For more information, see Sun Cluster 3.1 10/03 Software Installation Guide.

Language              Localized Sun Cluster Component
--------------------  ------------------------------------------------
French                Installation
                      Cluster Control Panel (CCP)
                      Sun Cluster software
                      Sun Cluster data services
                      Sun Cluster module for Sun Management Center
                      SunPlex Manager

Japanese              Installation
                      Cluster Control Panel (CCP)
                      Sun Cluster software
                      Sun Cluster data services
                      Sun Cluster module for Sun Management Center
                      SunPlex Manager
                      Sun Cluster man pages
                      Cluster Control Panel man pages
                      Sun Cluster data services man pages

Simplified Chinese    Installation
                      Cluster Control Panel (CCP)
                      Sun Cluster software
                      Sun Cluster data services
                      Sun Cluster module for Sun Management Center
                      SunPlex Manager

Traditional Chinese   Installation
                      Cluster Control Panel (CCP)
                      Sun Cluster software
                      Sun Cluster data services
                      Sun Cluster module for Sun Management Center
                      SunPlex Manager

Korean                Installation
                      Cluster Control Panel (CCP)
                      Sun Cluster software
                      Sun Cluster data services
                      Sun Cluster module for Sun Management Center
                      SunPlex Manager

Data Services

For information on data services enhancements, see Sun Cluster 3.1 Data Services 10/03 Release Notes.

Supported Products

This section describes the supported software and memory requirements for Sun Cluster 3.1 10/03 software.

Restrictions

The following restrictions apply to the Sun Cluster 3.1 10/03 release:

For other known problems or restrictions, see Known Issues and Bugs.

Hardware Restrictions

Network Restrictions

Volume-Manager Restrictions

Cluster File System Restrictions

VxFS Restrictions

Internet Protocol (IP) Network Multipathing Restrictions

This section identifies any restrictions on using IP Network Multipathing that apply only in a Sun Cluster 3.1 10/03 environment or that differ from the information provided in the Solaris documentation for IP Network Multipathing.

Most procedures, guidelines, and restrictions that are identified in the Solaris documentation for IP Network Multipathing are the same in a cluster or a noncluster environment. Therefore, see the appropriate Solaris document for additional information about IP Network Multipathing restrictions.

Operating Environment Release      For Instructions, Go To...
---------------------------------  --------------------------------------------
Solaris 8 operating environment    IP Network Multipathing Administration Guide
Solaris 9 operating environment    “IP Network Multipathing Topics” in System Administration Guide: IP Series

Service and Application Restrictions

Data Service Restrictions

For information about restrictions for specific data services, see Sun Cluster 3.1 Data Services 10/03 Release Notes.

Running Sun Cluster HA for Oracle 3.0 on Sun Cluster 3.1 10/03 Software

The Sun Cluster HA for Oracle 3.0 data service can run on Sun Cluster 3.1 10/03 software only when used with the following versions of the Solaris operating environment:

Known Issues and Bugs

The following known issues and bugs affect the operation of the Sun Cluster 3.1 10/03 release. For the most current information, see the online Sun Cluster 3.1 10/03 Release Notes Supplement at http://docs.sun.com.

Incorrect Largefile Status (4419214)

Problem Summary: The /etc/mnttab file does not show the most current largefiles status of a globally mounted VxFS file system.

Workaround: Use the fsadm command, rather than the /etc/mnttab entry, to verify the file system's largefiles status.
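For example, for a VxFS file system mounted at a hypothetical mount point /global/myfs, the following command reports whether the largefiles option is currently set:

# fsadm -F vxfs /global/myfs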

Nodes Unable to Bring Up qfe Paths (4526883)

Problem Summary: Sometimes, private interconnect transport paths ending at a qfe adapter fail to come online.

Workaround: Follow the steps shown below:

  1. Identify the at-fault adapter by using scstat -W; sample output follows this list. The output shows all transport paths with that adapter as one of the path endpoints in the faulted or the waiting state.

  2. Use scsetup to remove from the cluster configuration all the cables connected to that adapter.

  3. Use scsetup again to remove that adapter from the cluster configuration.

  4. Add back the adapter and the cables.

  5. Verify whether the paths appear. If the problem persists, repeat Steps 1 through 5 a few times.

  6. Verify whether the paths appear. If the problem still persists, reboot the node with the at-fault adapter. Before the node is rebooted, make sure that the remaining cluster has enough quorum votes to survive the node reboot.
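The following sample scstat -W output shows the kind of entry to look for in Step 1 (node and adapter names are hypothetical):

% scstat -W

-- Cluster Transport Paths --

                    Endpoint            Endpoint            Status
                    --------            --------            ------
  Transport path:   phys-node1:qfe1     phys-node2:qfe1     faulted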

File Blocks Not Updated Following Writes to Sparse File Holes (4607142)

Problem Summary: A file's block count is not always consistent across cluster nodes following block-allocating write operations within a sparse file. For a cluster file system layered on UFS (or VxFS 3.4), the block-count inconsistency across cluster nodes disappears within about 30 seconds.

Workaround: Perform a file metadata operation that updates the inode, such as touch(1), to synchronize the st_blocks value so that subsequent metadata operations return consistent st_blocks values.
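For example, with a hypothetical file on a cluster file system:

# touch /global/myfs/sparsefile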

During a Network Failure, the Data Service Starts and Stops Incorrectly (4644289)

Problem Summary: The Sun Cluster HA for Oracle data service uses the su command to start and stop the database. The network service might become unavailable when a cluster node's public network fails.

Workaround: In Solaris 9, configure the /etc/nsswitch.conf files as follows so that the data service starts and stops correctly in the event of a network failure:

On each node that can be a primary for the oracle_server or oracle_listener resource, modify /etc/nsswitch.conf to include the following entries for the passwd, group, publickey, and project databases:
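For example, entries of the following form restrict lookups for these databases to local files (illustrative; consistent with the requirement, stated below, that su not consult NIS/NIS+):

passwd:     files
group:      files
publickey:  files
project:    files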

Adding the above entries ensures that the su(1M) command does not refer to the NIS/NIS+ name services.

Unmounting of a Cluster File System Fails (4656624)

Problem Summary: The unmounting of a cluster file system sometimes fails, even though the fuser command shows that there are no users on any node.

Workaround: Retry the unmounting after all asynchronous I/O to the underlying file system has been completed.

Sun Cluster HA for Siebel Fails to Monitor Siebel Components (4722288)

Problem Summary: The Sun Cluster HA for Siebel agent does not monitor individual Siebel components. If the failure of a Siebel component is detected, only a warning message is logged in syslog.

Workaround: Restart the Siebel server resource group in which components are offline by using the command scswitch -R -h node -g resource_group.
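For example, with hypothetical node and resource group names:

# scswitch -R -h phys-node1 -g siebel-rg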

Oracle RAC Instances May Become Unavailable on Newly Added Nodes (4723575)

Problem Summary: Installing Sun Cluster support for Oracle RAC on a newly added node causes Oracle RAC instances to become unavailable.

Workaround: Adding a node to a cluster that is currently running with Oracle RAC support, without losing availability of the Oracle RAC database, requires special installation steps. The following example describes going from a three-node cluster to a four-node cluster, with Oracle RAC running on nodes 1, 2, and 3:

  1. Install the Sun Cluster software on the new node (node 4).

    Note: Do not install the RAC support packages at this time.

  2. Reboot the new node into the cluster.

  3. Once the new node has joined the cluster, shut down the Oracle RAC database on one of the nodes where it is already running (node 1, in this example).

  4. Reboot the node where the database was just shut down (node 1).

  5. Once the node (node 1) is back up, start the Oracle database on that node to resume database service.

  6. If a single node is capable of handling the database workload, shut down the database on the remaining nodes (nodes 2 and 3), and reboot these nodes. If more than one node is required to support the database workload, reboot the nodes one at a time as described in Steps 3 through 5.

  7. Once all nodes have been rebooted, it is safe to install the Oracle RAC support packages on the new node.

The remove Script Fails to Unregister the SUNW.gds Resource Type (4727699)

Problem Summary: The remove script fails to unregister the SUNW.gds resource type and displays the following message:

Resource type has been un-registered already.

Workaround: After using the remove script, manually unregister the SUNW.gds resource type. Alternatively, use the scsetup utility or SunPlex Manager.
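For example, to unregister the resource type from the command line (assuming that no resources of this type remain):

# scrgadm -r -t SUNW.gds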

Using the Solaris shutdown Command May Result in Node Panic (4745648)

Problem Summary: Using the Solaris shutdown command or similar commands (for example, uadmin) to bring down a cluster node may result in node panic and display the following message:

CMM: Shutdown timer expired. Halting.

Workaround: Contact your Sun service representative for support. The panic is necessary to provide a guaranteed safe way for another node in the cluster to take over the services that were being hosted by the shutting-down node.

Path Timeouts When Using ce Adapters on the Private Interconnect (4746175)

Problem Summary: Clusters using ce adapters on the private interconnect may notice path timeouts and subsequent node panics if one or more cluster nodes have more than four processors.

Workaround: Set the ce_taskq_disable parameter in the ce driver by adding set ce:ce_taskq_disable=1 to the /etc/system file on all cluster nodes, and then reboot the cluster nodes. This setting ensures that heartbeats (and other packets) are always delivered in the interrupt context, eliminating the path timeouts and subsequent node panics. Observe quorum considerations while rebooting cluster nodes.
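The /etc/system entry looks like the following (lines beginning with an asterisk are comments):

* Deliver ce packets in interrupt context (workaround for 4746175)
set ce:ce_taskq_disable=1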

scrgadm Prevents IP Addresses of Different Subnets From Residing on One NIC (4751406)

Problem Summary: The scrgadm command prevents the hosting of logical hostnames or shared addresses that belong to a subnet different from the subnet of the IPMP (NAFO) group.

Workaround: Use the following form of the scrgadm command:

# scrgadm -a -j <resource> -t <resource_type> -g <resource_group> \
-x HostnameList=<logical_hostname> -x NetIfList=<nafogroup>@<nodeid>

Note that node names do not appear to work in NetIfList; use node IDs instead.
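For example, a hypothetical logical hostname resource that uses NAFO group nafo0 on node IDs 1 and 2:

# scrgadm -a -j lh-rs -t SUNW.LogicalHostname -g my-rg \
-x HostnameList=lh-host -x NetIfList=nafo0@1,nafo0@2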

Unsuccessful Failover Results in Error (4766781)

Problem Summary: An unsuccessful failover or switchover of a file system might leave the file system in an error state.

Workaround: Unmount and remount the file system.

Node Hangs After Rebooting When Switchover Is in Progress (4806621)

Problem Summary: If a device group switchover is in progress when a node joins the cluster, the joining node and the switchover operation may hang. Any attempt to access a device service will also hang. This is more likely to happen on a cluster with more than two nodes and when the file system mounted on the device is a VxFS file system.

Workaround: To avoid this situation, do not initiate device group switchovers while a node is joining the cluster. If this situation occurs, then all the cluster nodes must be rebooted to restore access to device groups.
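If all nodes must be rebooted, one approach (a suggestion, not part of the documented workaround) is to shut down the entire cluster with scshutdown(1M) and then boot each node:

# scshutdown -g0 -y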

DNS Wizard Fails If an Existing DNS Configuration Is Not Supplied (4839993)

Problem Summary: SunPlex Manager includes a data service installation wizard that sets up a highly available DNS service on the cluster. If the user does not supply an existing DNS configuration, such as a named.conf file, the wizard attempts to generate a valid DNS configuration by autodetecting the existing network and name service configuration. In some network environments this autodetection fails, causing the wizard to fail without issuing an error message.

Workaround: When prompted, supply the SunPlex Manager DNS data service installation wizard with an existing, valid named.conf file. Otherwise, follow the documented DNS data service procedures to manually configure highly available DNS on the cluster.

Using SunPlex Manager to Install an Oracle Service (4843605)

Problem Summary: SunPlex Manager includes a data service installation wizard that sets up a highly available Oracle service on the cluster by installing and configuring the Oracle binaries and creating the cluster configuration. However, this installation wizard is currently not working, and it produces a variety of errors depending on the user's software configuration.

Workaround: Manually install and configure the Oracle data service on the cluster, using the procedures provided in the Sun Cluster documentation.

Shutdown or Reboot Sequence Fails (4844784)

Problem Summary: When shutting down or rebooting a node, the node may hang and the shutdown or reboot sequence may not complete. The system hangs after issuing the following message: Failfast: Halting because all userland daemons have died.

Workaround: Before shutting down or rebooting the node, issue the psradm -f -a command as part of the following sequences.

To shut down a node:

  1. # scswitch -S -h <node>

  2. # psradm -f -a

  3. # shutdown -g0 -y -i0

To reboot a node:

  1. # scswitch -S -h <node>

  2. # psradm -f -a

  3. # shutdown -g0 -y -i6


Note –

In some rare instances, the suggested workarounds may fail to resolve this problem.


Rebooting a Node (4862321)

Problem Summary: On large systems running Sun Cluster 3.x, shutdown -g0 -y -i6, the command to reboot a node, can cause the system to drop to the ok prompt with the message Failfast: Halting because all userland daemons have died, instead of rebooting.

Workaround: Use one of the following workarounds:

Remember to re-enable failfasts after the node has rebooted:

# /usr/cluster/lib/sc/cmm_ctl -f

Alternatively, increase the failfast_panic_delay timeout before shutting down the system by using the following mdb command:

# (echo 'cl_comm`conf+8/W 0t600000' ; \
echo 'cl_comm`conf+c/W 0t600000') | mdb -kw

This command sets the timeout to 600000 milliseconds (10 minutes).

Oracle DLM Process Remains Alive During Node Shutdown (4891227)

Problem Summary: The Oracle DLM process does not terminate during shutdown and prevents /var from being unmounted.

Workaround: Use one of the following two workarounds:

Oracle Listener Probe May Time Out on a Heavily Loaded System (4900140)

Problem Summary: The Oracle listener probe may time out on a heavily loaded system, causing the Oracle listener to restart.

Workaround: On a heavily loaded system, you can prevent Oracle listener resource probe timeouts by increasing the value of the Thorough_probe_interval property of the resource.

The probe timeout is calculated from Thorough_probe_interval as follows:

10 seconds, if Thorough_probe_interval is no greater than 20 seconds

60 seconds, if Thorough_probe_interval is greater than 120 seconds

Thorough_probe_interval/2, in all other cases
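The following Bourne shell sketch is illustrative only (not a product script) and mirrors the rule above; it takes a Thorough_probe_interval value in seconds as its argument:

#!/bin/sh
# Compute the Oracle listener probe timeout from the
# Thorough_probe_interval value (seconds) passed as $1.
interval=$1
if [ "$interval" -le 20 ]; then
    timeout=10                      # lower bound: 10 seconds
elif [ "$interval" -gt 120 ]; then
    timeout=60                      # upper bound: 60 seconds
else
    timeout=`expr $interval / 2`    # half the probe interval
fi
echo "probe timeout: $timeout seconds"

For example, an interval of 90 seconds yields a 45-second probe timeout.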

RG_system Resource Group Property Update May Result in Node Panic (4902066)

Problem Summary: When set to TRUE, the RG_system resource group property indicates that the resource group and its resources are being used to support the cluster infrastructure, instead of implementing a user data service. If RG_system is TRUE, the RGM prevents the system administrator from inadvertently switching the group or its resources offline, or modifying their properties. In some instances, the node may panic when you try to modify a resource group property after setting the RG_system property to TRUE.

Workaround: Do not edit the value of the RG_system resource group property.

nsswitch.conf Requirements for passwd Make nis Unusable (4904975)

Problem Summary: On each node that can master the liveCache resource, the su command might hang when the public network is down.

Workaround: On each node that can master the liveCache resource, the following change to /etc/nsswitch.conf is recommended so that the su command does not hang when the public network is down:

passwd: files nis [TRYAGAIN=0]

Data Service Installation Wizards for Oracle and Apache Do Not Support Solaris 9 and Above (4906470)

Problem Summary: The SunPlex Manager data service installation wizards for Apache and Oracle do not support Solaris 9 and above.

Workaround: Manually install and configure Oracle on the cluster by using the Sun Cluster documentation. If installing Apache on Solaris 9 or higher, manually add the Solaris Apache packages SUNWapchr and SUNWapchu before running the installation wizard.

Installation Fails If the Default Domain Is Not Set (4913925)

Problem Summary: When adding nodes to a cluster during installation and configuration, you may see an “RPC authentication” failure. This failure occurs when the nodes are not configured to use NIS/NIS+, particularly if the /etc/defaultdomain file is not present.

Workaround: When a domain name is not set (that is, the /etc/defaultdomain file is missing), set the domain name on all nodes joining the cluster, using the domainname(1M) command before proceeding with the installation. For example, # domainname xxx.
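For example (the domain name is a placeholder; writing the name to /etc/defaultdomain is standard Solaris practice to make the setting persist across reboots, rather than part of the documented workaround):

# domainname mydomain.example.com
# domainname > /etc/defaultdomain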

Patches and Required Firmware Levels

This section provides information about patches for Sun Cluster configurations.


Note –

You must be a registered SunSolve™ user to view and download the required patches for the Sun Cluster product. If you do not have a SunSolve account, contact your Sun service representative or sales engineer, or register online at http://sunsolve.sun.com.


PatchPro

PatchPro is a patch-management tool designed to ease the selection and download of patches required for installation or maintenance of Sun Cluster software. PatchPro provides a Sun Cluster-specific Interactive Mode tool to make the installation of patches easier and an Expert Mode tool to maintain your configuration with the latest set of patches. Expert Mode is especially useful for those who want to get all of the latest patches, not just the high availability and security patches.

To access the PatchPro tool for Sun Cluster software, go to http://www.sun.com/PatchPro/, click on “Sun Cluster,” then choose either Interactive Mode or Expert Mode. Follow the instructions in the PatchPro tool to describe your cluster configuration and download the patches.

SunSolve Online

The SunSolve™ Online Web site provides 24-hour access to the most up-to-date information regarding patches, software, and firmware for Sun products. Access the SunSolve Online site at http://sunsolve.sun.com for the most current matrixes of supported software, firmware, and patch revisions.

You can find Sun Cluster 3.1 10/03 patch information by using the Info Docs. To view the Info Docs, log on to SunSolve and access the Simple search selection from the top of the main page. From the Simple Search page, click on the Info Docs box and type Sun Cluster 3.1 in the search criteria box. This will bring up the Info Doc page for Sun Cluster 3.1 software.

Before you install Sun Cluster 3.1 10/03 software and apply patches to a cluster component (Solaris operating environment, Sun Cluster software, volume manager or data services software, or disk hardware), review the Info Docs and any README files that accompany the patches. All cluster nodes must have the same patch level for proper cluster operation.

For specific patch procedures and tips on administering patches, see “Patching Sun Cluster Software and Firmware” in Sun Cluster 3.1 10/03 System Administration Guide.

Sun Cluster 3.1 10/03 Documentation

The Sun Cluster 3.1 10/03 user documentation set is available in PDF and HTML format on the Sun Cluster 3.1 10/03 CD-ROM.


Note –

The Sun Cluster 3.1 Data Services 10/03 user documentation set is available on the Sun Cluster 3.1 Agents 10/03 CD-ROM.


AnswerBook2™ server software is not needed to read Sun Cluster 3.1 10/03 documentation. See the index.html file at the top level of either CD-ROM for more information. This index.html file enables you to read the PDF and HTML manuals directly from the CD-ROM and to access instructions to install the documentation packages.


Note –

The SUNWsdocs package must be installed before you install any Sun Cluster documentation packages. You can use pkgadd to install the SUNWsdocs package. The SUNWsdocs package is located in the SunCluster_3.1/Sol_N/Packages/ directory of the Sun Cluster 3.1 10/03 CD-ROM, where N is either 8 for Solaris 8 or 9 for Solaris 9. The SUNWsdocs package is also automatically installed when you run the installer program from the Solaris 9 Documentation CD-ROM.
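For example, with the CD-ROM mounted at the usual vold mount point (the path is illustrative) on a Solaris 9 system:

# cd /cdrom/cdrom0/SunCluster_3.1/Sol_9/Packages
# pkgadd -d . SUNWsdocs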


The Sun Cluster 3.1 10/03 user documentation set consists of the following collections:

For a listing of the manuals contained in Sun Cluster 3.1 Data Services 10/03 Collection, see the Sun Cluster 3.1 Data Services 10/03 Release Notes.

In addition, the docs.sun.com℠ Web site enables you to access Sun Cluster documentation on the Web. You can browse the docs.sun.com archive or search for a specific book title or subject at the following Web site:

http://docs.sun.com

Documentation Issues

This section discusses known errors or omissions for documentation, online help, or man pages and steps to correct these problems.


Caution –

Sun Cluster 3.1 10/03 does not support RSM Transport. All references to RSM transport appearing in the Sun Cluster documentation collection must be disregarded.


Software Installation Guide

This section discusses known errors or omissions from the Sun Cluster 3.1 10/03 Software Installation Guide.

SunPlex Manager Online Help

This section discusses errors and omissions in SunPlex Manager online help.

Sun Cluster HA for Oracle

In the online help file that is titled “Sun Cluster HA for Oracle,” in the section titled “Before Starting,” a note is incorrect.

Incorrect:

If no entries exist for shmsys and semsys in /etc/system, default values for these variables are automatically inserted in /etc/system. The system must then be rebooted. Check Oracle installation documentation to verify that these values are correct for your database.

Correct:

If no entries exist for the shmsys and semsys variables in the /etc/system file when you install the Oracle data service, you can open /etc/system and insert default values for these variables. You must then reboot the system. Check Oracle installation documentation to verify that the values that you insert are correct for your database.

Role-Based Access Control (RBAC) (4895087)

In the table under "Sun Cluster RBAC Rights Profiles," the authorizations solaris.cluster.appinstall and solaris.cluster.install should be listed under the Cluster Management profile rather than the Cluster Operation profile.

In the table under “Sun Cluster RBAC Rights Profiles,” under the profile Sun Cluster Commands, sccheck(1M) should also be included in the list of commands.

System Administration Guide

This section discusses errors and omissions from the Sun Cluster 3.1 10/03 System Administration Guide.

Simple Root Disk Groups With VERITAS Volume Manager

Simple root disk groups are not supported as disk types with VERITAS Volume Manager on Sun Cluster software. As a result, if you perform the procedure “How to Restore a Non-Encapsulated root (/) File System (VERITAS Volume Manager)” in the Sun Cluster 3.1 System Administration Guide, you should ignore Step 9, which asks you to determine if the root disk group (rootdg) is on a single slice on the root disk. You would complete Step 1 through Step 8, skip Step 9, and proceed with Step 10 to the end of the procedure.

Changing the Number of Node Attachments to a Quorum Device

When increasing or decreasing the number of node attachments to a quorum device, the quorum vote count is not automatically recalculated. You can re-establish the correct quorum vote if you remove all quorum devices and then add them back into the configuration.
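For example, for a hypothetical quorum device d20, using scconf(1M):

# scconf -r -q globaldev=d20
# scconf -a -q globaldev=d20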

Error Messages Guide

Certain error messages related to Sun Cluster data services are missing from the Error Messages Guide. For a list of error messages that were not included in the documentation collection, see the Sun Cluster 3.1 Data Services 10/03 Release Notes.

Data Services Collection

Errors and omissions related to the Data Services documentation are described in the Sun Cluster 3.1 Data Services 10/03 Release Notes.

Man Pages

This section discusses errors and omissions from the Sun Cluster man pages.

scconf_transp_adap_sci(1M)

The scconf_transp_adap_sci(1M) man page states that SCI transport adapters can be used with the rsm transport type. This support statement is incorrect. SCI transport adapters do not support the rsm transport type. SCI transport adapters support the dlpi transport type only.

scconf_transp_adap_sci(1M)

The following sentence clarifies the name of an SCI–PCI adapter. This information is not currently included in the scconf_transp_adap_sci(1M) man page.

New Information:

Use the name sciN to specify an SCI adapter.
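For example, the following command (with a hypothetical node name) adds the first SCI adapter using the dlpi transport type:

# scconf -a -A trtype=dlpi,name=sci0,node=phys-node1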

scgdevs(1M)

The following paragraph clarifies behavior of the scgdevs command. This information is not currently included in the scgdevs(1M) man page.

New Information:

scgdevs(1M), when called from the local node, performs its work on remote nodes asynchronously. Therefore, command completion on the local node does not necessarily mean that the command has completed its work cluster-wide.

rt_properties(5)

In this release, the current API_version has been incremented to 3 from its previous value of 2. If you are developing a new Sun Cluster agent and wish to prevent your new resource type from being registered on an earlier version of Sun Cluster software, declare API_version=3 in your agent's RTR file. For more information, see rt_reg(4) and rt_properties(5).
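A hypothetical excerpt from an agent's RTR file (the resource type name and versions are illustrative):

# Registration information for a hypothetical resource type
RESOURCE_TYPE = "myagent";
VENDOR_ID = SUNW;
RT_VERSION = "1.0";
API_VERSION = 3;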

Sun Cluster 3.0 Data Service Man Pages

To display Sun Cluster 3.0 data service man pages, install the latest patches for the Sun Cluster 3.0 data services that you installed on Sun Cluster 3.1 10/03 software. See Patches and Required Firmware Levels for more information.

After you have applied the patch, access the Sun Cluster 3.0 data service man pages by issuing the man -M command with the full man page path as the argument. The following example opens the Apache man page.


% man -M /opt/SUNWscapc/man SUNW.apache

Consider modifying your MANPATH to enable access to Sun Cluster 3.0 data service man pages without specifying the full path. The following example describes command input for adding the Apache man page path to your MANPATH and displaying the Apache man page.


% MANPATH=/opt/SUNWscapc/man:$MANPATH; export MANPATH
% man SUNW.apache