The following known problems affect the operation of Sun Cluster 2.2. These are in addition to the known problems described in the Sun Cluster 2.2 Release Notes.
4218052 - Sun Cluster should support modification of TCP ports used by CVM cluster daemons. TCP ports used by CVM cluster daemons might conflict with ports used by other applications running in the cluster. You cannot modify which TCP ports are used by CVM cluster daemons. Instead, you must modify any applications that use conflicting ports.
CVM uses the following port numbers:
cvm.port.vxkmsgd        5559
cvm.port.vxconfigd      5560
cvm.port.vxclust        5568
vxclust                 5568-5600
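To check whether another application on a cluster node is already bound to one of these ports, you can search the netstat(1M) output for the port number in question; port 5559 is used here only as an example:

# netstat -an | grep 5559

If the port is shown in use by an application other than the CVM daemon itself, reconfigure that application to use a different port.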
4233113 - Documentation omission regarding logical host timeout values and how they are used. When you configure the cluster, you set a timeout value for the logical host. This timeout value is used by the CCD when you bring a data service up or down using the hareg(1M) command. The CCD operation occurs in two steps; half of the timeout value is used for each step. Therefore, when configuring START and STOP methods for data services, make sure each method uses no more than half of the timeout value set for the logical host.
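For example, the following Bourne shell fragment shows the arithmetic for a hypothetical logical host timeout of 300 seconds:

# The CCD operation uses half of the logical host timeout for each of its two steps,
# so each START or STOP method must fit within half of the configured value.
LOGHOST_TIMEOUT=300
METHOD_BUDGET=`expr $LOGHOST_TIMEOUT / 2`
echo "Each START/STOP method must complete within $METHOD_BUDGET seconds"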
4291427 - Locales only: uninstall fails to remove the SUNWccon and SUNWscch packages. In all locale versions of Sun Cluster 2.2 running on Solaris 7, removal of the client packages using the scinstall(1M) command can fail with the following error message:
Patch 108400-02 is required to be installed by patch 108446-02 it cannot be backed out until patch 108446-02 is backed out.
4213692 - If the data service or cluster is configured incorrectly, problems with the startup of a data service might cause the cluster framework to switch the data service to the backup node. If the data service fails to start on the backup node, it is switched back to the original node. This switching behavior continues until stopped by manual intervention.
4304532 - Adding a data service to an existing two-node cluster with shared CCD fails with registration errors. After adding a new data service to a two-node cluster with shared CCD, registration of the data service will fail because the shared CCD was not updated correctly. To correct this situation, stop the cluster, uninstall the new data service packages using the scinstall(1M) command, restart the cluster on both nodes, and then use the procedure "Adding a Data Service to a Two-Node Cluster With Shared CCD" to add the data services correctly.
4247239 - Cannot add data service if shared CCD is used and both nodes are not in cluster. In a two-node cluster with shared CCD, adding a data service fails with error messages indicating a corrupted ccd.database file. To correct this situation, stop the cluster, uninstall the new data service packages using the scinstall(1M) command, restart the cluster on both nodes, and then use the procedure "Adding a Data Service to a Two-Node Cluster With Shared CCD" to add the data services correctly.
4221612 - SCM sometimes incorrectly reports that the Sun Cluster HA for Netscape HTTP data service is down when it is up.
4215070 - The scinstall(1M) command does not upgrade the Sun Cluster HA for SAP package, SUNWscsap, during upgrade to Sun Cluster 2.2 from Sun Cluster 2.1. Work around the problem by replacing the SUNWscsap package manually during the upgrade, as described in the procedure "How to Upgrade to Sun Cluster 2.2 From Sun Cluster 2.0 or 2.1".
4218558 - The Sun Cluster HA for SAP data service is not registered correctly during upgrade to Sun Cluster 2.2 from HA 1.3. This prevents the data service from starting up correctly after the upgrade has been completed. Work around the problem by explicitly unregistering and then registering the data service by using the hareg(1M) command:
# hareg -n sap
# hareg -u sap
# hareg -s -r sap -h CI_logicalhost
# hareg -y sap
For the complete upgrade procedure, see "How to Upgrade to Sun Cluster 2.2 From HA 1.3".
4218574 - Upgrade to Sun Cluster 2.2 from HA 1.3 fails if patch 104996 (required for Solstice HA-DBMS for Oracle7) is installed on the pre-upgrade system. This occurs because patch 104996 depends upon patch 105008, which the scinstall(1M) command attempts to remove during the upgrade. Work around the problem by removing patch 104996 manually before using scinstall(1M) to upgrade from HA 1.3. See "How to Upgrade to Sun Cluster 2.2 From HA 1.3" for the complete upgrade procedure.
4218613 - During upgrade to Sun Cluster 2.2 from HA 1.3, instance configuration information for the HA-DBMS data services is not propagated to the new cluster. This prevents the database instances from starting when the new cluster is started. This bug affects the Sun Cluster HA for Oracle, Sun Cluster HA for Sybase, and Sun Cluster HA for Informix data services.
Work around the problem by manually recreating the database instance after completing the upgrade. Use the appropriate hadbms insert command (haoracle insert, hasybase insert, or hainformix insert) as described in the associated man pages, and in the appropriate data service chapters in the Sun Cluster 2.2 Software Installation Guide. For the complete upgrade procedures, see "How to Upgrade to Sun Cluster 2.2 From HA 1.3".
After you recreate the database instances, start the instances by using the appropriate hadbms start command.
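For example, for a hypothetical Sun Cluster HA for Oracle instance named ORA1 on logical host hahost1, the sequence is similar to the following; the remaining haoracle insert parameters are described in haoracle(1M) and in the data service chapter:

# haoracle insert ORA1 hahost1 [remaining parameters; see haoracle(1M)]
# haoracle start ORA1

The hasybase and hainformix commands follow the same insert and start pattern for the Sun Cluster HA for Sybase and Sun Cluster HA for Informix data services.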
4218620 - During upgrade to Sun Cluster 2.2, existing instance configuration data for Sun Cluster HA for SAP is not propagated to 2.2. Therefore, the SAP instance fails to start when the cluster is started. Work around the problem by manually re-creating the Sun Cluster HA for SAP instance after completing the upgrade, by using the hadsconfig(1M) command to specify all instance parameters. See the revised upgrade procedures in "Upgrading to Sun Cluster 2.2". See also Section 10.6.1, "Configuration Parameters for Sun Cluster HA for SAP" in the Sun Cluster 2.2 Software Installation Guide, for a full description of the Sun Cluster HA for SAP parameters. New parameters exist that significantly impact the behavior of the data service.
4218823 - During the upgrade from HA 1.3 to Sun Cluster 2.2, only two of the three required IP addresses are added to the /.rhosts file on each node. The missing address is the highly available IP address for the private interconnects. Utilities such as hadsconfig(1M) do not work without this entry. You must manually add the required entries to the /.rhosts file. The procedure is documented on page 3-26 of the Sun Cluster 2.2 Software Installation Guide, and in the revised upgrade procedures in "Upgrading to Sun Cluster 2.2".
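As a minimal sketch, run a command such as the following on each node, substituting the highly available private-interconnect address used by your cluster (the address shown here is purely illustrative):

# echo "204.152.65.33" >> /.rhosts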
4219689 - Adding a data service immediately after upgrading to Sun Cluster 2.2 removes a required entry from the cdb file. Restore the correct entry to the cdb file by selecting "Remove Volume Manager" from the scinstall(1M) Change menu. Then select "Choose Volume Manager" from the same menu, and select the volume manager that you are using.
4220504 - Page 4-3 in the Sun Cluster 2.2 System Administration Guide includes instructions to run the scadmin startnode command simultaneously on all nodes. Instead, the scadmin startnode command should be run on only one node at a time.
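For example, on a hypothetical two-node cluster with nodes phys-hahost1 and phys-hahost2, run the command on the first node, wait for that node to finish joining the cluster, and only then run it on the second node:

phys-hahost1# scadmin startnode
phys-hahost2# scadmin startnode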
4222817 - Page 8-20 in the Sun Cluster 2.2 Software Installation Guide includes instructions to install Sun Cluster HA for Netscape LDAP by adding the SUNWhadns package. The correct package name is SUNWscnsl.
4224989 - Page 1-25 in the Sun Cluster 2.2 Software Installation Guide includes the statement:
"When Solstice DiskSuite is specified as the volume manager, you cannot configure direct-attach devices, that is, devices that directly attach to more than 2 nodes. Disks can only be connected to pairs of nodes."
This statement is incorrect. Direct-attach devices are supported with Solstice DiskSuite and Sun Cluster 2.2.
4258156 - Page 1-10 in the Sun Cluster 2.2 Software Installation Guide includes the statement that in parallel database configurations, any server failure is recognized by the cluster software, and subsequent user queries are re-routed through one of the remaining servers. This statement is untrue. In the case of a server failure, a cluster reconfiguration occurs automatically and the user queries are dropped. The user must initiate a new query through an active server, or through the original server after it has been restored to service.
You can configure Oracle Parallel Server so that restarting the application reconnects the clients to an active server. To do so, modify the tnsnames.ora file on all clients, using the procedure described in Section 14.1.4.2, "Configuring Oracle SQL*Net," in the Sun Cluster 2.2 Software Installation Guide.
Impact of quorum device failure - page 1-18 in the Sun Cluster 2.2 Software Installation Guide includes this note:
"The failure of a quorum device is similar to the failure of a node in a two-node cluster."
This note is misleading. Although the failure of a quorum device does not cause a failover of services, it does reduce the high availability of a two-node cluster: no further node failures can be tolerated. A failed quorum device can be reconfigured or replaced while the cluster is running, and the cluster can remain running as long as no other component fails while the quorum repair or replacement is in progress.
Using scconf(1M) to remove a cluster node - In Chapter 3 of the Sun Cluster 2.2 System Administration Guide, the procedure "How to Remove a Cluster Node" includes a step to use scconf clustername -A n to remove a cluster node. Note that in this command, the number n does not represent a node number; it represents the total number of cluster nodes that will be active after the scconf operation. The scconf operation always removes the node with the highest node number from the cluster. For example, in a four-node cluster, the following command removes nodes 3 and 4, resulting in a two-node cluster:
# scconf sc-cluster -A 2
The following error messages for Sun Cluster HA for SAP were omitted from the Sun Cluster 2.2 Error Messages Manual.
SUNWcluster.ha.sap.stop_net.2076: proha:SUNWscsap_PRO: Found 2 leftover IPC objects for SAP instance, removing via cleanipc
This message indicates that during shutdown of the SAP central instance by the stop_net method, two IPC segments from the central instance were found. The stop_net code uses the SAP-supplied utility cleanipc to remove all IPC segments of the central instance during shutdown (and also before startup). This is to ensure a thorough shutdown as well as a clean startup. The error message is an informational message only, and is expected. No user action is required.
Graceful shutdown failed for oracle instance PRO, starting abort
This message indicates that the HA-Oracle oracle_db_shutdown script did not complete a graceful shutdown of the database within the timeout limit (30 seconds, by default). If the normal shutdown does not complete during the allowed time, then a shutdown abort is issued. This is an informational message and no user action is required.
SUNWcluster.ccd.ccdctl.4403: (error) checkpoint, ccdd, ticlts: RPC: Program not registered
This message indicates that the ccdadm command could not contact the ccdd daemon for the requested operation; the RPC call clnt_create() failed. Verify that the cluster has been started on the current node and that the ccdd daemon is running.
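For example, you can verify this with checks such as the following:

# ps -ef | grep ccdd
# hastat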
SUNWcluster.clustd.transition.4010: cluster aborted on this node nodename
This message indicates that the current node is being aborted. Other error messages should indicate why this is occurring; check the scadmin.log file in /var/opt/SUNWcluster.
reconf.pnm.3009: pnminit faced problems
This message is generated by the script /opt/SUNWcluster/bin/pnm. This script is called during step 1 of cluster reconfiguration, when PNM is initialized with pnminit. The error message appears if pnminit exits with a non-zero status. Reasons for a non-zero exit of pnminit include:
Invalid command line arguments
Environment variables not set (localnodeid, clustname, currnodes, numnodes)
pnminit could not communicate with the ccdd daemon correctly
Check for any error messages logged to /var/opt/SUNWcluster/ccd/ccd.log, then restart the cluster reconfiguration.
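For example, to review the most recent CCD messages before restarting the reconfiguration:

# tail -20 /var/opt/SUNWcluster/ccd/ccd.log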
SUNWcluster.reconfig.4018: Aborting--received abort request from nodename
This message indicates a request from a remote node to abort the current node. Use a checksum to verify that the /etc/opt/SUNWcluster/conf/clustername.cdb files are identical on all nodes. If necessary, manually copy the most recent clustername.cdb file to all nodes, and then restart the cluster.
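A minimal sketch, using the hypothetical cluster name sc-cluster and node name phys-hahost2; run the sum(1) command on every node and compare the results:

# sum /etc/opt/SUNWcluster/conf/sc-cluster.cdb
# rcp /etc/opt/SUNWcluster/conf/sc-cluster.cdb phys-hahost2:/etc/opt/SUNWcluster/conf/sc-cluster.cdb

If the checksums differ, copy the most recent file to the remaining nodes, as shown, and then restart the cluster.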