Sun Cluster 3.0 Release Notes

Chapter 1 Sun Cluster 3.0 Release Notes

This document provides the following information for the Sun™ Cluster 3.0 software release:

The appendices to this document include installation planning worksheets and examples to use when planning the Sun Cluster 3.0 software and data services installation. The Sun Cluster 3.0 AnswerBooks™ also include these worksheets.

New Features

This release includes the following new features:

Supported Products

This section describes the supported software and memory requirements for Sun Cluster 3.0.

Installing Sun Cluster AnswerBooks

The Sun Cluster 3.0 user documentation is available in AnswerBook2 format for use with AnswerBook2 documentation servers. The Sun Cluster 3.0 AnswerBook2 documentation set consists of:

Setting Up the AnswerBook2 Documentation Server

The Solaris operating environment release includes AnswerBook2 documentation server software. The Solaris documentation CD-ROM, which is separate from the Solaris operating environment CD-ROM, includes the documentation server software. You need the Solaris documentation CD-ROM to install an AnswerBook2 documentation server.

If you have an AnswerBook2 documentation server installed at your site, you can use the same server for the Sun Cluster 3.0 AnswerBooks. If you do not have an AnswerBook2 documentation server installed, install a documentation server on a machine at your site. The administrative console that you use as the administrative interface to your cluster is a good choice for the documentation server. Do not use a cluster node as your AnswerBook2 documentation server.

For complete information on installing an AnswerBook2 documentation server, load the Solaris documentation CD-ROM on a server, and view the README files.

Viewing Sun Cluster AnswerBooks

Use the following procedure to view Sun Cluster 3.0 AnswerBooks from your AnswerBook2 documentation server. Install the Sun Cluster AnswerBook2 documents on a file system on the same server on which you install the documentation server. The Sun Cluster 3.0 AnswerBooks include a post-install script that automatically adds the documents to your existing AnswerBook library.

To use this procedure, you need the following:

How to Install the Sun Cluster AnswerBooks

Use this procedure to install the Sun Cluster AnswerBook packages for the Sun Cluster 3.0 Collection and Sun Cluster 3.0 Data Services Collection.

  1. Become superuser on the server that has the AnswerBook2 documentation server installed.

  2. If you have previously installed the Sun Cluster AnswerBooks, remove the old packages.

    If you have never installed Sun Cluster AnswerBooks, ignore this step.


    # pkgrm SUNWscfab SUNWscdab
    
  3. Insert the Sun Cluster CD-ROM or Sun Cluster Data Services CD-ROM into a CD-ROM drive attached to your documentation server.

    The Volume Management daemon, vold(1M), should mount the CD-ROM automatically.

  4. Change directory to the location on the CD-ROM that contains the Sun Cluster AnswerBook package to install.

    The following directory contains the package for the Sun Cluster CD-ROM: suncluster_3_0/SunCluster_3.0/Packages.

    The following directory contains the package for the Sun Cluster Data Services CD-ROM: scdataservices_3_0/components/SunCluster_Data_Service_Answer_Book_3.0/Packages.

  5. Use the pkgadd(1) command to install the package.


    # pkgadd -d .
    
  6. Select the packages to install.

    Select the Sun Cluster 3.0 Collection (SUNWscfab) and the Sun Cluster 3.0 Data Services Collection (SUNWscdab).

  7. From the pkgadd installation options menu, choose heavy to add the complete package to the system and to update the AnswerBook2 catalog.

    Select either the Sun Cluster 3.0 Collection (SUNWscfab) or the Sun Cluster 3.0 Data Services Collection (SUNWscdab).

The document collection package included on each CD-ROM includes a post-install script that adds the collection to the documentation server's database and restarts the server. You should now be able to view the Sun Cluster AnswerBooks using your documentation server.

Viewing PDF Files

The Sun Cluster CD-ROMs now include a PDF file for each book in the Sun Cluster documentation set.

The following directory on the Sun Cluster CD-ROM contains the PDF files: ./suncluster_3_0/SunCluster_3.0/Docs/locale/C/PDF.

The following directory on the Data Services CD-ROM contains the PDF file: ./scdataservices_3_0/components/SunCluster_Data_Service_Answer_Book_3.0/Docs/locale/C/PDF.

As with the Sun Cluster AnswerBooks, six PDF files are delivered on the Sun Cluster CD-ROM and one PDF file is delivered on the Data Services CD-ROM. Each PDF file is named with an abbreviation for the book that the file contains.

Table 1-2, "Mapping of PDF Abbreviations to Book Titles," shows the mapping of PDF file name abbreviations to the book titles.

Table 1-2 Mapping of PDF Abbreviations to Book Titles

CD-ROM          PDF Abbreviation    Book Title

Sun Cluster     CLUSTINSTALL        Sun Cluster 3.0 Installation Guide
                CLUSTNETHW          Sun Cluster 3.0 Hardware Guide
                CLUSTAPIPG          Sun Cluster 3.0 Data Services Developers' Guide
                CLUSTSYSADMIN       Sun Cluster 3.0 System Administration Guide
                CLUSTCONCEPTS       Sun Cluster 3.0 Concepts
                CLUSTERRMSG         Sun Cluster 3.0 Error Messages Manual

Data Services   CLUSTDATASVC        Sun Cluster 3.0 Data Services Installation and Configuration Guide

Sun Cluster 3.0 Restrictions

The following restrictions apply to the Sun Cluster 3.0 release:

Supported Solaris Release and Patch Information

Access the SunSolve web pages at http://sunsolve.ebay.sun.com for the list of supported versions of the Solaris operating environment and required patches for Sun Cluster 3.0. Locate the Sun Cluster pages by doing a simple search specifying the EarlyNotifier collection and the search criteria "Sun Cluster 3.0."

Review the EarlyNotifier information before installing Sun Cluster 3.0 and before you apply any patch to a cluster component (Solaris operating environment, Sun Cluster, volume manager, or disk firmware). All cluster member nodes must be at the same patch level for proper cluster operation.
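
For example, you might list the installed patches on each node with the showrev(1M) command and compare the output across nodes; the patch-id string in the second command is only a placeholder for a specific patch number.

# showrev -p
# showrev -p | grep patch-id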

Refer to the Sun Cluster 3.0 System Administration Guide for specific patch procedures and tips on administering patches.

System Administration and Procedure Updates

This section describes changes and updates to procedures used to administer a cluster.

syncdir Option Changes

In the Beta releases, you were required to specify the syncdir option when adding a cluster file system in /etc/vfstab. The GA release does not require this specification. Refer to the Sun Cluster 3.0 Installation Guide or the Sun Cluster 3.0 Concepts document for more information about this change.
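
For reference, a cluster file system entry in /etc/vfstab now looks similar to the following line, with no syncdir option required; the metadevice paths and mount point shown are examples only.

/dev/md/oracle/dsk/d1 /dev/md/oracle/rdsk/d1 /global/oracle ufs 2 yes global,logging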

Private Hostnames

Do not use the scsetup utility to change private hostnames after you have configured and started data services. Even though the scsetup utility permits you to change private hostnames, do not attempt to do so without contacting your Sun service representative.

Known Problems

The following known problems affect the operation of the Sun Cluster 3.0 GA release. The most current information on known problems can be accessed through the online Release Notes at http://suncluster.eng.sun.com.

Bug ID 4314698

Problem Summary: After installing Solstice Disksuite software, the scgdevs(1M) command must be run for the Solstice Disksuite device links to appear in the global namespace.

Workaround: Run the scgdevs command manually to make sure that Solstice Disksuite device nodes are created.
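
For example, run the command as superuser on each node where the Solstice Disksuite software was installed.

# /usr/cluster/bin/scgdevs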

Bug ID 4346123

Problem Summary: When booting a cluster node after multiple failures, a cluster file system might fail to mount automatically from its /etc/vfstab entry, and the boot process will place the node in an administrative shell. Running the fsck command on the device might yield the following error.


# fsck -y /dev/global/rdsk/d1s7
** /dev/global/rdsk/d1s7
Can't roll the log for /dev/global/rdsk/d1s7

Workaround: This problem might occur when the global device is associated with a stale cluster file system mount. To confirm a stale mount, run the following command and check whether the file system shows up in an error state.


# /usr/bin/df -k

If the global device is associated with a stale cluster file system mount, unmount the global device. Note that if any users of the file system exist on any of the cluster nodes, the unmount cannot succeed. Run the following command on each node to identify current users of the file system.


# /usr/sbin/fuser -c mountpoint

Also, run the share(1M) command to confirm that the file system is not NFS shared on any of the cluster nodes.
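
For example, assuming the stale file system is mounted on the hypothetical mount point /global/fs1, a cleanup sequence might look like the following; run the fuser check on each node before unmounting.

# /usr/sbin/fuser -c /global/fs1
# umount /global/fs1
# share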

Bug ID 4358349

Problem Summary: Do not create Sun Cluster HA for NFS resources in a resource group that contains a SharedAddress resource. Sun Cluster software does not support the use of SharedAddress resources with that data service.

Workaround: Add the desired logical hostname resources into the failover resource group.

This step sets up a LogicalHostname resource. The hostname used with Sun Cluster HA for NFS cannot be a SharedAddress resource.


# scrgadm -a -L -g resource-group-name -l hostname,...
-a -L -g resource-group-name

Specifies the failover resource group into which to place the logical hostname resources.

-l hostname,...

Specifies the network resources (logical hostnames) to be added.
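
For example, to add a logical hostname resource for the hypothetical hostname schost-nfs-1 to a failover resource group named nfs-rg, you might run the following command.

# scrgadm -a -L -g nfs-rg -l schost-nfs-1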

Bug ID 4358629

Problem Summary: Upgrades from Sun Cluster 2.2 to Sun Cluster 3.0 software might fail if the logical hosts created for the Sun Cluster 2.2 software use a number for the IP address rather than a hostname.

Workaround: The two ways to solve this problem are:

Bug ID 4359321

Problem Summary: The scinstall utility enables you to specify the /global directory for the global devices file system. However, because the mount point for the global devices file system is /global/.devices/node@nodeid, the utility should not permit this specification.

Workaround: Re-install the node using the correct name for the global devices file system.

Although not preferred, fixing the entries in the /etc/vfstab files, rebooting the cluster, and then running the scgdevs command is a possible workaround. Check that each /global/.devices/node@nodeid entry in each /etc/vfstab file has the global mount option set.
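
For reference, a correct /etc/vfstab entry for the global devices file system looks similar to the following line; the DID device names and node ID are examples only.

/dev/did/dsk/d2s7 /dev/did/rdsk/d2s7 /global/.devices/node@2 ufs 2 no global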

Bug ID 4362435

Problem Summary: When the Sun Cluster 3.0 module is loaded into the Sun Management Center 2.1 console and you try to access the Resource Type Definition->Properties Table, the table never loads if it is more than one page long.

Workaround: Run the scrgadm -pvv command to see all resource type properties.

Bug ID 4362925

Problem Summary: The scshutdown(1M) command might fail to unmount all cluster file systems, as shown in the following example.


nodeA# scshutdown -g0 -y
scshutdown: Unmount of /dev/md/sc/dsk/d30 failed: Device busy.
scshutdown: Could not unmount all PxFS filesystems.

The Networker packages were bundled and installed during the Oracle installation. Therefore, the nsrmmd daemon is running and accessing the /global/oracle directory, which prevents the unmount of all cluster file systems.


nodeA# umount /global/oracle
umount: global/oracle busy
nodeA# fuser -c /global/oracle
/global/oracle: 335co 317co 302co 273co 272co
nodeA# ps -ef|grep 335
 root 335 273 0 17:17:41 ?       0:00 /usr/sbin/nsrmmd -n 1
 root 448 397 0 17:19:37 console 0:00 grep 335

This problem occurs during Sun Cluster shutdown when the shutdown tries to unmount a cluster file system that the process nsrmmd is still referencing.

Workaround: Run the fuser(1M) command on each node to establish a list of all processes still using the cluster file systems that cannot be unmounted. Check that no Resource Group Manager resources have been restarted since the failed scshutdown(1M) command was first run. Kill all these processes with the kill -9 command. This kill list should not include any processes under the control of the Resource Group Manager. After all such processes have terminated, rerun the scshutdown command, and the shutdown should run to successful completion.
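
Using the example output above, the cleanup might look similar to the following sequence. In the example, process 335 is the nsrmmd daemon; replace the process IDs with the ones that fuser reports on your nodes, and do not kill any processes under Resource Group Manager control.

# fuser -c /global/oracle
/global/oracle: 335co 317co 302co 273co 272co
# kill -9 335
# scshutdown -g0 -y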

Bug ID 4365310

Problem Summary: If a resource enters the STOP_FAILED state, you must manually clear the STOP_FAILED flag for the resource. If you specify multiple resources to have their flags cleared and one of the resources is not in the STOP_FAILED state, the command returns early without clearing the STOP_FAILED flags of the remaining resources.

No error message is displayed in this case, which is misleading: there is no indication that a failure occurred or that the STOP_FAILED state was not cleared for all of the resources listed in the command.

Workaround: To avoid this problem, clear the STOP_FAILED flag individually for each resource that is in the STOP_FAILED state.


# scswitch -c -f STOP_FAILED -j stopfailres -h phys-schost-1

Bug ID 4365700

Problem Summary: In the following example, multiple resources are disabled from the same resource group with a single command.


# scswitch -n -j r1,r2,r3

If the first resource moves into the STOP_FAILED state, the remaining resources might end up disabled but still online. This online state represents an invalid internal state of the Resource Group Manager daemon and can cause the Resource Group Manager daemon to panic.

Workaround: When disabling resources, always disable just one resource per scswitch(1M) command.
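
For example, instead of the single command shown above, disable the resources r1, r2, and r3 one at a time.

# scswitch -n -j r1
# scswitch -n -j r2
# scswitch -n -j r3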

Bug ID 4365729

Problem Summary: Attempts to put a device group into maintenance mode using the following command fail if file systems are mounted on the specified device group.


# scswitch -m -D device-group

Workaround: Unmount all file systems on the device group before placing it in maintenance mode. A device group can be placed in maintenance mode only if the devices in that device group are unused, that is, if no active users of the devices exist and all dependent file systems are unmounted.
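
For example, assuming a single cluster file system mounted on the hypothetical mount point /global/fs1, you might run the following commands.

# umount /global/fs1
# scswitch -m -D device-group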

Bug ID 4366840

Problem Summary: If any cables and associated adapters or junctions are removed from a cluster while one of its nodes is down, that node will panic when it is rebooted and attempts to rejoin the cluster.

Workaround: Until this bug is fixed, do not remove cables, adapters, or junctions from a cluster while a node is down. If you do experience this panic, reboot the node a second time. The node can then join the cluster without panicking.

Bug ID 4366886

Problem Summary: Heavy system load might interfere with bringing device groups online. This problem occurs because VERITAS Volume Manager (VxVM) needs to perform several tasks, such as syncing mirrors, to import a disk group. Under heavy load, these tasks can be prevented from completing in a timely manner because other system tasks are utilizing important system resources. Because device groups are commonly brought online automatically when a node boots (if a file system is set to automatically mount, for example), such an online hang might manifest itself as a hang during boot.

Workaround: Decrease system load or increase the priority of the vxconfigd daemon.
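
As one possible approach, you might raise the scheduling priority of the vxconfigd daemon with the renice(1) command; the increment shown is only an example, and pgrep(1) is used to look up the process ID.

# renice -n -5 -p `pgrep vxconfigd`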

Bug ID 4368034

Problem Summary: If the Resource Group Manager daemon dies, or a node dies while a remote procedure call is in progress, error messages such as one of the following might be printed on the system console.


COMM_FAILURE SystemException: COMM_FAILURE major 3 minor 0 Error 0 completed NO

INV_OBJREF SystemException: INV_OBJREF major 4 minor 9 Bad file number completed NO

These messages are intended for debugging rather than for customer consumption. The Resource Group Manager daemon already writes clearer syslog messages for these exceptions, so this additional debugging output is unnecessary.

Workaround: Ignore these console messages. Look for syslog messages regarding a node death. Normally, the Resource Group Manager daemon recovers automatically from such an event.

Bug ID 4369228

Problem Summary: The dbassist utility provided by Oracle does not enable creation of an Oracle Parallel Server database directly on a hardware RAID device.

Workaround: Use the Oracle Server Manager line mode, svrmgrl, to create Oracle Parallel Server databases on Sun Cluster 3.0 software.

Bug ID 4369565

Problem Summary: The nfs_upgrade script is not idempotent. You cannot run the script twice.

Workaround: If you need to run the script twice, remove the NFS resource and NFS resource type that were created in the first attempt before running the script a second time.
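
For example, assuming the first attempt created an NFS resource named nfs-res of resource type SUNW.nfs (both names are hypothetical), you might remove them as follows before rerunning the script.

# scswitch -n -j nfs-res
# scrgadm -r -j nfs-res
# scrgadm -r -t SUNW.nfs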

Bug ID 4369668

Problem Summary: When the system administrator edits the Nodelist property of a managed resource group, the Resource Group Manager should run the INIT method, on each node added to the node list, for all resources in the resource group that have the property Init_nodes=RG_PRIMARIES. Likewise, the Resource Group Manager should run the FINI method on such resources on each node deleted from the node list. Similarly, if the Installed_nodes property of a resource type is edited, the Resource Group Manager should run the INIT or FINI method on all resources of that type that reside in managed resource groups and have the property Init_nodes=RT_installed_nodes.

Currently, the Resource Group Manager does not run INIT or FINI methods when these updates are performed. As a result, the resources might not be properly initialized or cleaned up on these nodes.

Workaround: Using the scswitch command, unmanage and then re-manage the affected resource groups. Unfortunately, this process requires that the administrator take the resource group offline. Alternatively, the administrator can run the equivalent INIT or FINI actions manually (without unmanaging the resource group), if such procedures are documented for the resource types that occur within the group.
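
A minimal sketch of the unmanage and re-manage sequence, assuming a resource group named rg1 whose resources have already been disabled and taken offline, might look like the following; see scswitch(1M) for the complete procedure.

# scswitch -u -g rg1
# scswitch -o -g rg1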

This workaround is unnecessary if none of the resources in the group have INIT or FINI methods. The only Sun-supplied resource types that use INIT and FINI methods are:

Resource types that customers or third parties implement might also use INIT or FINI methods. If so, this workaround is necessary for resource groups that contain such resource types.


Note -

All scalable services implicitly use INIT and FINI methods, even if such methods are not explicitly declared for the resource type.


Bug ID 4370760

Problem Summary: You cannot remove the last host from a metaset unless you first take the device group offline.

Workaround: To remove the last host from a metaset, first take the device group offline. Then run the following two commands as superuser from the host to be removed.


# /usr/cluster/bin/scswitch -m -D disksetname
# metaset -s disksetname -d -h hostname

Bug ID 4371236

Problem Summary: Some ge switches require certain ge device parameters to be set to values other than the defaults. Chapter 3 of the Sun GigabitEthernet/P 2.0 Adapter Installation and User's Guide describes the procedure to change ge device parameters. The procedure to use on nodes running Sun Cluster 3.0 software varies slightly from the procedure described in the guide. In particular, the difference is in how the device path names in the /etc/path_to_inst file are used to derive parent names for use in the ge.conf file.

Workaround: Chapter 3 of the Sun GigabitEthernet/P 2.0 Adapter Installation and User's Guide describes the procedure to change ge device parameter values through entries in the /kernel/drv/ge.conf file. The procedure to determine the parent name from the /etc/path_to_inst listing (to be used in ge.conf entries) appears on page 24, "Setting Driver Parameters Using a ge.conf File." For example, from the following /etc/path_to_inst line, you can determine the parent name for ge2 to be /pci@4,4000.


"/pci@4,4000/network@4" 2 "ge"

On cluster nodes, you must delete the /node@nodeid prefix from the device paths in /etc/path_to_inst before using the remaining path to derive the parent name. For example, on a cluster node, an equivalent /etc/path_to_inst entry could be the following entry.


"/node@1/pci@4,4000/network@4" 2 "ge"

The parent name for ge2 to use in ge.conf is still /pci@4,4000.
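
For illustration only, a resulting /kernel/drv/ge.conf entry for ge2 might look similar to the following line; the parameter name and value are placeholders, so consult the guide for the parameters that your switch requires.

name = "ge" parent = "/pci@4,4000" unit-address = "4" adv_1000autoneg_cap=0;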

Bug ID 4372369

Problem Summary: The nfs_upgrade script cannot work if more than one logical host is configured in Sun Cluster 2.2 software.

Workaround: No current workaround exists. If you encounter this problem, contact your Sun Service provider about acquiring a patch.

Bug ID 4373498

Problem Summary: The LDAP administrative server treats hostnames as case sensitive. While working with the LDAP administrative server, therefore, all hostnames specified in the LDAP configuration should match case with the LDAP specification in the name service in use on the cluster node. This case matching is particularly important if DNS is the name service in use because the DNS domain name must also match exactly with the hostname specification in the LDAP configuration.

Workaround: Make sure the case of the fully qualified domain name of the machine given to LDAP matches the case of the domain name returned by the resolver.
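
For example, you might compare the fully qualified name that you supplied to LDAP with the name that the resolver returns for the host; the hostname shown is a placeholder.

# getent hosts phys-schost-1.example.com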

Bug ID 4373911

Problem Summary: If you do the following:

the HA-NFS fault monitor might display the following warning message.


clnt_tp_create_timed of program statd failed:RPC:Program not registered

Workaround: No workaround is necessary. The warning message can safely be ignored.

Bug ID 4374194

Problem Summary: The Sun Management Center agent might unexpectedly exit on Ultra™ 2 workstations with Sun StorEdge A5000. The problem occurs when the Sun Management Center agent is set up with Config Reader, and the Config-Reader4udt module is added to the /var/opt/SUNWsymon/cfg/base-modules-d.dat file. The Sun Management Center agent reads this file on startup and then tries to load all listed modules. The agent might encounter a segmentation fault while trying to load the Config-Reader4udt module.

Workaround: To avoid this problem, do one of the following:

Bug ID 4374648

Problem Summary: The scinstall man page currently has an example that uses -s oracle to automatically upgrade a Sun Cluster HA for Oracle data service from Sun Cluster 2.2 to Sun Cluster 3.0 software. This option is currently unsupported.

Workaround: Do not use the -s oracle option to attempt to upgrade from Sun Cluster 2.2 to Sun Cluster 3.0 software for an Oracle data service. Instead, use the manual upgrade procedure, "Upgrading Sun Cluster HA for Oracle from Sun Cluster 2.2 to Sun Cluster 3.0 Software".

Bug ID 4376171

Problem Summary: Placing a FC-AL SBus Card (FC100/S) and a Sun Quad FastEthernet™ 2.0 (SQFE/S) on the same SBus might cause unexpected resets on the QFE card.

Workaround: Avoid configuring cluster nodes with a FC-AL SBus Card (FC100/S) and a Sun Quad FastEthernet 2.0 (SQFE/S) on the same SBus.

Bug ID 4377303

Problem Summary: Newly created Sun StorEdge A3500 LUNs might not always appear in the format(1M) output on all nodes.

Workaround: Run the /etc/raid/bin/hot_add command on nodes that do not see the new LUNs.
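
For example, run the following commands on each node that does not see the new LUNs, then verify that the LUNs appear.

# /etc/raid/bin/hot_add
# format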

Bug ID 4378553

Problem Summary: A resource group's Nodelist property is an ordered list of nodes that can master the resource group, with the most-preferred node listed first. The Resource Group Manager should always host a resource group on the most-preferred node that is available. However, when an administrator reboots the cluster (when all nodes are rebooting at once), managed resource groups might end up being mastered on nodes other than the most-preferred node. This problem occurs only upon reboot of the entire cluster.

Workaround: After rebooting the cluster, use the scswitch command to switch resource groups onto the desired nodes. The Nodelist preference order will be enforced automatically from that point onward, as long as the cluster remains up.
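
For example, to return a resource group to its most-preferred node after a full cluster reboot (the resource group and node names are placeholders):

# scswitch -z -g resource-group -h preferred-node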

Scalable Services Sticky Load-Balancing Policy

Currently, you might encounter a problem if you run a scalable data service that uses the sticky load-balancing policy. The problem can occur if the service runs with stickiness established relative to a particular node, and later you start another instance of the same service on a different node. Starting another instance of the same service might cause the first instance to lose its stickiness.

Whether the first instance loses its stickiness depends on the result that the sticky algorithm returns when the second instance starts. The algorithm should not change the sticky affinity in this case, but it sometimes does.

Refer to Sun Cluster 3.0 Concepts for more information on the sticky load-balancing policy.

Upgrading Sun Cluster HA for Oracle from Sun Cluster 2.2 to Sun Cluster 3.0 Software

Perform these procedures while upgrading the Sun Cluster framework using the scinstall upgrade procedure.

Conditions and Restrictions

The following conditions and restrictions apply when upgrading Sun Cluster HA for Oracle from Sun Cluster 2.2 to Sun Cluster 3.0 software.

How to Save the Sun Cluster HA for Oracle Configuration Files

Use the following procedure to save the configuration files from your Sun Cluster 2.2 configuration.

  1. Follow the scinstall framework-upgrade procedure until you have completed the upgrade-begin steps (scinstall -F begin) on each node.

  2. Run the following command on each node as superuser. This command will save a version of all files in the /var/opt/oracle directory.

    To ensure that this information does not get lost, back up the structure found in the /var/opt/oracle directory to an external device.


    # cp -r /var/opt/oracle /var/cluster/logs/install/preserve/2.2/SUNWscor
    
  3. Complete the finish portion of the framework upgrade (scinstall -u finish).


    Note -

    Do not use the -s oracle option with the scinstall -u finish command. This option attempts an automated upgrade for Sun Cluster HA for Oracle, and the automated upgrade will fail. The only automated upgrade supported is for NFS.


After completing the framework upgrade, set up the Sun Cluster 3.0 environment. The following section, "Setting Up the Sun Cluster 3.0 Environment", describes this procedure.

Setting Up the Sun Cluster 3.0 Environment

Perform the following steps to set up your Sun Cluster 3.0 environment.

  1. On one node, run the following command to verify that:

    • The framework upgrade has correctly set up a Sun Cluster 3.0 resource group that corresponds to each Sun Cluster 2.2 logical host.

    • The hostname network resource is in the resource group and is online.


    # scstat -g
    
  2. On one node, run the following command to verify that the VERITAS disk group or Solstice DiskSuite diskset that held the Oracle database (and possibly the Oracle binaries) in Sun Cluster 2.2 is correctly mapped into a Sun Cluster 3.0 disk device group.


    # scstat -D
    
  3. On each node, run the following command to verify that the required file systems for each Oracle instance are mounted.


    # mount
    
  4. On each node, run the following commands to restore the saved version of the Oracle configuration files under the /var/opt directory.

    If you saved the files in the /var/opt/oracle directory earlier in the procedure, and the files are unchanged, you can skip this step.


    # cp -r /var/cluster/logs/install/preserve/2.2/SUNWscor/oracle /var/opt
    # chown -R oracle:dba /var/opt/oracle
    

Configure Sun Cluster HA for Oracle Under Sun Cluster 3.0

Configure Sun Cluster 3.0 HA for Oracle using the following procedure.


Note -

Perform Step 1 only once.


  1. On one node, register the Oracle server and listener resource types using the following commands.


    # scrgadm -a -t SUNW.oracle_server
    # scrgadm -a -t SUNW.oracle_listener
    

    Run Step 2 through Step 5 for each Sun Cluster 2.2 HA for Oracle instance listed in the /var/opt/oracle/oratab file.

  2. Determine the value of the ORACLE_HOME variable from the oratab file.

    For example, suppose the oratab file shows the following information.


    ora32:/oracle/816_32:N

    This information indicates that the ORACLE_HOME variable for the ORACLE_SID ora32 instance is the value /oracle/816_32.

  3. Retrieve the parameter values from the ccd.database file for each Oracle instance.

    These parameters map to Sun Cluster 3.0 parameters that you pass to the scrgadm command. You will use these parameters when configuring Sun Cluster HA for Oracle under Sun Cluster 3.0.


    # grep ^HAORACLE: /var/cluster/logs/install/preserve/2.2/SUNWcluster/conf/ccd.database
    

    Each Oracle instance in the ccd.database file uses the following format.


    HAORACLE:on:ora32:boots-1:60:10:120:300:scott/tiger:/oracle/816_32/dbs/initora32.ora:ORA_LIST

    These parameters map into the following Sun Cluster 3.0 format.


    HAORACLE:STATE:ORACLE_SID:LOGICAL_HOSTNAME_IP_Resource:THOROUGH_PROBE_INTERVAL:CONNECT_CYCLE:PROBE_TIMEOUT:RETRY_INTERVAL:CONNECT_STRING:PARAMETER_FILE:LISTENER_NAME

    The resource group name RG_NAME will be ${LOGICAL_HOSTNAME_IP_Resource}-lh. Note that the -lh will be automatically appended to the resource group name in Sun Cluster 3.0.

  4. Locate the background_dump_dest value in the $PARAMETER_FILE variable, and set the ALERT_LOG_FILE variable to the following value.


    $background_dump_dest/alert_$ORACLE_SID.log

    For example, for ORACLE_SID=ora32, suppose that in the $PARAMETER_FILE file, background_dump_dest is the following value.


    /oracle/816_32/admin/ora32/bdump

    In this example, ALERT_LOG_FILE should be updated to the following value.


    /oracle/816_32/admin/ora32/bdump/alert_ora32.log
    

  5. On one node, run the following commands to create Oracle resources and bring them online.


    # scrgadm -a -t SUNW.oracle_server -g $RG_NAME -j $ORACLE_SID-serv \
    -x Oracle_sid=$ORACLE_SID -x Oracle_home=$ORACLE_HOME \
    -y Thorough_probe_interval=$THOROUGH_PROBE_INTERVAL \
    -x Connect_cycle=$CONNECT_CYCLE -x Probe_timeout=$PROBE_TIMEOUT \
    -y Retry_interval=$RETRY_INTERVAL -x Connect_string=$CONNECT_STRING \
    -x Parameter_file=$PARAMETER_FILE -x Alert_log_file=$ALERT_LOG_FILE
    # scrgadm -a -j $ORACLE_SID-list -t SUNW.oracle_listener -g $RG_NAME \
    -x Oracle_home=$ORACLE_HOME -x Listener_name=$LISTENER_NAME
    # scswitch -e -j $ORACLE_SID-serv
    # scswitch -e -j $ORACLE_SID-list
    # scswitch -e -M -j $ORACLE_SID-serv
    # scswitch -e -M -j $ORACLE_SID-list
    

    For example, using the Oracle instance described in Step 2, Step 3, and Step 4, you would run the following commands.


    # scrgadm -a -t SUNW.oracle_server -g boots-1-lh -j ora32-serv \
    -x Oracle_sid=ora32 -x Oracle_home=/oracle/816_32 \
    -y Thorough_probe_interval=60 \
    -x Connect_cycle=10 -x Probe_timeout=120 \
    -y Retry_interval=300 -x Connect_string=scott/tiger \
    -x Parameter_file=/oracle/816_32/dbs/initora32.ora \
    -x Alert_log_file=/oracle/816_32/admin/ora32/bdump/alert_ora32.log
    # scrgadm -a -j ora32-list -t SUNW.oracle_listener -g boots-1-lh \
    -x Oracle_home=/oracle/816_32 -x Listener_name=ORA_LIST
    # scswitch -e -j ora32-serv
    # scswitch -e -j ora32-list
    # scswitch -e -M -j ora32-serv
    # scswitch -e -M -j ora32-list
    

Verify the Upgrade

To verify that the upgrade has completed successfully, perform the following steps.

  1. Verify that the Oracle resources are online by using the following command.


    # scstat -g
    

  2. Verify that you can switch over the resource group by using the following command.


    # scswitch -z -g resource-group -h node
    

Known Documentation Problems

This section discusses documentation errors you might encounter and steps to correct these problems.

Installation Guide

The Sun Cluster 3.0 Installation Guide contains the following documentation errors:

Hardware Guide

In the Sun Cluster 3.0 Hardware Guide, the following procedures are incorrect or do not exist:

How to Move a Disk Cable to a New Adapter

Use the following procedure to move a disk cable to a new adapter within a node.

  1. Quiesce all I/O to the affected disk(s).

  2. Unplug the cable from the old adapter.

  3. Run the cfgadm(1M) command on the local node to unconfigure all drives affected by the move (an example appears after this procedure).

    Or, reboot the node by using the following command.


    # reboot -- -r
    
  4. Run the devfsadm -C command on the local node to clean up the Solaris device link.

  5. Run the scdidadm -C command on the local node to clean up the DID device path.

  6. Connect the cable to the new adapter.

  7. Run the cfgadm command on the local node to configure the drives in the new location.

    Or, reboot the node by using the following command.


    # reboot -- -r
    
  8. Run the scgdevs command to add the new DID device path.
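
The following sketch illustrates Step 3 through Step 5 with a hypothetical disk c1t3d0 on controller c1; cfgadm attachment point names differ by system, and the cfgadm steps apply only to controllers that support unconfiguring devices.

# cfgadm -c unconfigure c1::dsk/c1t3d0
# devfsadm -C
# scdidadm -C

After the cable is connected to the new adapter (Step 6), the corresponding commands are cfgadm -c configure with the new attachment point, followed by scgdevs.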

How to Move a Disk Cable From One Node to Another

Use the following procedure to move a disk cable from one node to another node.

  1. Delete all references to the path you wish to remove from all volume manager and data service configurations.

  2. Quiesce all I/O to the affected disk(s).

  3. Unplug the cable from the old node.

  4. Run the cfgadm command on the old node to unconfigure all drives affected by the move.

    Or, reboot the node by using the following command.


    # reboot -- -r
    
  5. Run the devfsadm -C command on the old node to clean up the Solaris device link.

  6. Run the scdidadm -C command on the old node to clean up the DID device path.

  7. Connect the cable to the new node.

  8. Run the cfgadm command on the new node to configure the drives in the new location.

    Or, reboot the node by using the following command.


    # reboot -- -r
    
  9. Run the devfsadm command on the new node to create the new Solaris device links.

  10. Run the scgdevs command on the new node to add the new DID device path.

  11. Add the path on the new node to any required volume manager and data service configurations.

    When configuring data services, check that your node failover preferences are set to reflect the new configuration.

How to Update Cluster Software to Reflect Proper Device Configuration

If the preceding procedures are not followed correctly, an error might be logged the next time you run the scdidadm -r command or the scgdevs command. To update the cluster software to reflect the proper device configuration, perform the following steps.

  1. Make sure cable configuration is as you want it to be. Make sure the cable is detached from the old node.

  2. Make sure the old node is removed from any required volume manager or data service configurations.

  3. Run the cfgadm command on the old node to unconfigure all drives affected by the move.

    Or, reboot the node by using the following command.


    # reboot -- -r
    
  4. Run the devfsadm -C command on the node from where you removed the cable.

  5. Run the scdidadm -C command on the node from where you removed the cable.

  6. Run the cfgadm command on the new node to configure the drives in the new location.

    Or, reboot the node by using the following command.


    # reboot -- -r
    
  7. Run the scgdevs command on the new node to add the new DID device path.

  8. Run the scdidadm -R device command on the new node to make sure that SCSI reservations are in the correct state.

Data Services Developers' Guide

The sample code in Appendix B of the Sun Cluster 3.0 Data Services Developers' Guide has two known problems:

Concepts Guide

The following points should be noted about Sun Cluster 3.0 Concepts:

Using the Cluster Interconnect for Application Traffic

A cluster must have multiple network connections between nodes, forming the cluster interconnect. The clustering software uses multiple interconnects both for high availability and to improve performance. For internal traffic (for example, file system data or scalable services data), messages are striped across all available interconnects in a round-robin fashion.

The cluster interconnect is also available to applications, for highly available communication between nodes. For example, a distributed application might have components running on different nodes that need to communicate. By using the cluster interconnect rather than the public network, these connections can withstand the failure of an individual link.

To use the cluster interconnect for communication between nodes, an application must use the private hostnames configured when the cluster was installed. For example, if the private hostname for node 1 is clusternode1-priv, use that name to communicate over the cluster interconnect to node 1. TCP sockets opened using this name are routed over the cluster interconnect and can be transparently re-routed in the event of network failure.

Note that because the private hostnames can be configured during installation, the cluster interconnect can use any name chosen at that time. The actual name can be obtained from scha_cluster_get(3HA) with the scha_privatelink_hostname_node argument.
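
For example, a script might retrieve the private hostname for a given node with the command-line version of the interface; the tag and argument shown follow the man page naming but should be confirmed in scha_cluster_get(1HA).

# scha_cluster_get -O PRIVATELINK_HOSTNAME_NODE nodename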

For application-level use of the cluster interconnect, a single interconnect is used between each pair of nodes, but separate interconnects are used for different node pairs if possible. For example, consider an application running on three nodes and communicating over the cluster interconnect. Communication between nodes 1 and 2 might take place on interface hme0, while communication between nodes 1 and 3 might take place on interface qfe1. That is, application communication between any two nodes is limited to a single interconnect, while internal clustering communication is striped over all interconnects.

Note that the application shares the interconnect with internal clustering traffic, so the bandwidth available to the application depends on the bandwidth used for other clustering traffic. In the event of a failure, internal traffic can round-robin over the remaining interconnects, while application connections on a failed interconnect can switch to a working interconnect.

Two types of addresses support the cluster interconnect, and gethostbyname(3N) on a private hostname normally returns two IP addresses. The first address is called the logical pairwise address, and the second address is called the logical pernode address.

A separate logical pairwise address is assigned to each pair of nodes. This small logical network supports failover of connections. Each node is also assigned a fixed pernode address. That is, the logical pairwise addresses for clusternode1-priv are different on each node, while the logical pernode address for clusternode1-priv is the same on each node. A node does not have a pairwise address to itself, however, so gethostbyname(clusternode1-priv) on node 1 returns only the logical pernode address.

Note that applications accepting connections over the cluster interconnect and then verifying the IP address for security reasons must check against all IP addresses returned from gethostbyname, not just the first IP address.

If you need consistent IP addresses in your application at all points, configure the application to bind to the pernode address on both the client and the server side so that all connections can appear to come and go from the pernode address.

Data Services Installation and Configuration Guide

Chapter 5, "Installing and Configuring Sun Cluster HA for Apache," in the Sun Cluster 3.0 Data Services Installation and Configuration Guide describes the procedure for installing the Apache Web Server from the Apache web site (http://www.apache.org). However, you can also install the Apache Web Server from the Solaris 8 operating environment CD-ROM.

The Apache binaries are included in three packages--SUNWapchr, SUNWapchu, and SUNWapchd--that form the SUNWCapache package metacluster. You must install SUNWapchr before SUNWapchu.

Place the Web server binaries on the local file system on each of your cluster nodes or on a cluster file system.

Installing Apache from the Solaris 8 CD-ROM

This procedure documents the steps required to use the Sun Cluster HA for Apache data service with the version of the Apache Web Server that is on the Solaris 8 operating environment CD-ROM.

  1. Install the Apache packages SUNWapchr, SUNWapchu, and SUNWapchd if they are not already installed.

    Use pkginfo(1) to determine if the packages are already installed.


    # pkgadd -d Solaris-8-product-directory SUNWapchr SUNWapchu SUNWapchd
    ...
    Installing Apache Web Server (root) as SUNWapchr
    ...
    [ verifying class initd ]
    /etc/rc0.d/K16apache linked pathname
    /etc/rc1.d/K16apache linked pathname
    /etc/rc2.d/K16apache linked pathname
    /etc/rc3.d/S50apache linked pathname
    /etc/rcS.d/K16apache linked pathname
    ...
  2. Disable the start and stop run control scripts that were just installed as part of the SUNWapchr package.

    Disabling these scripts is necessary because the Sun Cluster HA for Apache data service will start and stop the Apache application after the data service has been configured. Perform the following steps:

    1. List the Apache run control scripts.

    2. Rename the Apache run control scripts.

    3. Verify that all the Apache-related scripts have been renamed.


    Note -

    The following example changes the first letter in the name of the run control script from upper case to lower case. You can rename the scripts, however, in a fashion consistent with your normal administration practices.



    # ls -1 /etc/rc?.d/*apache
    /etc/rc0.d/K16apache
    /etc/rc1.d/K16apache
    /etc/rc2.d/K16apache
    /etc/rc3.d/S50apache
    /etc/rcS.d/K16apache
    
    # mv /etc/rc0.d/K16apache  /etc/rc0.d/k16apache
    # mv /etc/rc1.d/K16apache  /etc/rc1.d/k16apache
    
    # mv /etc/rc2.d/K16apache  /etc/rc2.d/k16apache
    
    # mv /etc/rc3.d/S50apache  /etc/rc3.d/s50apache
    
    # mv /etc/rcS.d/K16apache  /etc/rcS.d/k16apache
    
    
    
    # ls -1 /etc/rc?.d/*apache
    /etc/rc0.d/k16apache
    /etc/rc1.d/k16apache
    /etc/rc2.d/k16apache
    /etc/rc3.d/s50apache
    /etc/rcS.d/k16apache

Man Pages

New man pages are included for each data service supplied with Sun Cluster 3.0 software. The data service man pages include: SUNW.apache(5), SUNW.dns(5), SUNW.iws(5), SUNW.nfs(5), SUNW.nsldap(5), SUNW.oracle_listener(5), SUNW.oracle_server(5), SUNW.HAStorage(5) and scalable_service(5). These man pages describe the standard and extension properties that these data services use.
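
To view these man pages, include the Sun Cluster man page directory in your MANPATH or specify it with the -M option; the path shown is the default Sun Cluster man page location.

# man -M /usr/cluster/man -s 5 SUNW.nfs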

Known Problems With the Sun Management Center GUI

This section describes known problems with the Sun Cluster 3.0 module of the Sun Management Center GUI.

Certain Types of Ultra Servers Are Not Recognized by Sun Management Center

Symptoms

Confirmation of Problem/Start of Workaround

  1. Close the Details Window.

  2. From the Sun Management Center Window, choose File->Console Messages.

  3. Double-click the folder icon representing the unrecognized cluster node.

  4. Look in the console messages window for a line reading ...family definition file missing for...

Workaround

  1. On the Sun Management Center server, change to the directory holding family files.


    # cd /opt/SUNWsymon/classes/base/console/cfg
    

  2. Create a symbolic link to the closest available family-j.x file.

    For example, if the missing file line read ...missing for sun4u-Sun-Ultra-450-family-j.x..., create a link from sun4u-Sun-Enterprise-450-family-j.x to sun4u-Sun-Ultra-450-family-j.x.


    # ln -s sun4u-Sun-Enterprise-450-family-j.x sun4u-Sun-Ultra-450-family-j.x
    
  3. Exit the console, and restart it.

Alternate Method for Determining the Symbolic Link Name

  1. Double-click the unrecognized cluster node to bring up its Details Window.

  2. Click the Info tab.

  3. Search for the Entity Family entry in the Properties table.

    The value will probably be truncated, so let the mouse pointer linger over the value field. The complete name (for example, sun4u-Sun-Ultra-450) appears in the tooltip.

  4. Append -family-j.x to determine the link name to create.