Oracle Solaris Cluster System Administration Guide (Oracle Solaris Cluster 4.1)

Troubleshooting

This section contains troubleshooting procedures that you can use for testing purposes.

Running an Application Outside the Global Cluster

How to Take a Solaris Volume Manager Metaset From Nodes Booted in Noncluster Mode

Use this procedure to run an application outside the global cluster for testing purposes.

  1. Determine if the quorum device is used in the Solaris Volume Manager metaset, and determine if the quorum device uses SCSI2 or SCSI3 reservations.
    phys-schost# clquorum show
    1. If the quorum device is in the Solaris Volume Manager metaset, add a new quorum device that is not part of the metaset, so that the metaset can be taken later in noncluster mode (see the example at the end of this procedure).
      phys-schost# clquorum add did
    2. Remove the old quorum device.
      phys-schost# clquorum remove did
    3. If the quorum device uses a SCSI2 reservation, scrub the SCSI2 reservation from the old quorum device and verify that no SCSI2 reservations remain.

      The following command finds the Persistent Group Reservation Emulation (PGRE) keys. If there are no keys on the disk, an errno=22 message is displayed.

      # /usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/dids2

      After you locate the keys, scrub the PGRE keys.

      # /usr/cluster/lib/sc/pgre -c pgre_scrub -d /dev/did/rdsk/dids2

      Caution - If you scrub the active quorum device keys from the disk, the cluster will panic on the next reconfiguration with a Lost operational quorum message.


  2. Evacuate the global-cluster node that you want to boot in noncluster mode.
    phys-schost# clresourcegroup evacuate -n targetnode
  3. Take offline any resource group or resource groups that contain HAStorage or HAStoragePlus resources and that contain devices or file systems affected by the metaset that you want to take later in noncluster mode.
    phys-schost# clresourcegroup offline resourcegroupname
  4. Disable all the resources in the resource groups that you took offline.
    phys-schost# clresource disable resourcename
  5. Unmanage the resource groups.
    phys-schost# clresourcegroup unmanage resourcegroupname
  6. Take offline the corresponding device group or device groups.
    phys-schost# cldevicegroup offline devicegroupname
  7. Disable the device group or device groups.
    phys-schost# cldevicegroup disable devicegroupname
  8. Boot the passive node into noncluster mode.
    phys-schost# reboot -x
  9. Verify that the boot process has been completed on the passive node before proceeding.
    phys-schost# svcs -x
  10. Determine if any SCSI3 reservations exist on the disks in the metasets.

    Run the following command on all disks in the metasets.

    phys-schost# /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/dids2
  11. If any SCSI3 reservations exist on the disks, scrub them.
    phys-schost# /usr/cluster/lib/sc/scsi -c scrub -d /dev/did/rdsk/dids2
  12. Take the metaset on the evacuated node.
    phys-schost# metaset -s name -C take -f
  13. Mount the file system or file systems that use the devices defined in the metaset.
    phys-schost# mount device mountpoint
  14. Start the application and perform the desired test. After finishing the test, stop the application.
  15. Reboot the node and wait until the boot process has ended.
    phys-schost# reboot
  16. Bring online the device group or device groups.
    phys-schost# cldevicegroup online -e devicegroupname
  17. Start the resource group or resource groups.
    phys-schost# clresourcegroup online -eM resourcegroupname 
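
The following is a minimal sketch of the quorum device swap described in Step 1, assuming hypothetical DID devices: d10 is the old quorum device and belongs to the metaset, and d20 is a shared disk that is not part of the metaset.

    phys-schost# clquorum add d20        (d20 is a hypothetical replacement quorum device)
    phys-schost# clquorum remove d10     (d10 is the hypothetical old quorum device)
    phys-schost# clquorum show           (verify that d20 is now the configured quorum device)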

Restoring a Corrupted Diskset

Use this procedure if a diskset is corrupted or is in a state in which the nodes in the cluster are unable to take ownership of it. If your attempts to clear the state have failed, use this procedure as a last attempt to fix the diskset.

These procedures work for Solaris Volume Manager metasets and multi-owner Solaris Volume Manager metasets.

How to Save the Solaris Volume Manager Software Configuration

Restoring a disk set from scratch can be time-consuming and error-prone. A better alternative is to use the metastat command to regularly back up replicas, or to use Oracle Explorer (SUNWexplo) to create a backup. You can then use the saved configuration to recreate the diskset. Save the current configuration into files (using the prtvtoc and metastat commands), and then recreate the disk set and its components (a sketch of a script that automates this backup follows this procedure). See How to Recreate the Solaris Volume Manager Software Configuration.

  1. Save the partition table for each disk in the disk set.
    # /usr/sbin/prtvtoc /dev/global/rdsk/diskname > /etc/lvm/diskname.vtoc
  2. Save the Solaris Volume Manager software configuration.
    # /bin/cp /etc/lvm/md.tab /etc/lvm/md.tab_ORIGINAL
    # /usr/sbin/metastat -p -s setname >> /etc/lvm/md.tab

    Note - Other configuration files, such as the /etc/vfstab file, might reference the Solaris Volume Manager software. This procedure assumes that an identical Solaris Volume Manager software configuration is rebuilt and that, therefore, the mount information is the same. If Oracle Explorer (SUNWexplo) is run on a node that owns the set, it retrieves the prtvtoc and metastat -p information.
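
The following is a minimal sketch of a script that automates Steps 1 and 2 so that the configuration can be saved regularly, for example from cron. The set name dg-schost-1 and the disk list are hypothetical; adjust them to match your configuration.

    #!/bin/ksh
    # Minimal sketch (hypothetical names): save the partition tables and the
    # Solaris Volume Manager configuration for one disk set.
    SETNAME=dg-schost-1                 # assumed disk set name
    DISKS="d4 d8"                       # assumed DID disks in the set

    # Step 1: save the partition table of each disk in the set.
    for DISK in $DISKS; do
        /usr/sbin/prtvtoc /dev/global/rdsk/${DISK}s2 > /etc/lvm/${DISK}.vtoc
    done

    # Step 2: save the metadevice configuration. Writing to a dated file avoids
    # appending duplicate entries to /etc/lvm/md.tab on every run; append the
    # saved output to md.tab only when you need to recreate the set.
    /usr/sbin/metastat -p -s $SETNAME > /etc/lvm/md.tab.$SETNAME.$(date +%Y%m%d)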


How to Purge the Corrupted Diskset

Purging a set from a node or all nodes removes the configuration. To purge a diskset from a node, the node must not have ownership of the diskset.
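
For example, before running the purge, you might verify which node, if any, owns the diskset, and release ownership if necessary. The set name dg-schost-1 is hypothetical, and the -C release syntax assumes the set was taken with the -C take option shown earlier in this chapter.

    phys-schost# metaset -s dg-schost-1              (displays the set status, including the current owner)
    phys-schost# metaset -s dg-schost-1 -C release   (releases ownership that was taken with -C take)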

  1. Run the purge command on all nodes.
    # /usr/sbin/metaset -s setname -P

    Running this command removes the diskset information from the database replicas, as well as the Oracle Solaris Cluster repository. The -P and -C options allow a diskset to be purged without the need to completely rebuild the Solaris Volume Manager environment.


    Note - If a multi-owner diskset is purged while the nodes are booted in noncluster mode, you might need to remove the information from the dcs configuration files.

    # /usr/cluster/lib/sc/dcs_config -c remove -s setname

    For more information, see the dcs_config(1M) man page.


  2. If you want to remove only the diskset information from the database replicas, use the following command.
    # /usr/sbin/metaset -s setname -C purge

    You should generally use the -P option, rather than the -C option. Using the -C option can cause a problem recreating the diskset because the Oracle Solaris Cluster software still recognizes the diskset.

    1. If you used the -C option with the metaset command, first create the diskset to see if a problem occurs.
    2. If a problem exists, remove the information from the dcs configuration files.
      # /usr/cluster/lib/sc/dcs_config -c remove -s setname

      If the purge options fail, verify that you installed the latest kernel and metadevice updates and contact My Oracle Support.
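
After a successful purge, you can verify that the diskset no longer appears in either the Solaris Volume Manager configuration or the Oracle Solaris Cluster configuration. The set name dg-schost-1 is hypothetical.

    phys-schost# metaset                    (the purged set should no longer be listed)
    phys-schost# cldevicegroup list         (the corresponding device group should no longer appear)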

How to Recreate the Solaris Volume Manager Software Configuration

Use this procedure only if you experience a complete loss of your Solaris Volume Manager software configuration. The steps assume that you have saved your current Solaris Volume Manager configuration and its components and purged the corrupted diskset.


Note - Mediators should be used only on two-node clusters.


  1. Create a new diskset.
    # /usr/sbin/metaset -s setname -a -h nodename1 nodename2

    If this is a multi-owner diskset, use the following command to create a new diskset.

    # /usr/sbin/metaset -s setname -aM -h nodename1 nodename2
  2. On the same host where the set was created, add mediator hosts if required (two nodes only).
    # /usr/sbin/metaset -s setname -a -m nodename1 nodename2
  3. Add the same disks back into the diskset from this same host (see the example at the end of this procedure).
    # /usr/sbin/metaset -s setname -a /dev/did/rdsk/diskname /dev/did/rdsk/diskname
  4. If you purged the diskset and are recreating it, the Volume Table of Contents (VTOC) should remain on the disks, so you can skip this step.

    However, if you are recreating a set to recover, you should format the disks according to a saved configuration in the /etc/lvm/diskname.vtoc file. For example:

    # /usr/sbin/fmthard -s /etc/lvm/d4.vtoc /dev/global/rdsk/d4s2
    # /usr/sbin/fmthard -s /etc/lvm/d8.vtoc /dev/global/rdsk/d8s2

    You can run this command on any node.

  5. Check the syntax in the existing /etc/lvm/md.tab file for each metadevice.
    # /usr/sbin/metainit -s setname -n -a metadevice
  6. Create each metadevice from a saved configuration.
    # /usr/sbin/metainit -s setname -a metadevice
  7. If a file system exists on the metadevice, run the fsck command.
    # /usr/sbin/fsck -n /dev/md/setname/rdsk/metadevice

    If the fsck command displays only a few errors, such as superblock count, then the device was probably reconstructed correctly. You can then run the fsck command without the -n option. If multiple errors appear, verify that you reconstructed the metadevice correctly. If you have, review the fsck errors to determine whether the file system can be recovered. If it cannot, you should restore the data from a backup.

  8. Concatenate all other metasets on all cluster nodes to the /etc/lvm/md.tab file and then concatenate the local diskset.
    # /usr/sbin/metastat -p >> /etc/lvm/md.tab
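
The following is a minimal sketch of Steps 1 through 3 and Step 8 with the placeholders filled in, assuming a two-node cluster with the hypothetical nodes phys-schost-1 and phys-schost-2, a disk set named dg-schost-1 built on the DID disks d4 and d8, and a second, unaffected set named dg-schost-2.

    # /usr/sbin/metaset -s dg-schost-1 -a -h phys-schost-1 phys-schost-2
    # /usr/sbin/metaset -s dg-schost-1 -a -m phys-schost-1 phys-schost-2
    # /usr/sbin/metaset -s dg-schost-1 -a /dev/did/rdsk/d4 /dev/did/rdsk/d8
    # /usr/sbin/metastat -p -s dg-schost-2 >> /etc/lvm/md.tab    (repeat for each additional set)
    # /usr/sbin/metastat -p >> /etc/lvm/md.tab                   (local metadevices)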