Oracle® Solaris Cluster System Administration Guide

Updated: October 2015
 
 

How to Validate a Basic Cluster Configuration

The cluster command uses the check subcommand to validate the basic configuration that is required for a global cluster to function properly. If no checks fail, cluster check returns to the shell prompt. If a check fails, cluster check produces reports in either the specified or the default output directory. If you run cluster check against more than one node, cluster check produces a report for each node and a report for multinode checks. You can also use the cluster list-checks command to display a list of all available cluster checks.

In addition to basic checks, which run without user interaction, the command can also run interactive checks and functional checks. Basic checks run when the -k keyword option is not specified.

  • Interactive checks require information from the user that the checks cannot determine. The check prompts the user for the needed information, for example, the firmware version number. Use the -k interactive keyword to specify one or more interactive checks.

  • Functional checks exercise a specific function or behavior of the cluster. The check prompts for user input, such as which node to fail over to, as well as confirmation to begin or continue the check. Use the -k functional keyword together with -C check-ID to specify a functional check. Perform only one functional check at a time.


    Note -  Because some functional checks involve interrupting cluster service, do not start any functional check until you have read the detailed description of the check and determined whether you need to first take the cluster out of production. To display this information, use the following command:
    % cluster list-checks -v -C check-ID

You can run the cluster check command in verbose mode with the -v flag to display progress information.


Note -  Run cluster check after performing an administration procedure that might result in changes to devices, volume management components, or the Oracle Solaris Cluster configuration.
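The relationship between the check types and the -k keyword can be sketched as a small shell helper. This is only an illustration: the function name build_check_cmd is hypothetical, and the function prints the command line instead of running it, so it can be tried on any machine. It uses only the options that appear in this procedure (-v, -k, -C, and -o).

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the cluster check command line
# for a given kind of check.
# Usage: build_check_cmd KIND OUTPUTDIR [CHECK_ID]
build_check_cmd() {
    kind=$1
    outdir=$2
    check_id=${3:-}
    case $kind in
        basic)
            # Basic checks run when -k is not specified.
            echo "cluster check -v -o $outdir"
            ;;
        interactive)
            echo "cluster check -v -k interactive -o $outdir"
            ;;
        functional)
            # Functional checks are run one at a time, so require a check ID.
            if [ -z "$check_id" ]; then
                echo "functional checks need a check ID" >&2
                return 1
            fi
            echo "cluster check -v -k functional -C $check_id -o $outdir"
            ;;
        *)
            echo "unknown kind: $kind" >&2
            return 1
            ;;
    esac
}
```

For example, build_check_cmd functional outputdir F6968101 prints the functional-check command for check F6968101, which you can review before running it on a cluster node.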

Running the clzonecluster(1CL) command with the verify subcommand from a global-cluster node runs a set of checks to validate the configuration that is required for a zone cluster to function properly. If all checks pass, clzonecluster verify returns to the shell prompt and you can safely install the zone cluster. If a check fails, clzonecluster verify reports on the global-cluster nodes where the verification failed. If you run clzonecluster verify against more than one node, a report is produced for each node, along with a report for multinode checks. The verify subcommand is not allowed inside a zone cluster.

  1. Assume the root role on an active member node of a global cluster.
    phys-schost# su

    Perform all steps of this procedure from a node of the global cluster.

  2. Ensure that you have the most current checks.
    1. Go to the Patches & Updates tab of My Oracle Support.
    2. In the Advanced Search, select Solaris Cluster as the Product and type check in the Description field.

      The search locates Oracle Solaris Cluster software updates that contain checks.

    3. Apply any software updates that are not already installed on your cluster.
  3. Run basic validation checks.
    phys-schost# cluster check -v -o outputdir
    -v

    Verbose mode

    -o outputdir

    Redirects output to the outputdir subdirectory.

    This command runs all available basic checks. No cluster functionality is affected.

  4. Run interactive validation checks.
    phys-schost# cluster check -v -k interactive -o outputdir
    -k interactive

    Specifies running interactive validation checks.

    The command runs all available interactive checks and prompts you for needed information about the cluster. No cluster functionality is affected.

  5. Run functional validation checks.
    1. List all available functional checks in nonverbose mode.
      phys-schost# cluster list-checks -k functional
    2. Determine which functional checks perform actions that would interfere with cluster availability or services in a production environment.

      For example, a functional check might trigger a node panic or a failover to another node.

      phys-schost# cluster list-checks -v -C check-ID
      -C check-ID

      Specifies a specific check.

    3. If the functional check that you want to perform might interrupt cluster functioning, ensure that the cluster is not in production.
    4. Start the functional check.
      phys-schost# cluster check -v -k functional -C check-ID -o outputdir
      -k functional

      Specifies running functional validation checks.

      Respond to prompts from the check to confirm that the check should run, and for any information or actions you must perform.

    5. Repeat substeps 3 and 4 for each remaining functional check to run.

      Note -  For record-keeping purposes, specify a unique outputdir subdirectory name for each check you run. If you reuse an outputdir name, output for the new check overwrites the existing contents of the reused outputdir subdirectory.
  6. If you have a zone cluster configured, verify the configuration of the zone cluster to see if a zone cluster can be installed.
    phys-schost# clzonecluster verify zoneclustername
  7. Make a recording of the cluster configuration for future diagnostic purposes.

    See How to Record Diagnostic Data of the Cluster Configuration in Oracle Solaris Cluster Software Installation Guide.
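Because reusing an outputdir name overwrites earlier results, it can help to generate the name rather than type it. The following is a minimal sketch, not part of the product: the function name unique_outdir is hypothetical, and the funct.test.check-ID.date pattern is simply borrowed from the subdirectory name used in the functional check example in this section.

```shell
#!/bin/sh
# Hypothetical helper: build an output directory name that is not yet in use.
# Usage: unique_outdir CHECK_ID [STAMP]
# STAMP defaults to today's date in the form 12Jan2011.
unique_outdir() {
    check_id=$1
    stamp=${2:-$(date +%d%b%Y)}
    base="funct.test.$check_id.$stamp"
    name=$base
    n=1
    # If the name is already taken, append a counter instead of reusing it.
    while [ -e "$name" ]; do
        name="$base.$n"
        n=$((n + 1))
    done
    echo "$name"
}
```

The result can then be passed to the -o option, for example: cluster check -v -k functional -C check-ID -o "$(unique_outdir check-ID)".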

Example 1-7  Checking the Global Cluster Configuration With All Basic Checks Passing

The following example shows cluster check run in verbose mode against nodes phys-schost-1 and phys-schost-2 with all checks passing.

phys-schost# cluster check -v -h phys-schost-1,phys-schost-2

cluster check: Requesting explorer data and node report from phys-schost-1.
cluster check: Requesting explorer data and node report from phys-schost-2.
cluster check: phys-schost-1: Explorer finished.
cluster check: phys-schost-1: Starting single-node checks.
cluster check: phys-schost-1: Single-node checks finished.
cluster check: phys-schost-2: Explorer finished.
cluster check: phys-schost-2: Starting single-node checks.
cluster check: phys-schost-2: Single-node checks finished.
cluster check: Starting multi-node checks.
cluster check: Multi-node checks finished.
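
When many nodes are checked at once, the progress messages can be scanned to confirm that every node completed its single-node checks. The sketch below assumes that the verbose progress messages were captured to a file and that they have the form shown in this example. The function name unfinished_nodes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list nodes with no "Single-node checks finished."
# message in a saved copy of the verbose progress output.
# Usage: unfinished_nodes LOGFILE NODE...
unfinished_nodes() {
    log=$1
    shift
    for node in "$@"; do
        # -F matches the message as a fixed string; -q suppresses output.
        if ! grep -Fq "cluster check: $node: Single-node checks finished." "$log"; then
            echo "$node"
        fi
    done
}
```

The function prints nothing when every listed node reported completion.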

Example 1-8  Listing Interactive Validation Checks

The following example lists all interactive checks that are available to run on the cluster. Example output shows a sampling of possible checks; actual available checks vary for each configuration.

# cluster list-checks -k interactive
 Some checks might take a few moments to run (use -v to see progress)...
 I6994574  :  (Moderate)  Fix for GLDv3 interfaces on cluster transport vulnerability applied?
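
A listing in this format can be filtered by severity with a short awk filter. This is a sketch under assumptions: the function name checks_with_severity is hypothetical, it expects the nonverbose "ID : (Severity) Title" layout shown in these examples, and it assumes that the listing can be piped from the command's standard output.

```shell
#!/bin/sh
# Hypothetical helper: read a nonverbose check listing on standard input and
# print the IDs of the checks that have the given severity.
# Usage: checks_with_severity SEVERITY
checks_with_severity() {
    severity=$1
    awk -v sev="($severity)" '
        # Expected fields: ID, ":", "(Severity)", then the title words.
        # Informational lines do not have ":" as the second field.
        $2 == ":" && $3 == sev { print $1 }
    '
}
```

For example, cluster list-checks -k functional | checks_with_severity Critical would print one check ID per line, which makes it easier to review each detailed description before deciding which checks to run.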

Example 1-9  Running a Functional Validation Check

The following example first lists the available functional checks. The verbose description is then listed for the check F6968101, which indicates that the check would disrupt cluster services. The cluster is taken out of production. The functional check is then run with output logged to the funct.test.F6968101.12Jan2011 subdirectory. Example output shows a sampling of possible checks; actual available checks vary for each configuration.

# cluster list-checks -k functional
 F6968101  :   (Critical)   Perform resource group switchover
 F6984120  :   (Critical)   Induce cluster transport network failure - single adapter.
 F6984121  :   (Critical)   Perform cluster shutdown
 F6984140  :   (Critical)   Induce node panic
# cluster list-checks -v -C F6968101
 F6968101: (Critical) Perform resource group switchover
Keywords: SolarisCluster3.x, functional
Applicability: Applicable if multi-node cluster running live.
Check Logic: Select a resource group and destination node. Perform 
'/usr/cluster/bin/clresourcegroup switch' on specified resource group 
either to specified node or to all nodes in succession.
Version: 1.2
Revision Date: 12/10/10 

Take the cluster out of production

# cluster check -k functional -C F6968101 -o funct.test.F6968101.12Jan2011
F6968101 
  initializing...
  initializing xml output...
  loading auxiliary data...
  starting check run...
     pschost1, pschost2, pschost3, pschost4:     F6968101.... starting:  
Perform resource group switchover           


  ============================================================

   >>> Functional Check 

    'Functional' checks exercise cluster behavior. It is recommended that you
    do not run this check on a cluster in production mode.' It is recommended
    that you have access to the system console for each cluster node and
    observe any output on the consoles while the check is executed.

    If the node running this check is brought down during execution the check
    must be rerun from this same node after it is rebooted into the cluster in
    order for the check to be completed.

    Select 'continue' for more details on this check.

          1) continue
          2) exit

          choice: 1

  ============================================================

   >>> Check Description <<<

Follow onscreen directions

Example 1-10  Checking the Global Cluster Configuration With a Failed Check

The following example shows the node phys-schost-2 in the cluster named suncluster missing the mount point /global/phys-schost-1. Reports are created in the output directory /var/cluster/logs/cluster_check/Dec5.

phys-schost# cluster check -v -h phys-schost-1,phys-schost-2 -o /var/cluster/logs/cluster_check/Dec5/

cluster check: Requesting explorer data and node report from phys-schost-1.
cluster check: Requesting explorer data and node report from phys-schost-2.
cluster check: phys-schost-1: Explorer finished.
cluster check: phys-schost-1: Starting single-node checks.
cluster check: phys-schost-1: Single-node checks finished.
cluster check: phys-schost-2: Explorer finished.
cluster check: phys-schost-2: Starting single-node checks.
cluster check: phys-schost-2: Single-node checks finished.
cluster check: Starting multi-node checks.
cluster check: Multi-node checks finished.
cluster check: One or more checks failed.
cluster check: The greatest severity of all check failures was 3 (HIGH).
cluster check: Reports are in /var/cluster/logs/cluster_check/Dec5.
#
# cat /var/cluster/logs/cluster_check/Dec5/cluster_check-results.suncluster.txt
...
===================================================
= ANALYSIS DETAILS =
===================================================
------------------------------------
CHECK ID : 3065
SEVERITY : HIGH
FAILURE  : Global filesystem /etc/vfstab entries are not consistent across
all Oracle Solaris  Cluster 4.x nodes.
ANALYSIS : The global filesystem /etc/vfstab entries are not consistent across
all nodes in this cluster.
Analysis indicates:
FileSystem '/global/phys-schost-1' is on 'phys-schost-1' but missing from 'phys-schost-2'.
RECOMMEND: Ensure each node has the correct /etc/vfstab entry for the
filesystem(s) in question.
...
#
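
For a long report, the failed check IDs and their severities can be pulled out with awk. The following sketch assumes the "CHECK ID :" and "SEVERITY :" line layout shown in this example. The function name failed_checks is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "CHECK_ID SEVERITY" for each check recorded in
# one or more report files.
# Usage: failed_checks REPORT...
failed_checks() {
    awk '
        # Remember the most recent check ID.
        /^CHECK ID *:/ { sub(/^CHECK ID *: */, ""); id = $0 }
        # Pair it with the severity line that follows.
        /^SEVERITY *:/ { sub(/^SEVERITY *: */, ""); print id, $0 }
    ' "$@"
}
```

Run against the report from this example, failed_checks /var/cluster/logs/cluster_check/Dec5/cluster_check-results.suncluster.txt would list check 3065 with severity HIGH.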