Synchronizing the Startups Between Resource Groups and Device Groups

After a cluster boots or services fail over to another node, global devices and local and cluster file systems might require time to become available. However, a data service can run its START method before global devices and local and cluster file systems come online. If the data service depends on global devices or local and cluster file systems that are not yet online, the START method times out. In this situation, you must reset the state of the resource groups that the data service uses and restart the data service manually.

To avoid these additional administrative tasks, use the HAStoragePlus resource type. Add an instance of HAStoragePlus to each resource group whose data service resources depend on global devices or on local and cluster file systems. An instance of this resource type forces the START methods of the other resources in the same resource group to wait until its global devices and file systems are available.

If an application resource is configured on top of an HAStoragePlus resource, the application resource must define the offline restart dependency on the underlying HAStoragePlus resource. This ensures that the application resource comes online after the dependent HAStoragePlus resource comes online, and goes offline before the HAStoragePlus resource goes offline.

The following command creates an offline restart dependency from an application resource to an HAStoragePlus (HASP) resource:

# clrs set -p Resource_dependencies_offline_restart=hasp_rs application_rs

To create an HAStoragePlus resource, see How to Set Up the HAStoragePlus Resource Type for New Resources.

Managed Entity Monitoring by HAStoragePlus

All entities that are managed by the HAStoragePlus resource type are monitored. The SUNW.HAStoragePlus resource type provides a fault monitor that checks the health of the entities managed by the HASP resource, including global devices, file systems, and ZFS storage pools. The fault monitor runs fault probes on a regular basis. If one of the entities becomes unavailable, the resource is restarted or failed over to another node. Ensure that all configuration changes to the managed entities are complete before you enable monitoring.


Note - Version 9 of the HAStoragePlus resource fault monitor probes the devices and file systems that it manages by reading from and writing to the file systems. If a read operation can be blocked by any software on the I/O stack and the HAStoragePlus resource must remain online, disable the fault monitor. For example, you must disable monitoring on the HAStoragePlus resource that manages the Availability Suite Remote Replication volumes, because Availability Suite from Oracle blocks reading from any bitmap volume or any data volume in the NEED SYNC state. The HAStoragePlus resource that manages the Availability Suite volumes must be online at all times.
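
For example, the following commands disable and later re-enable the fault monitor for an HAStoragePlus resource (the resource name hasp-rs is a placeholder for illustration):

# clresource unmonitor hasp-rs
# clresource monitor hasp-rs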


For more information on the properties that enable monitoring for managed entities, see the SUNW.HAStoragePlus(5) man page.

For instructions on enabling and disabling monitoring for managed entities, see How to Enable a Resource Fault Monitor.

Depending on the type of managed entity, the fault monitor probes the target by reading or writing to it. If more than one entity is monitored, the fault monitor probes them all at the same time.

Table 2-2 What the Fault Monitor Verifies

Monitored Entity
What the Fault Monitor Verifies
Global device
  • The device group is online or degraded.
  • The device is readable.

Raw device group
  • The device group is online or degraded.
  • For each device of the device group, its path (/dev/global/rdsk/device) is available.

  • Partitions of every device are readable.

Solaris Volume Manager device group
  • The device group is online or degraded.
  • The path of the metaset (/dev/md/metaset) is valid.

  • The status that Solaris Volume Manager reports from the primary node of the device group:

    • The unmirrored metadevice is not in any of the following error states: Needs Maintenance, Last Erred, or Unavailable.

    • At least one submirror of a mirror is not in an error state. An error with some, but not all, submirrors is treated as a partial error.

  • The unmirrored metadevice is readable from the primary.

  • Some submirrors of a mirror are readable. An error with some, but not all, submirrors is treated as a partial error.

File systems (including UFS and PxFS)
  • The file system is mounted.
  • Every device under the file system is readable.

  • The file system is readable if the IOOption property is set to ReadOnly.

  • The file system is writable if the IOOption property is set to ReadWrite.

  • If the file system is mounted read-only but the IOOption property is set to ReadWrite, the fault monitor issues a warning and then tries to read from it rather than write to it.

  • To avoid having the HAStoragePlus resource go offline when a file system hits its quota, set the IOOption property to ReadOnly. The ReadOnly setting ensures that the fault monitor never attempts to write to the file system.

ZFS storage pool
  • The pool status is OK or Degraded.
  • Each non-legacy file system is mounted.

  • Each non-legacy file system is readable if the IOOption property is set to ReadOnly.

  • Each non-legacy file system is writable if the IOOption property is set to ReadWrite.

  • If a non-legacy file system is mounted read-only but the IOOption property is set to ReadWrite, the fault monitor issues a warning and then tries to read from it rather than write to it.

  • To avoid having the HAStoragePlus resource go offline when a file system hits its quota, set the IOOption property to ReadOnly. The ReadOnly setting ensures that the fault monitor never attempts to write to the file system.
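
For example, the following command makes the fault monitor probe by reading only, so that it never writes to the monitored file systems (a sketch that reuses the hastorageplus-1 resource name from the procedures later in this chapter):

# clresource set -p IOOption=ReadOnly hastorageplus-1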


Note - When all connections to a top-level ZFS storage device are lost, queries about the ZFS storage pool or its associated file systems hang. To prevent the fault monitor from hanging, you must set the failmode property of the ZFS storage pool to panic.
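
For example, assuming a pool named hapool (a placeholder name for this sketch), you would set the property as follows:

# zpool set failmode=panic hapool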


For instructions on enabling a resource fault monitor, see How to Enable a Resource Fault Monitor.

Troubleshooting Monitoring for Managed Entities

If monitoring is not enabled on the managed entities, perform the following troubleshooting steps:

  1. Ensure that the hastorageplus_probe process is running.
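
    For example, the following standard Oracle Solaris command lists the probe process if it is running:

    # pgrep -fl hastorageplus_probe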

  2. Look for error messages on the console.

  3. Enable debug messages to the syslog file.

    # mkdir -p /var/cluster/rgm/rt/SUNW.HAStoragePlus:9
    # echo 9 > /var/cluster/rgm/rt/SUNW.HAStoragePlus:9/loglevel

    Also check the /etc/syslog.conf file to ensure that messages with the daemon.debug facility level are logged to the /var/adm/messages file. Add an entry for daemon.debug with /var/adm/messages as its action if one is not already present.
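
    For example, a line like the following in /etc/syslog.conf sends daemon.debug messages to /var/adm/messages (the field separator must be a tab); refresh the system log service after editing the file:

    daemon.debug	/var/adm/messages
    # svcadm refresh svc:/system/system-log:default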

Additional Administrative Tasks to Configure HAStoragePlus Resources for a Zone Cluster

When you configure HAStoragePlus resources for a zone cluster, you must first make the underlying file systems or devices available to the zone cluster, for example by using the clzonecluster utility, before performing the steps for the global cluster. An illustrative session follows.
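
For example, the following clzonecluster session authorizes a UFS file system to a zone cluster (the zone cluster name sczone and all paths are placeholders for illustration):

# clzonecluster configure sczone
clzc:sczone> add fs
clzc:sczone:fs> set dir=/global/apps
clzc:sczone:fs> set special=/dev/md/apps-ds/dsk/d0
clzc:sczone:fs> set raw=/dev/md/apps-ds/rdsk/d0
clzc:sczone:fs> set type=ufs
clzc:sczone:fs> end
clzc:sczone> commit
clzc:sczone> exit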

How to Set Up the HAStoragePlus Resource Type for New Resources

In the following example, the resource group resource-group-1 contains three data services: Oracle iPlanet Web Server, Oracle, and NFS.


Note - To create an HAStoragePlus resource with Oracle Solaris ZFS as a highly available local file system, see How to Set Up the HAStoragePlus Resource Type to Make a Local Solaris ZFS File System Highly Available.


To create an HAStoragePlus resource hastorageplus-1 for new resources in resource-group-1, read Synchronizing the Startups Between Resource Groups and Device Groups and then perform the following steps.

To create an HAStoragePlus resource, see Enabling Highly Available Local File Systems.

  1. On a cluster member, assume the root role that provides solaris.cluster.modify and solaris.cluster.admin RBAC authorizations.
  2. Create the resource group resource-group-1.
    # clresourcegroup create resource-group-1
  3. Determine whether the resource type is registered.

    The following command prints a list of registered resource types.

    # clresourcetype show | egrep Type
  4. If you need to, register the resource type.
    # clresourcetype register SUNW.HAStoragePlus
  5. Create the HAStoragePlus resource hastorageplus-1, and define the file system mount points and global device paths.
    # clresource create -g resource-group-1 -t SUNW.HAStoragePlus \
    -p GlobalDevicePaths=/dev/global/dsk/d5s2,dsk/d6 \
    -p FilesystemMountPoints=/global/resource-group-1 hastorageplus-1

    GlobalDevicePaths can contain the following values.

    • Global device group names, such as nfs-dg, dsk/d5

    • Paths to global devices, such as /dev/global/dsk/d1s2, /dev/md/nfsdg/dsk/d10

    FilesystemMountPoints can contain the following values.

    • Mount points of local or cluster file systems, such as /local-fs/nfs, /global/nfs


    Note - HAStoragePlus has a Zpools extension property that is used to configure ZFS file system storage pools and a ZpoolsSearchDir extension property that is used to specify the location to search for the devices of ZFS file system storage pools. The default value for the ZpoolsSearchDir extension property is /dev/dsk. The ZpoolsSearchDir extension property is similar to the -d option of the zpool(1M) command.


    The resource is created in the enabled state.
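
    If the resource will manage a ZFS storage pool instead, a minimal sketch uses the Zpools extension property described in the note above (the pool name hapool and resource name hastorageplus-zfs are placeholders):

    # clresource create -g resource-group-1 -t SUNW.HAStoragePlus \
    -p Zpools=hapool hastorageplus-zfs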

  6. Add the resources (Oracle iPlanet Web Server (formerly Sun Java System Web Server), Oracle, and NFS) to resource-group-1, and set their dependency to hastorageplus-1.

    For example, for Oracle iPlanet Web Server (formerly Sun Java System Web Server), run the following command.

    # clresource create -g resource-group-1 -t SUNW.iws \
    -p Confdir_list=/global/iws/schost-1 -p Scalable=False \
    -p Resource_dependencies=schost-1 -p Port_list=80/tcp \
    -p Resource_dependencies_offline_restart=hastorageplus-1 resource

    The resource is created in the enabled state.

  7. Verify that you have correctly configured the resource dependencies.
    # clresource show -v resource | egrep Resource_dependencies_offline_restart
  8. Set resource-group-1 to the MANAGED state, and bring resource-group-1 online.
    # clresourcegroup online -M resource-group-1

Affinity Switchovers

The HAStoragePlus resource type contains another extension property, AffinityOn, a Boolean that specifies whether HAStoragePlus must perform an affinity switchover for the global devices that are defined in the GlobalDevicePaths and FilesystemMountPoints extension properties. For details, see the SUNW.HAStoragePlus(5) man page.


Note - The setting of the AffinityOn flag is ignored for scalable services. Affinity switchovers are not possible with scalable resource groups.
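
For example, to turn off affinity switchover for an existing HAStoragePlus resource in a failover resource group (a sketch that reuses the hastorageplus-1 resource name from the procedure above):

# clresource set -p AffinityOn=False hastorageplus-1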


How to Set Up the HAStoragePlus Resource Type for Existing Resources

Before You Begin

Read Synchronizing the Startups Between Resource Groups and Device Groups.

  1. Determine whether the resource type is registered.

    The following command prints a list of registered resource types.

    # clresourcetype show | egrep Type
  2. If you need to, register the resource type.
    # clresourcetype register SUNW.HAStoragePlus
  3. Create the HAStoragePlus resource hastorageplus-1.
    # clresource create -g resource-group \
    -t SUNW.HAStoragePlus -p GlobalDevicePaths= … \
    -p FilesystemMountPoints=... -p AffinityOn=True hastorageplus-1

    The resource is created in the enabled state.

  4. Set up the dependency for each of the existing resources, as required.
    # clresource set -p Resource_dependencies_offline_restart=hastorageplus-1 resource
  5. Verify that you have correctly configured the resource dependencies.
    # clresource show -v resource | egrep Resource_dependencies_offline_restart