Sun Cluster System Administration Guide for Solaris OS

Administering Disk-Path Monitoring

Disk-path monitoring (DPM) administration commands enable you to receive notification of secondary disk-path failure. Use the procedures in this section to perform administrative tasks that are associated with monitoring disk paths. Refer to Chapter 3, Key Concepts for System Administrators and Application Developers, in Sun Cluster Concepts Guide for Solaris OS for conceptual information about the disk-path monitoring daemon. Refer to the cldevice(1CL) man page for a description of the cldevice command options and related commands. Refer to the syslogd(1M) man page for logged errors that the daemon reports.


Note –

Disk paths are automatically added to the monitoring list when I/O devices are added to a node by using the cldevice command. Disk paths are also automatically unmonitored when devices are removed from a node by using Sun Cluster commands.


Table 5–6 Task Map: Administering Disk-Path Monitoring

Task: Monitor a disk path.
Instructions: How to Monitor a Disk Path

Task: Unmonitor a disk path.
Instructions: How to Unmonitor a Disk Path

Task: Print the status of faulted disk paths for a node.
Instructions: How to Print Failed Disk Paths

Task: Monitor disk paths from a file.
Instructions: How to Monitor Disk Paths From a File

Task: Enable or disable the automatic rebooting of a node when all monitored disk paths fail.
Instructions: How to Enable the Automatic Rebooting of a Node When All Monitored Disk Paths Fail
How to Disable the Automatic Rebooting of a Node When All Monitored Disk Paths Fail

Task: Resolve an incorrect disk-path status. An incorrect disk-path status can be reported when the monitored DID device is unavailable at boot time, and the DID instance is not uploaded to the DID driver.
Instructions: How to Resolve a Disk-Path Status Error

The procedures in the following section that issue the cldevice command include the disk-path argument. The disk-path argument consists of a node name and a disk name. The node name is not required; if you do not specify a node name, it defaults to all.
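The node:disk composition of the disk-path argument can be illustrated with a small helper. This is illustrative only; build_disk_path is a hypothetical shell function, not a Sun Cluster command:

```shell
# Illustrative helper (not a Sun Cluster command): compose the node:disk
# form of the disk-path argument, defaulting the node name to "all".
build_disk_path() {
  local node="${1:-all}"   # node name; defaults to "all" when empty or omitted
  local disk="$2"          # DID disk name, for example /dev/did/rdsk/d1
  printf '%s:%s\n' "$node" "$disk"
}

build_disk_path schost-1 /dev/did/rdsk/d1   # schost-1:/dev/did/rdsk/d1
build_disk_path "" /dev/did/rdsk/d1         # all:/dev/did/rdsk/d1
```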

Procedure: How to Monitor a Disk Path

Perform this task to monitor disk paths in your cluster.


Caution –

DPM is not supported on nodes that run versions that were released prior to Sun Cluster 3.1 10/03 software. Do not use DPM commands while a rolling upgrade is in progress. After all nodes are upgraded, the nodes must be online to use DPM commands.


The phys-schost# prompt reflects a global-cluster prompt. Perform this procedure on a global cluster.

This procedure provides the long forms of the Sun Cluster commands. Most commands also have short forms. Except for the long and short forms of the command names, the commands are identical. For a list of the commands and their short forms, see Appendix B, Sun Cluster Object-Oriented Commands.

  1. Become superuser or assume a role that provides solaris.cluster.modify RBAC authorization on any node in the cluster.

  2. Monitor a disk path.


    # cldevice monitor -n node disk
    
  3. Verify that the disk path is monitored.


    # cldevice status device
    

Example 5–44 Monitoring a Disk Path on a Single Node

The following example monitors the schost-1:/dev/did/rdsk/d1 disk path from a single node. Only the DPM daemon on the node schost-1 monitors the path to the disk /dev/did/dsk/d1.


# cldevice monitor -n schost-1 /dev/did/dsk/d1
# cldevice status d1

Device Instance     Node            Status
---------------     ----            ------
/dev/did/rdsk/d1    phys-schost-1   Ok


Example 5–45 Monitoring a Disk Path on All Nodes

The following example monitors the schost-1:/dev/did/dsk/d1 disk path from all nodes. DPM starts on all nodes for which /dev/did/dsk/d1 is a valid path.


# cldevice monitor /dev/did/dsk/d1
# cldevice status /dev/did/dsk/d1

Device Instance     Node            Status
---------------     ----            ------
/dev/did/rdsk/d1    phys-schost-1   Ok


Example 5–46 Rereading the Disk Configuration From the CCR

The following example forces the daemon to reread the disk configuration from the CCR and prints the monitored disk paths with status.


# cldevice monitor +
# cldevice status
Device Instance              Node               Status
---------------              ----               ------
/dev/did/rdsk/d1             schost-1           Ok
/dev/did/rdsk/d2             schost-1           Ok
/dev/did/rdsk/d3             schost-1           Ok
                             schost-2           Ok
/dev/did/rdsk/d4             schost-1           Ok
                             schost-2           Ok
/dev/did/rdsk/d5             schost-1           Ok
                             schost-2           Ok
/dev/did/rdsk/d6             schost-1           Ok
                             schost-2           Ok
/dev/did/rdsk/d7             schost-2           Ok
/dev/did/rdsk/d8             schost-2           Ok

Procedure: How to Unmonitor a Disk Path

Use this procedure to unmonitor a disk path.


Caution –

DPM is not supported on nodes that run versions that were released prior to Sun Cluster 3.1 10/03 software. Do not use DPM commands while a rolling upgrade is in progress. After all nodes are upgraded, the nodes must be online to use DPM commands.


The phys-schost# prompt reflects a global-cluster prompt. Perform this procedure on a global cluster.

This procedure provides the long forms of the Sun Cluster commands. Most commands also have short forms. Except for the long and short forms of the command names, the commands are identical. For a list of the commands and their short forms, see Appendix B, Sun Cluster Object-Oriented Commands.

  1. Become superuser or assume a role that provides solaris.cluster.modify RBAC authorization on any node in the cluster.

  2. Determine the state of the disk path to unmonitor.


    # cldevice status device
    
  3. On each node, unmonitor the appropriate disk paths.


    # cldevice unmonitor -n node disk
    

Example 5–47 Unmonitoring a Disk Path

The following example unmonitors the schost-2:/dev/did/rdsk/d1 disk path and prints the status of that disk path.


# cldevice unmonitor -n schost-2 /dev/did/rdsk/d1
# cldevice status -n schost-2 /dev/did/rdsk/d1

Device Instance              Node               Status
---------------              ----               ------
/dev/did/rdsk/d1             schost-2           Unmonitored

Procedure: How to Print Failed Disk Paths

Use the following procedure to print the faulted disk paths for a cluster.


Caution –

DPM is not supported on nodes that run versions that were released prior to Sun Cluster 3.1 10/03 software. Do not use DPM commands while a rolling upgrade is in progress. After all nodes are upgraded, the nodes must be online to use DPM commands.


  1. Become superuser on any node in the cluster.

  2. Print the faulted disk paths throughout the cluster.


    # cldevice status -s fail
    

Example 5–48 Printing Faulted Disk Paths

The following example prints faulted disk paths for the entire cluster.


# cldevice status -s fail
     
Device Instance               Node              Status
---------------               ----              ------
/dev/did/rdsk/d4              phys-schost-1     fail

Procedure: How to Resolve a Disk-Path Status Error

DPM might not update the status of a failed path when that path comes back online. An incorrect disk-path status can be reported when the monitored DID device is unavailable at boot time and, as a result, the DID instance is not uploaded to the DID driver. When this situation occurs, manually update the DID information.

  1. From one node, update the global devices namespace.


    # cldevice populate
    
  2. On each node, verify that command processing has completed before you proceed to the next step.

    The command executes remotely on all nodes, even though the command is run from just one node. To determine whether the command has completed processing, run the following command on each node of the cluster.


    # ps -ef | grep scgdevs
    
  3. Verify that, within the DPM polling time frame, the status of the faulted disk path is now Ok.


    # cldevice status disk-device
    
    Device Instance               Node                  Status
    ---------------               ----                  ------
    /dev/did/rdsk/dN              phys-schost-1         Ok

Procedure: How to Monitor Disk Paths From a File

Use the following procedure to monitor or unmonitor disk paths from a file.

To change your cluster configuration by using a file, you must first export the current configuration. This export operation creates an XML file that you can then modify to set the configuration items you are changing. The instructions in this procedure describe this entire process.


Caution –

DPM is not supported on nodes that run versions that were released prior to Sun Cluster 3.1 10/03 software. Do not use DPM commands while a rolling upgrade is in progress. After all nodes are upgraded, the nodes must be online to use DPM commands.


The phys-schost# prompt reflects a global-cluster prompt. Perform this procedure on a global cluster.

This procedure provides the long forms of the Sun Cluster commands. Most commands also have short forms. Except for the long and short forms of the command names, the commands are identical. For a list of the commands and their short forms, see Appendix B, Sun Cluster Object-Oriented Commands.

  1. Become superuser or assume a role that provides solaris.cluster.modify RBAC authorization on any node in the cluster.

  2. Export your device configuration to an XML file.


    # cldevice export -o configurationfile
    
    -o configurationfile

    Specify the file name for your XML file.

  3. Modify the configuration file so that device paths are monitored.

    Find the device paths that you want to monitor, and set the monitored attribute to true.
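The edit in this step can also be scripted. The following is a sketch: flip_monitor is a hypothetical helper, it assumes each devicePath element occupies a single line with its nodeRef and monitored attributes (as in the exported XML shown in Example 5–49), and it assumes GNU sed for the -i option — on Solaris sed, write to a temporary file instead:

```shell
# Sketch: set monitored="true" on the devicePath line for a given node.
# Assumes one <devicePath .../> element per line and GNU sed (-i).
flip_monitor() {
  local file="$1" node="$2"
  sed -i "/nodeRef=\"$node\"/s/monitored=\"false\"/monitored=\"true\"/" "$file"
}

# Example (hypothetical file name):
# flip_monitor deviceconfig phys-schost-2
```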

  4. Monitor the device paths.


    # cldevice monitor -i configurationfile
    
    -i configurationfile

    Specify the file name of the modified XML file.

  5. Verify that the device path is now monitored.


    # cldevice status
    

Example 5–49 Monitor Disk Paths From a File

In the following example, the device path between the node phys-schost-2 and device d3 is monitored by using an XML file.

The first step is to export the current cluster configuration.


# cldevice export -o deviceconfig

The deviceconfig XML file shows that the path between phys-schost-2 and d3 is not currently monitored.


<?xml version="1.0"?>
<!DOCTYPE cluster SYSTEM "/usr/cluster/lib/xml/cluster.dtd">
<cluster name="brave_clus">
.
.
.
   <deviceList readonly="true">
    <device name="d3" ctd="c1t8d0">
      <devicePath nodeRef="phys-schost-1" monitored="true"/>
      <devicePath nodeRef="phys-schost-2" monitored="false"/>
    </device>
  </deviceList>
</cluster>

To monitor that path, set the monitored attribute to true, as follows.


<?xml version="1.0"?>
<!DOCTYPE cluster SYSTEM "/usr/cluster/lib/xml/cluster.dtd">
<cluster name="brave_clus">
.
.
.
   <deviceList readonly="true">
    <device name="d3" ctd="c1t8d0">
      <devicePath nodeRef="phys-schost-1" monitored="true"/>
      <devicePath nodeRef="phys-schost-2" monitored="true"/>
    </device>
  </deviceList>
</cluster>

Use the cldevice command to read the file and turn on monitoring.


# cldevice monitor -i deviceconfig

Use the cldevice command to verify that the device is now monitored.


# cldevice status

See Also

For more detail about exporting cluster configuration and using the resulting XML file to set cluster configuration, see the cluster(1CL) and the clconfiguration(5CL) man pages.

Procedure: How to Enable the Automatic Rebooting of a Node When All Monitored Disk Paths Fail

When you enable this feature, a node automatically reboots when all monitored disk paths on that node fail. Rebooting the node restarts all resource groups and device groups that are mastered on that node on another node.

If all monitored disk paths on a node remain inaccessible after the node automatically reboots, the node does not automatically reboot again. However, if any disk paths become available after the node reboots but then fail, the node automatically reboots again.

  1. On any node in the cluster, become superuser or assume a role that provides solaris.cluster.modify RBAC authorization.

  2. For all nodes in the cluster, enable the automatic rebooting of a node when all monitored disk paths to it fail.


    # clnode set -p reboot_on_path_failure=enabled +
    

Procedure: How to Disable the Automatic Rebooting of a Node When All Monitored Disk Paths Fail

When you disable this feature and all monitored disk paths on a node fail, the node does not automatically reboot.

  1. On any node in the cluster, become superuser or assume a role that provides solaris.cluster.modify RBAC authorization.

  2. For all nodes in the cluster, disable the automatic rebooting of a node when all monitored disk paths to it fail.


    # clnode set -p reboot_on_path_failure=disabled +