5. Administering Global Devices, Disk-Path Monitoring, and Cluster File Systems
Overview of Administering Global Devices and the Global Namespace
Global Device Permissions for Solaris Volume Manager
Dynamic Reconfiguration With Global Devices
Veritas Volume Manager Administration Considerations
Administering Storage-Based Replicated Devices
Administering Hitachi TrueCopy Replicated Devices
How to Configure a Hitachi TrueCopy Replication Group
How to Configure DID Devices for Replication Using Hitachi TrueCopy
How to Verify a Hitachi TrueCopy Replicated Global Device Group Configuration
Example: Configuring a TrueCopy Replication Group for Oracle Solaris Cluster
Administering EMC Symmetrix Remote Data Facility Replicated Devices
How to Configure an EMC SRDF Replication Group
How to Configure DID Devices for Replication Using EMC SRDF
How to Verify EMC SRDF Replicated Global Device Group Configuration
Example: Configuring an SRDF Replication Group for Oracle Solaris Cluster
Overview of Administering Cluster File Systems
Cluster File System Restrictions
How to Update the Global-Devices Namespace
How to Change the Size of a lofi Device That Is Used for the Global-Devices Namespace
Migrating the Global-Devices Namespace
How to Migrate the Global-Devices Namespace From a Dedicated Partition to a lofi Device
How to Migrate the Global-Devices Namespace From a lofi Device to a Dedicated Partition
Adding and Registering Device Groups
How to Add and Register a Device Group (Solaris Volume Manager)
How to Add and Register a Device Group (Raw-Disk)
How to Add and Register a Replicated Device Group (ZFS)
How to Create a New Disk Group When Initializing Disks (Veritas Volume Manager)
How to Remove and Unregister a Device Group (Solaris Volume Manager)
How to Remove a Node From All Device Groups
How to Remove a Node From a Device Group (Solaris Volume Manager)
How to Create a New Disk Group When Encapsulating Disks (Veritas Volume Manager)
How to Add a New Volume to an Existing Device Group (Veritas Volume Manager)
How to Convert an Existing Disk Group to a Device Group (Veritas Volume Manager)
How to Assign a New Minor Number to a Device Group (Veritas Volume Manager)
How to Register a Disk Group as a Device Group (Veritas Volume Manager)
How to Register Disk Group Configuration Changes (Veritas Volume Manager)
How to Convert a Local Disk Group to a Device Group (VxVM)
How to Convert a Device Group to a Local Disk Group (VxVM)
How to Remove a Volume From a Device Group (Veritas Volume Manager)
How to Remove and Unregister a Device Group (Veritas Volume Manager)
How to Add a Node to a Device Group (Veritas Volume Manager)
How to Remove a Node From a Device Group (Veritas Volume Manager)
How to Remove a Node From a Raw-Disk Device Group
How to Change Device Group Properties
How to Set the Desired Number of Secondaries for a Device Group
How to List a Device Group Configuration
How to Switch the Primary for a Device Group
How to Put a Device Group in Maintenance State
Administering the SCSI Protocol Settings for Storage Devices
How to Display the Default Global SCSI Protocol Settings for All Storage Devices
How to Display the SCSI Protocol of a Single Storage Device
How to Change the Default Global Fencing Protocol Settings for All Storage Devices
How to Change the Fencing Protocol for a Single Storage Device
Administering Cluster File Systems
How to Add a Cluster File System
How to Remove a Cluster File System
How to Check Global Mounts in a Cluster
Administering Disk-Path Monitoring
How to Monitor a Disk Path
How to Unmonitor a Disk Path
How to Print Failed Disk Paths
How to Resolve a Disk-Path Status Error
How to Monitor Disk Paths From a File
How to Enable the Automatic Rebooting of a Node When All Monitored Shared-Disk Paths Fail
How to Disable the Automatic Rebooting of a Node When All Monitored Shared-Disk Paths Fail
Administering Disk-Path Monitoring
Disk path monitoring (DPM) administration commands enable you to receive notification of secondary disk-path failure. Use the procedures in this section to perform administrative tasks that are associated with monitoring disk paths. Refer to Chapter 3, Key Concepts for System Administrators and Application Developers, in Oracle Solaris Cluster Concepts Guide for conceptual information about the disk-path monitoring daemon. Refer to the cldevice(1CL) man page for a description of the command options and related commands. For more information about tuning the scdpmd daemon, see the scdpmd.conf(4) man page. Also see the syslogd(1M) man page for errors that the daemon reports to the system log.
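For a quick check that disk-path monitoring is active on a node, you can verify that the scdpmd daemon is running and scan the system log for messages that it has reported. This is a minimal sketch that assumes the default syslog destination of /var/adm/messages and that the daemon's messages include its name:
# ps -ef | grep scdpmd
# grep scdpmd /var/adm/messages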
Note - Disk paths are automatically monitored when I/O devices are added to a node by using the cldevice command. Disk paths are also automatically unmonitored when devices are removed from a node by using Oracle Solaris Cluster commands.
Table 5-6 Task Map: Administering Disk-Path Monitoring

Task | Instructions
Monitor a disk path. | How to Monitor a Disk Path
Unmonitor a disk path. | How to Unmonitor a Disk Path
Print the status of faulted disk paths. | How to Print Failed Disk Paths
Resolve an incorrect disk-path status reading. | How to Resolve a Disk-Path Status Error
Monitor or unmonitor disk paths from a file. | How to Monitor Disk Paths From a File
Enable or disable the automatic rebooting of a node when all monitored shared-disk paths fail. | How to Enable the Automatic Rebooting of a Node When All Monitored Shared-Disk Paths Fail, How to Disable the Automatic Rebooting of a Node When All Monitored Shared-Disk Paths Fail
The procedures in the following section that issue the cldevice command include the disk-path argument. The disk-path argument consists of a node name and a disk name. The node name is not required and defaults to all if you do not specify it.
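For example, both of the following invocations are valid. The first monitors the path to the device only from node schost-1; the second omits the node name, so it applies to every node that has a path to the device (the device name d1 is used for illustration):
# cldevice monitor -n schost-1 /dev/did/rdsk/d1
# cldevice monitor /dev/did/rdsk/d1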
How to Monitor a Disk Path
Perform this task to monitor disk paths in your cluster.
Caution - DPM is not supported on nodes that run versions that were released prior to Sun Cluster 3.1 10/03 software. Do not use DPM commands while a rolling upgrade is in progress. After all nodes are upgraded, the nodes must be online to use DPM commands.
The phys-schost# prompt reflects a global-cluster prompt. Perform this procedure on a global cluster.
This procedure provides the long forms of the Oracle Solaris Cluster commands. Most commands also have short forms. Except for the long and short forms of the command names, the commands are identical.
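For example, the short form of the cldevice command is cldev, so the following two commands are equivalent (this sketch assumes a standard installation in which the short-form command names are available):
# cldevice status
# cldev status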
Monitor a disk path.
# cldevice monitor -n node disk
Verify that the disk path is monitored.
# cldevice status device
Example 5-45 Monitoring a Disk Path on a Single Node
The following example monitors the schost-1:/dev/did/rdsk/d1 disk path from a single node. Only the DPM daemon on the node schost-1 monitors the path to the disk /dev/did/dsk/d1.
# cldevice monitor -n schost-1 /dev/did/dsk/d1
# cldevice status d1

Device Instance              Node                 Status
---------------              ----                 ------
/dev/did/rdsk/d1             phys-schost-1        Ok
Example 5-46 Monitoring a Disk Path on All Nodes
The following example monitors the schost-1:/dev/did/dsk/d1 disk path from all nodes. DPM starts on all nodes for which /dev/did/dsk/d1 is a valid path.
# cldevice monitor /dev/did/dsk/d1
# cldevice status /dev/did/dsk/d1

Device Instance              Node                 Status
---------------              ----                 ------
/dev/did/rdsk/d1             phys-schost-1        Ok
Example 5-47 Rereading the Disk Configuration From the CCR
The following example forces the daemon to reread the disk configuration from the CCR and prints the monitored disk paths with status.
# cldevice monitor +
# cldevice status

Device Instance              Node                 Status
---------------              ----                 ------
/dev/did/rdsk/d1             schost-1             Ok
/dev/did/rdsk/d2             schost-1             Ok
/dev/did/rdsk/d3             schost-1             Ok
                             schost-2             Ok
/dev/did/rdsk/d4             schost-1             Ok
                             schost-2             Ok
/dev/did/rdsk/d5             schost-1             Ok
                             schost-2             Ok
/dev/did/rdsk/d6             schost-1             Ok
                             schost-2             Ok
/dev/did/rdsk/d7             schost-2             Ok
/dev/did/rdsk/d8             schost-2             Ok
How to Unmonitor a Disk Path
Use this procedure to unmonitor a disk path.
Caution - DPM is not supported on nodes that run versions that were released prior to Sun Cluster 3.1 10/03 software. Do not use DPM commands while a rolling upgrade is in progress. After all nodes are upgraded, the nodes must be online to use DPM commands.
The phys-schost# prompt reflects a global-cluster prompt. Perform this procedure on a global cluster.
This procedure provides the long forms of the Oracle Solaris Cluster commands. Most commands also have short forms. Except for the long and short forms of the command names, the commands are identical.
Determine the state of the disk path to unmonitor.
# cldevice status device
On each node, unmonitor the appropriate disk paths.
# cldevice unmonitor -n node disk
Example 5-48 Unmonitoring a Disk Path
The following example unmonitors the schost-2:/dev/did/rdsk/d1 disk path and prints disk paths with status for the entire cluster.
# cldevice unmonitor -n schost-2 /dev/did/rdsk/d1
# cldevice status -n schost-2 /dev/did/rdsk/d1

Device Instance              Node                 Status
---------------              ----                 ------
/dev/did/rdsk/d1             schost-2             Unmonitored
How to Print Failed Disk Paths
Use the following procedure to print the faulted disk paths for a cluster.
Caution - DPM is not supported on nodes that run versions that were released prior to Sun Cluster 3.1 10/03 software. Do not use DPM commands while a rolling upgrade is in progress. After all nodes are upgraded, the nodes must be online to use DPM commands.
# cldevice status -s fail
Example 5-49 Printing Faulted Disk Paths
The following example prints faulted disk paths for the entire cluster.
# cldevice status -s fail

Device Instance              Node                 Status
---------------              ----                 ------
dev/did/dsk/d4               phys-schost-1        fail
How to Resolve a Disk-Path Status Error
If the following events occur, DPM might not update the status of a failed path when it comes back online:
A monitored-path failure causes a node reboot.
The device under the monitored DID path does not come back online until after the rebooted node is back online.
The incorrect disk-path status is reported because the monitored DID device is unavailable at boot time, and therefore the DID instance is not uploaded to the DID driver. When this situation occurs, manually update the DID information.
# cldevice populate
The command executes remotely on all nodes, even though the command is run from just one node. To determine whether the command has completed processing, run the following command on each node of the cluster.
# ps -ef | grep "cldevice populate"
Verify that the status of the device whose status was incorrect is now Ok.
# cldevice status disk-device

Device Instance              Node                 Status
---------------              ----                 ------
dev/did/dsk/dN               phys-schost-1        Ok
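As a concrete illustration, if the path to device d4 on phys-schost-1 was reported with a stale failed status after that node rebooted, the recovery sequence might look like the following (the device and node names are hypothetical):
# cldevice populate
# ps -ef | grep "cldevice populate"
# cldevice status d4

Device Instance              Node                 Status
---------------              ----                 ------
/dev/did/rdsk/d4             phys-schost-1        Ok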
How to Monitor Disk Paths From a File
Use the following procedure to monitor or unmonitor disk paths from a file.
To change your cluster configuration by using a file, you must first export the current configuration. This export operation creates an XML file that you can then modify to set the configuration items you are changing. The instructions in this procedure describe this entire process.
Caution - DPM is not supported on nodes that run versions that were released prior to Sun Cluster 3.1 10/03 software. Do not use DPM commands while a rolling upgrade is in progress. After all nodes are upgraded, the nodes must be online to use DPM commands.
The phys-schost# prompt reflects a global-cluster prompt. Perform this procedure on a global cluster.
This procedure provides the long forms of the Oracle Solaris Cluster commands. Most commands also have short forms. Except for the long and short forms of the command names, the commands are identical.
Export your device configuration to an XML file.
# cldevice export -o configurationfile
Specify the file name for your XML file.
Find the device paths that you want to monitor, and set the monitored attribute to true.
Monitor the disk paths.
# cldevice monitor -i configurationfile
Specify the file name of the modified XML file.
Verify that the disk paths are now monitored.
# cldevice status
Example 5-50 Monitor Disk Paths From a File
In the following example, the device path between the node phys-schost-2 and device d3 is monitored by using an XML file.
The first step is to export the current cluster configuration.
# cldevice export -o deviceconfig
The deviceconfig XML file shows that the path between phys-schost-2 and d3 is not currently monitored.
<?xml version="1.0"?>
<!DOCTYPE cluster SYSTEM "/usr/cluster/lib/xml/cluster.dtd">
<cluster name="brave_clus">
. . .
  <deviceList readonly="true">
    <device name="d3" ctd="c1t8d0">
      <devicePath nodeRef="phys-schost-1" monitored="true"/>
      <devicePath nodeRef="phys-schost-2" monitored="false"/>
    </device>
  </deviceList>
</cluster>
To monitor that path, set the monitored attribute to true, as follows.
<?xml version="1.0"?>
<!DOCTYPE cluster SYSTEM "/usr/cluster/lib/xml/cluster.dtd">
<cluster name="brave_clus">
. . .
  <deviceList readonly="true">
    <device name="d3" ctd="c1t8d0">
      <devicePath nodeRef="phys-schost-1" monitored="true"/>
      <devicePath nodeRef="phys-schost-2" monitored="true"/>
    </device>
  </deviceList>
</cluster>
Use the cldevice command to read the file and turn on monitoring.
# cldevice monitor -i deviceconfig
Use the cldevice command to verify that the device is now monitored.
# cldevice status
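If the path is healthy, the status output should now report it for both nodes; a sketch of the expected result (column spacing approximate, and a monitored healthy path is reported as Ok):
# cldevice status d3

Device Instance              Node                 Status
---------------              ----                 ------
/dev/did/rdsk/d3             phys-schost-1        Ok
                             phys-schost-2        Ok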
See Also
For more detail about exporting cluster configuration and using the resulting XML file to set cluster configuration, see the cluster(1CL) and the clconfiguration(5CL) man pages.
How to Enable the Automatic Rebooting of a Node When All Monitored Shared-Disk Paths Fail
When you enable this feature, a node automatically reboots, provided that the following conditions are met:
All monitored shared-disk paths on the node fail.
At least one of the disks is accessible from a different node in the cluster.
Rebooting the node causes all resource groups and device groups that are mastered on that node to restart on another node.
If all monitored shared-disk paths on a node remain inaccessible after the node automatically reboots, the node does not automatically reboot again. However, if any disk paths become available after the node reboots but then fail, the node automatically reboots again.
When you enable the reboot_on_path_failure property, the states of local-disk paths are not considered when determining if a node reboot is necessary. Only monitored shared disks are affected.
On any node in the cluster, enable the reboot_on_path_failure property for all nodes in the cluster.
# clnode set -p reboot_on_path_failure=enabled +
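To confirm the new setting, you can display the node properties. This is a minimal sketch that assumes clnode show lists the reboot_on_path_failure property for each node; the output is abbreviated:
# clnode show

=== Cluster Nodes ===

Node Name:                                       phys-schost-1
  ...
  reboot_on_path_failure:                        enabled
  ...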
How to Disable the Automatic Rebooting of a Node When All Monitored Shared-Disk Paths Fail
When you disable this feature and all monitored shared-disk paths on a node fail, the node does not automatically reboot.
On any node in the cluster, disable the reboot_on_path_failure property for all nodes in the cluster.
# clnode set -p reboot_on_path_failure=disabled +
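As with enabling, you can verify the change by displaying the node properties; a sketch under the same assumption about the clnode show output:
# clnode show

=== Cluster Nodes ===

Node Name:                                       phys-schost-1
  ...
  reboot_on_path_failure:                        disabled
  ...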