Managed Entity Monitoring by HAStoragePlus

All entities that are managed by the HAStoragePlus resource type are monitored. The SUNW.HAStoragePlus resource type provides a fault monitor that checks the health of the entities that an HAStoragePlus resource manages, including global devices, file systems, and ZFS storage pools. The fault monitor runs fault probes at regular intervals. If one of the entities becomes unavailable, the resource is restarted or failed over to another node. Ensure that all configuration changes to the managed entities are complete before you enable monitoring.

Note - Version 9 of the HAStoragePlus fault monitor probes the devices and file systems that it manages by reading from and writing to the file systems. If software on the I/O stack blocks read operations and the HAStoragePlus resource must remain online, you must disable the fault monitor. For example, you must unmonitor an HAStoragePlus resource that manages Availability Suite Remote Replication volumes, because Availability Suite from Oracle blocks reads from any bitmap volume or any data volume in the NEED SYNC state. An HAStoragePlus resource that manages Availability Suite volumes must be online at all times.
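A minimal sketch of the commands involved, assuming the Oracle Solaris Cluster `clresource` command is on the `PATH`. The function names are hypothetical, and the resource name passed as the first argument is a placeholder for your own HAStoragePlus resource. `clresource unmonitor` stops only the fault monitor; it does not take the resource offline.

```shell
# Hypothetical helpers; $1 is the name of an HAStoragePlus resource.

hasp_suspend_monitoring() {
    # Stop the fault monitor only. The resource itself stays online,
    # which is what Availability Suite volumes require.
    clresource unmonitor "$1"
}

hasp_resume_monitoring() {
    # Restart fault probing once reads from the managed entities
    # are no longer blocked.
    clresource monitor "$1"
}
```

For example, `hasp_suspend_monitoring avs-hasp-rs`, where `avs-hasp-rs` is a placeholder resource name.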

For more information on the properties that enable monitoring for managed entities, see the SUNW.HAStoragePlus(5) man page.

For instructions on enabling and disabling monitoring for managed entities, see How to Enable a Resource Fault Monitor.

Depending on the type of managed entity, the fault monitor probes the target by reading from it or writing to it. If more than one entity is monitored, the fault monitor probes them all at the same time.
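To make the "all at the same time" behavior concrete, the following is a hypothetical illustration only, not the HAStoragePlus probe code. Each directory argument stands in for one managed file system; a read probe lists the directory, a write probe creates and removes a scratch file (the `.probe.<pid>` name is an assumption), and every probe runs in the background so that all entities are checked concurrently. The overall result fails if any single probe fails.

```shell
# Hypothetical illustration; not the HAStoragePlus implementation.

probe_one() {
    mode=$1
    path=$2
    if [ "$mode" = write ]; then
        # Write probe: create and remove a scratch file (hypothetical name).
        scratch="$path/.probe.$$"
        touch "$scratch" && rm -f "$scratch"
    else
        # Read probe: list the directory contents.
        ls "$path" > /dev/null
    fi
}

probe_all() {
    mode=$1
    shift
    pids=""
    for p in "$@"; do
        # Start every probe in the background so that all entities
        # are probed at the same time.
        ( probe_one "$mode" "$p" 2> /dev/null || { echo "probe failed: $p"; exit 1; } ) &
        pids="$pids $!"
    done
    rc=0
    for pid in $pids; do
        # Any single failed probe makes the overall result fail.
        wait "$pid" || rc=1
    done
    return "$rc"
}
```

For example, `probe_all read /global/fs1 /global/fs2` (placeholder paths) returns nonzero and prints one `probe failed:` line per unreadable path. Which mode applies to a file system follows the IOOption rules in Table 2-2.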

Table 2-2 What the Fault Monitor Verifies

| Monitored Entity | What the Fault Monitor Verifies |
|---|---|
| Global device | The device group is online or degraded. The device is readable. |
| Raw device group | The device group is online or degraded. For each device in the device group, its path (`/dev/global/rdsk/device`) is available. Partitions of every device are readable. |
| Solaris Volume Manager device group | The device group is online or degraded. The path of the metaset (`/dev/md/metaset`) is valid. In the status that Solaris Volume Manager reports from the primary of the device group, an unmirrored metadevice is not in any of the error states Needs Maintenance, Last Erred, or Unavailable, and at least one submirror of a mirror is not in an error state (an error with some, but not all, submirrors is treated as a partial error). An unmirrored metadevice is readable from the primary. Some submirrors of a mirror are readable (an error with some, but not all, submirrors is treated as a partial error). |
| File systems (including UFS, QFS, and PxFS) | The file system is mounted. Every device under the file system is readable. The file system is readable if the IOOption property is set to ReadOnly, and writable if IOOption is set to ReadWrite. If the file system is mounted read-only but IOOption is set to ReadWrite, the fault monitor issues a warning and then reads from the file system instead of writing to it. To keep the `HAStoragePlus` resource from going offline when a file system reaches its quota, set IOOption to ReadOnly, which ensures that the fault monitor does not attempt to write to the file system. |
| ZFS storage pool | The pool status is OK or Degraded. Each non-legacy file system is mounted. Each non-legacy file system is readable if the IOOption property is set to ReadOnly, and writable if IOOption is set to ReadWrite. If a non-legacy file system is mounted read-only but IOOption is set to ReadWrite, the fault monitor issues a warning and then reads from the file system instead of writing to it. To keep the `HAStoragePlus` resource from going offline when a file system reaches its quota, set IOOption to ReadOnly, which ensures that the fault monitor does not attempt to write to the file system. |

Note - When all connections to a top-level ZFS storage device are lost, queries about the ZFS storage pool or its file systems hang. To prevent the fault monitor from hanging, you must set the `failmode` property of the ZFS storage pool to `panic`.
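The two settings recommended above can be applied as sketched here, assuming the Oracle Solaris Cluster `clresource` command and the standard `zpool` command. The function names are hypothetical, and the resource and pool names you pass are placeholders. Note that the ZFS pool property is spelled `failmode`.

```shell
# Hypothetical helpers; arguments are placeholder resource and pool names.

hasp_probe_read_only() {
    # $1: HAStoragePlus resource. With IOOption=ReadOnly the fault monitor
    # only reads, so a file system at its quota does not fail the probe.
    clresource set -p IOOption=ReadOnly "$1"
}

zpool_panic_on_failure() {
    # $1: ZFS pool. With failmode=panic the node panics instead of
    # blocking I/O (and hanging the fault monitor) when all connections
    # to a top-level device are lost.
    zpool set failmode=panic "$1"
    # Display the resulting value for verification.
    zpool get failmode "$1"
}
```

For example, `hasp_probe_read_only hasp-rs` and `zpool_panic_on_failure tank`, where `hasp-rs` and `tank` are placeholders.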

Troubleshooting Monitoring for Managed Entities

If monitoring is not enabled on the managed entities, perform the following troubleshooting steps:

1. Ensure that the `hastorageplus_probe` process is running.
2. Look for error messages on the console.
3. Enable debug messages to the syslog file:

   ```
   # mkdir -p /var/cluster/rgm/rt/SUNW.HAStoragePlus:9
   # echo 9 > /var/cluster/rgm/rt/SUNW.HAStoragePlus:9/loglevel
   ```

   Also check the `/etc/syslog.conf` file to ensure that messages at the `daemon.debug` facility level are logged to the `/var/adm/messages` file. If the `daemon.debug` entry is not already present in the `/var/adm/messages` action, add it.
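The troubleshooting steps can be scripted roughly as follows. This is a sketch: the function names and the `RT_ROOT` override (handy for trying the script against a scratch directory) are hypothetical, while the paths and the log level 9 come from the steps above. The syslog check is deliberately simple: it recognizes only an explicit `daemon.debug` selector on a non-comment line whose action is `/var/adm/messages`, not broader selectors such as `*.debug`.

```shell
# Hypothetical troubleshooting helpers for HAStoragePlus monitoring.
RT_ROOT=${RT_ROOT:-/var/cluster/rgm/rt}

hasp_probe_running() {
    # Step 1: succeed if any process command line contains hastorageplus_probe.
    pgrep -f hastorageplus_probe > /dev/null
}

hasp_enable_debug() {
    # Step 3: create the resource type directory and set loglevel to 9.
    dir="$RT_ROOT/SUNW.HAStoragePlus:9"
    mkdir -p "$dir" && echo 9 > "$dir/loglevel"
}

syslog_routes_daemon_debug() {
    # Succeed if a non-comment line sends daemon.debug to /var/adm/messages.
    # $1 defaults to /etc/syslog.conf.
    awk '
        /^[ \t]*#/ { next }
        $NF == "/var/adm/messages" {
            n = split($1, sel, ";")
            for (i = 1; i <= n; i++)
                if (sel[i] == "daemon.debug") found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "${1:-/etc/syslog.conf}"
}
```

After you edit `/etc/syslog.conf`, syslogd must reread the file; on Oracle Solaris this is typically done with `svcadm refresh system/system-log`.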