Managed Entity Monitoring by HAStoragePlus
The SUNW.HAStoragePlus resource type provides a fault monitor
that checks the health of every entity that an HAStoragePlus
resource manages, including global devices, file systems, and
ZFS storage pools. The fault monitor runs fault probes at
regular intervals. If one of the entities becomes unavailable,
the resource is restarted or fails over to another node. If
more than one entity is monitored, the fault monitor probes
them all at the same time. Ensure that all configuration
changes to the managed entities are completed before you
enable monitoring.
Note - Version 9 of the HAStoragePlus resource fault monitor
probes the devices and file systems that it manages by
reading from and writing to the file systems. If software on
the I/O stack blocks a read operation and the HAStoragePlus
resource must remain online, you must disable the fault
monitor. For example, you must unmonitor an HAStoragePlus
resource that manages Availability Suite Remote Replication
volumes, because Availability Suite from Oracle blocks reads
from any bitmap volume or any data volume that is in the
NEED SYNC state. The HAStoragePlus resource that manages the
Availability Suite volumes must be online at all times.
For more information on the properties that enable monitoring
for managed entities, see the SUNW.HAStoragePlus(5) man page.
For instructions on enabling and disabling monitoring for
managed entities, see How to Enable a Resource Fault Monitor.
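As a minimal sketch of toggling the fault monitor from the command line, the following shell functions wrap the clresource monitor and clresource unmonitor subcommands. The resource name hasp-rs and the DRY_RUN switch are assumptions made for illustration. By default the functions only print the command that they would run, so check the output against the clresource(1CL) man page before you run anything on a cluster node.

```shell
#!/bin/sh
# Sketch: enable or disable fault monitoring for an HAStoragePlus resource.
# DRY_RUN is a hypothetical switch for this example only: when it is 1
# (the default), commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the fault probes for the named resource. The resource stays online.
disable_monitor() {
    run clresource unmonitor "$1"
}

# Resume the fault probes, for example after configuration changes
# to the managed entities are complete.
enable_monitor() {
    run clresource monitor "$1"
}
```

For example, disable_monitor hasp-rs prints the unmonitor command for a resource named hasp-rs. Setting DRY_RUN=0 would execute the command instead of printing it.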
Depending on the type of managed entity, the fault monitor
probes the target by reading from it or writing to it.
Table 2-2 What the Fault Monitor Verifies
Global device: raw device group
- The device group is online or degraded.
- For each device of the device group, its path (/dev/global/rdsk/device) is available.
- Partitions of every device are readable.
Global device: Solaris Volume Manager device group
- The device group is online or degraded.
- The path of the metaset (/dev/md/metaset) is valid.
- The status that Solaris Volume Manager reports from the primary of the device group meets the following conditions:
  - The unmirrored metadevice is not in any of the following error states: Needs Maintenance, Last Erred, or Unavailable.
  - At least one submirror of a mirror is not in an error state. An error in some, but not all, submirrors is treated as a partial error.
- The unmirrored metadevice is readable from the primary.
- At least one submirror of a mirror is readable. An error in some, but not all, submirrors is treated as a partial error.
File systems (including UFS, QFS, and PxFS)
- The file system is mounted.
- Every device under the file system is readable.
- The file system is readable, if the IOOption property is set to ReadOnly.
- The file system is writable, if the IOOption property is set to ReadWrite.
- If the file system is mounted read-only but the IOOption property is set to ReadWrite, the fault monitor issues a warning and then tries to read from the file system rather than write to it.
- To keep the HAStoragePlus resource from going offline when a file system reaches its quota, set the IOOption property to ReadOnly. With the ReadOnly setting, the fault monitor does not attempt to write to the file system.
ZFS storage pool
- The pool status is OK or Degraded.
- Each non-legacy file system is mounted.
- Each non-legacy file system is readable, if the IOOption property is set to ReadOnly.
- Each non-legacy file system is writable, if the IOOption property is set to ReadWrite.
- If a non-legacy file system is mounted read-only but the IOOption property is set to ReadWrite, the fault monitor issues a warning and then tries to read from the file system rather than write to it.
- To keep the HAStoragePlus resource from going offline when a file system reaches its quota, set the IOOption property to ReadOnly. With the ReadOnly setting, the fault monitor does not attempt to write to the file system.
Note - When all connections to a top-level ZFS storage device are lost, queries about the ZFS storage pool or its associated file systems hang. To prevent the fault monitor from hanging, you must set the failmode property of the ZFS storage pool to panic.
For instructions on enabling a resource fault monitor, see
How to Enable a Resource Fault Monitor.
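The two settings that the table recommends, a read-only probe and a ZFS pool that panics rather than hangs, could be applied along the following lines. This is a sketch under stated assumptions: the resource name hasp-rs, the pool name tank, and the DRY_RUN switch are hypothetical, and the ZFS pool property is spelled failmode as in the zpool man page. By default the functions only print the commands.

```shell
#!/bin/sh
# Sketch: apply the probe-related settings discussed in the table.
# DRY_RUN is a hypothetical switch for this example only: when it is 1
# (the default), commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Make the fault monitor probe by reading only, so that a file system
# at its quota does not cause a write probe to fail.
set_readonly_probe() {
    run clresource set -p IOOption=ReadOnly "$1"
}

# Make the pool panic the node when its devices become unreachable,
# instead of leaving pool queries (and the fault monitor) hanging.
set_pool_failmode_panic() {
    run zpool set failmode=panic "$1"
}
```

For example, set_readonly_probe hasp-rs and set_pool_failmode_panic tank print the two commands for review. The current pool setting can be inspected with zpool get failmode followed by the pool name.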
Troubleshooting Monitoring for Managed Entities
If monitoring is not enabled on the managed entities,
perform the following troubleshooting steps:
- Ensure that the hastorageplus_probe process is running.
- Look for error messages on the console.
- Enable debug messages to the syslog file.
# mkdir -p /var/cluster/rgm/rt/SUNW.HAStoragePlus:9
# echo 9 > /var/cluster/rgm/rt/SUNW.HAStoragePlus:9/loglevel
Also check the /etc/syslog.conf file to ensure that
daemon.debug messages are logged to the /var/adm/messages
file. If the line whose action is /var/adm/messages does not
already include the daemon.debug selector, add it.
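The first and last checks above can be scripted roughly as follows. This is a sketch, not an official diagnostic: the function names are hypothetical, the syslog.conf check assumes an awk that supports split() and matches only a literal daemon.debug selector (a broader selector such as *.debug would also log these messages but is not detected here), and the configuration file path can be overridden for testing.

```shell
#!/bin/sh
# Sketch: two troubleshooting checks for the HAStoragePlus fault monitor.

# Return 0 if a process whose command line contains the probe name
# is running. The [h] bracket keeps grep from matching itself.
probe_running() {
    ps -ef | grep '[h]astorageplus_probe' >/dev/null 2>&1
}

# Return 0 if the given syslog configuration file (default
# /etc/syslog.conf) has a non-comment line that sends the
# daemon.debug selector to /var/adm/messages.
has_daemon_debug() {
    conf=${1:-/etc/syslog.conf}
    awk '
        $1 ~ /^#/ { next }                  # skip comment lines
        $NF == "/var/adm/messages" {
            n = split($1, sel, ";")         # selectors are separated by ";"
            for (i = 1; i <= n; i++)
                if (sel[i] == "daemon.debug")
                    found = 1
        }
        END { exit !found }                 # exit 0 only when found
    ' "$conf"
}
```

A possible use is has_daemon_debug || echo "daemon.debug is not routed to /var/adm/messages", run on the node where the resource is online.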