Solstice DiskSuite 4.2.1 User's Guide

Checking For Errors

When DiskSuite encounters a problem, such as being unable to write to a metadevice because of physical errors at the slice level, it changes the status of the metadevice (for example, to "Maintenance"). However, unless you are constantly watching DiskSuite Tool or running metastat(1M), you might not see these status changes promptly.

There are two ways to check for DiskSuite errors automatically:

Use SNMP alerts, as described in "Integrating SNMP Alerts With DiskSuite".

Run a script from cron that checks for DiskSuite errors, as described in the following section.

How to Automate Checking for Slice Errors in Metadevices (Command Line)

One way to continually and automatically check for a bad slice in a metadevice is to write a script that is invoked by cron. Here is an example:


#!/bin/sh
#
#ident "@(#)metacheck.sh   1.3     96/06/21 SMI"
#
# Copyright (c) 1992, 1993, 1994, 1995, 1996 by Sun Microsystems, Inc.
#
 
#
# DiskSuite Commands
#
MDBIN=/usr/sbin
METADB=${MDBIN}/metadb
METAHS=${MDBIN}/metahs
METASTAT=${MDBIN}/metastat
 
#
# System Commands
#
AWK=/usr/bin/awk
DATE=/usr/bin/date
MAILX=/usr/bin/mailx
RM=/usr/bin/rm
 
#
# Initialization
#
eval=0
date=`${DATE} '+%a %b %e %Y'`
SDSTMP=/tmp/sdscheck.${$}
${RM} -f ${SDSTMP}
 
MAILTO=${*:-"root"}			# default to root, or use arg list
 
#
# Check replicas for problems. Capital letters in the flags indicate an error.
#
dbtrouble=`${METADB} | tail +2 | \
    ${AWK} '{ fl = substr($0,1,20); if (fl ~ /[A-Z]/) print $0 }'`
if [ "${dbtrouble}" ]; then
        echo ""   >>${SDSTMP}
        echo "SDS replica problem report for ${date}"	>>${SDSTMP}
        echo ""   >>${SDSTMP}
        echo "Database replicas are not active:"     >>${SDSTMP}
        echo ""   >>${SDSTMP}
        ${METADB} -i >>${SDSTMP}
        eval=1
fi
 
#
# Check the metadevice state. Any state other than Okay indicates a problem.
#
mdtrouble=`${METASTAT} | \
    ${AWK} '/State:/ { if ( $2 != "Okay" ) print $0 }'`
if [ "${mdtrouble}" ]; then
        echo ""  >>${SDSTMP}
        echo "SDS metadevice problem report for ${date}"  >>${SDSTMP}
        echo ""  >>${SDSTMP}
        echo "Metadevices are not Okay:"  >>${SDSTMP}
        echo ""  >>${SDSTMP}
        ${METASTAT} >>${SDSTMP}
        eval=1
fi
 
#
# Check the hotspares to see if any have been used.
#
hstrouble=`${METAHS} -i | \
    ${AWK} ' /blocks/ { if ( $2 != "Available" ) print $0 }'`
if [ "${hstrouble}" ]; then
        echo ""  >>${SDSTMP}
        echo "SDS hot spare report for ${date}"  >>${SDSTMP}
        echo ""  >>${SDSTMP}
        echo "Hot spares in use:"  >>${SDSTMP}
        echo ""  >>${SDSTMP}
        ${METAHS} -i >>${SDSTMP}
        eval=1
fi
#
# If any errors occurred, mail the report to root, or to the recipients
# named on the command line.
#
if [ ${eval} -ne 0 ]; then
        ${MAILX} -s "SDS problems ${date}" ${MAILTO} <${SDSTMP}
        ${RM} -f ${SDSTMP}
fi
 
exit ${eval}

For information on invoking scripts in this way, refer to the cron(1M) man page.
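
As a rough sketch, a crontab entry like the one below would run the script at the top of every hour. The path /usr/local/bin/metacheck.sh is an assumption; substitute the location where you saved the script. Arguments after the script name become the mail recipients. With no arguments, the report goes to root.

```
# minute hour day-of-month month day-of-week command
0 * * * * /usr/local/bin/metacheck.sh root
```

Because the script sends mail only when it finds a problem, a frequent schedule does not generate routine messages.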


Note - This script serves as a starting point for automating DiskSuite error checking. You may need to modify it for your own configuration.
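
Before modifying the script, it can help to exercise its awk filters against captured output on a machine without DiskSuite. The following is a minimal sketch under stated assumptions: the function names (bad_replicas, not_okay_states, sample_metadb, sample_metastat) are hypothetical, the sample text only approximates real metadb and metastat output, and "sed 1d" is substituted for the script's "tail +2" because some tail implementations do not accept that older form.

```shell
#!/bin/sh
# Sketch: exercise the script's two awk filters against captured text.
# The function names and the sample data are hypothetical; the sample
# only approximates real metadb and metastat output.

# Print replica lines whose flags field (the first 20 characters)
# contains a capital letter. "sed 1d" drops the header line, standing
# in for the script's "tail +2". LC_ALL=C keeps [A-Z] limited to
# ASCII capital letters.
bad_replicas() {
    sed 1d | LC_ALL=C awk '{ fl = substr($0,1,20); if (fl ~ /[A-Z]/) print $0 }'
}

# Print "State:" lines whose state is anything other than Okay.
not_okay_states() {
    awk '/State:/ { if ( $2 != "Okay" ) print $0 }'
}

# Illustrative replica listing: one healthy replica, one with a
# capital-letter flag.
sample_metadb() {
    cat <<'EOF'
        flags           first blk       block count
     a m  p  luo        16              1034            /dev/dsk/c0t3d0s3
      W   p  l          16              1034            /dev/dsk/c1t1d0s3
EOF
}

# Illustrative metadevice listing: one submirror Okay, one not.
sample_metastat() {
    cat <<'EOF'
d0: Mirror
    Submirror 0: d1
      State: Okay
    Submirror 1: d2
      State: Needs maintenance
EOF
}

sample_metadb | bad_replicas
sample_metastat | not_okay_states
```

Run as written, the sketch prints the replica line for c1t1d0s3 and the "Needs maintenance" state line. To test against your own configuration, replace the sample functions with files that hold saved metadb and metastat output.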