When DiskSuite encounters a problem, such as being unable to write to a metadevice because of physical errors at the slice level, it changes the status of the metadevice, for example, to "Maintenance." However, unless you are constantly watching DiskSuite Tool or running metastat(1M), you may not see these status changes promptly.
There are two ways that you can automatically check for DiskSuite errors:
Using SNMP traps
Using a script to check for errors at regular intervals
The first method is described in "Integrating SNMP Alerts With DiskSuite".
The following section describes the kind of script you can use to check for DiskSuite errors.
One way to continually and automatically check for a bad slice in a metadevice is to write a script that is invoked by cron. Here is an example:
    #
    #ident "@(#)metacheck.sh 1.3 96/06/21 SMI"
    #
    # Copyright (c) 1992, 1993, 1994, 1995, 1996 by Sun Microsystems, Inc.
    #
    #
    # DiskSuite Commands
    #
    MDBIN=/usr/sbin
    METADB=${MDBIN}/metadb
    METAHS=${MDBIN}/metahs
    METASTAT=${MDBIN}/metastat
    #
    # System Commands
    #
    AWK=/usr/bin/awk
    DATE=/usr/bin/date
    MAILX=/usr/bin/mailx
    RM=/usr/bin/rm
    #
    # Initialization
    #
    eval=0
    date=`${DATE} '+%a %b %e %Y'`
    SDSTMP=/tmp/sdscheck.${$}
    ${RM} -f ${SDSTMP}
    MAILTO=${*:-"root"}    # default to root, or use arg list
    #
    # Check replicas for problems, capital letters in the flags indicate an error.
    #
    dbtrouble=`${METADB} | tail +2 | \
        ${AWK} '{ fl = substr($0,1,20); if (fl ~ /[A-Z]/) print $0 }'`
    if [ "${dbtrouble}" ]; then
        echo "" >>${SDSTMP}
        echo "SDS replica problem report for ${date}" >>${SDSTMP}
        echo "" >>${SDSTMP}
        echo "Database replicas are not active:" >>${SDSTMP}
        echo "" >>${SDSTMP}
        ${METADB} -i >>${SDSTMP}
        eval=1
    fi
    #
    # Check the metadevice state, if the state is not Okay, something is up.
    #
    mdtrouble=`${METASTAT} | \
        ${AWK} '/State:/ { if ( $2 != "Okay" ) print $0 }'`
    if [ "${mdtrouble}" ]; then
        echo "" >>${SDSTMP}
        echo "SDS metadevice problem report for ${date}" >>${SDSTMP}
        echo "" >>${SDSTMP}
        echo "Metadevices are not Okay:" >>${SDSTMP}
        echo "" >>${SDSTMP}
        ${METASTAT} >>${SDSTMP}
        eval=1
    fi
    #
    # Check the hotspares to see if any have been used.
    #
    hstrouble=`${METAHS} -i | \
        ${AWK} ' /blocks/ { if ( $2 != "Available" ) print $0 }'`
    if [ "${hstrouble}" ]; then
        echo "" >>${SDSTMP}
        echo "SDS Hot spares in use ${date}" >>${SDSTMP}
        echo "" >>${SDSTMP}
        echo "Hot spares in usage:" >>${SDSTMP}
        echo "" >>${SDSTMP}
        ${METAHS} -i >>${SDSTMP}
        eval=1
    fi
    #
    # If any errors occurred, then mail the report to root, or whoever was called
    # out in the command line.
    #
    if [ ${eval} -ne 0 ]; then
        ${MAILX} -s "SDS problems ${date}" ${MAILTO} <${SDSTMP}
        ${RM} -f ${SDSTMP}
    fi
    exit ${eval}
For information on invoking scripts in this way, refer to the cron(1M) man page.
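As a rough illustration of what such a cron entry might look like, the sketch below builds and prints a crontab line that would run the check at the top of every hour and mail any report to root. The install path /usr/local/bin/metacheck.sh is an assumption made for this example, not a location the script requires, and the hourly interval is only one reasonable choice.

```shell
# Hypothetical install location for the script; adjust to your system.
SCRIPT=/usr/local/bin/metacheck.sh

# Crontab fields: minute hour day-of-month month day-of-week command.
# "0 * * * *" means minute 0 of every hour, every day.
# The trailing "root" is passed to the script as the mail recipient.
ENTRY="0 * * * * ${SCRIPT} root"

# Print the line so it can be reviewed before it is added to a crontab.
echo "${ENTRY}"
```

The printed line would typically be added to the crontab of root, for example with crontab -e, because the DiskSuite commands the script calls generally need superuser privileges.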
This script serves as a starting point for automating DiskSuite error checking. You may need to modify it for your own configuration.
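Before changing any of the checks, it can help to try a filter on captured text rather than on a live system. The sketch below runs the same awk test the script applies to metastat output, but against a few made-up, simplified status lines; the sample text is hypothetical and is not real metastat output.

```shell
# Hypothetical, simplified status lines standing in for real metastat output.
sample='d0: Mirror
    State: Okay
d1: Mirror
    State: Needs maintenance'

# Same filter as the script: print every "State:" line whose second
# field is anything other than "Okay".
flagged=`printf '%s\n' "${sample}" | awk '/State:/ { if ( $2 != "Okay" ) print $0 }'`

# A non-empty result is what makes the script add a section to its report.
if [ "${flagged}" ]; then
    echo "would report: ${flagged}"
fi
```

With this sample, only the line for d1 is flagged. The same approach can be used to try out a modified pattern, such as one for the replica or hot spare checks, against output saved from your own configuration.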