CHAPTER 1

Troubleshooting Overview

Sun StorEdge SAM-FS problems are frequently symptoms of incorrect hardware and software configuration during installation or upgrade. This chapter provides basic information on diagnosing and troubleshooting such problems in the Sun StorEdge SAM-FS environment. It also discusses preparing a disaster recovery plan and testing your backup and recovery processes.

This chapter contains the following sections:


Tools for Troubleshooting

The following subsections provide an overview of some of the tools you might use when troubleshooting issues in the Sun StorEdge SAM-FS environment:

Daemons

The following sections describe the daemons that can be present in a Sun StorEdge SAM-FS environment and show how to verify the functionality of these daemons.

Sun StorEdge SAM-FS Daemons

The process spawner, init(1M), starts the sam-fsd(1M) daemon based on information defined in inittab(4). The sam-fsd(1M) daemon provides overall control of the initialization of the Sun StorEdge SAM-FS environment. As part of this process, it starts a number of child daemons. These child daemons are as follows:

Verifying Sun StorEdge SAM-FS Daemons

If you know which Sun StorEdge SAM-FS daemons and processes exist, and the circumstances under which each is started, you can determine which of them should be running for a given configuration. Use the ps(1) and ptree(1) commands to check that the expected daemons and processes are running.

CODE EXAMPLE 1-1 assumes that the ps(1) command is issued in a Sun StorEdge SAM-FS environment that includes a StorageTek L700 library connected by Automatic Cartridge System Library Software (ACSLS) to a Sun StorEdge SAM-FS system with two mounted file systems, samfs1 and samfs2. In this example, the sam-stkd(1M) daemon is running. This controls the network attached StorageTek media changers through the ACSAPI interface implemented by the ACSLS software. If such equipment were present, similar daemons would be started for network attached IBM (sam-ibm3494d(1M)) and Sony (sam-sonyd(1M)) automated libraries, and for standard direct attached automated libraries that conform to the SCSI-II standard for media changers (sam-genericd(1M)).


CODE EXAMPLE 1-1 Verifying Sun StorEdge SAM-FS Daemons
skeeball # ps -ef | grep sam-fsd | grep -v grep
    root   656     1  0 10:42:26 ?        0:00 /usr/lib/fs/samfs/sam-fsd
skeeball # ptree 656
656   /usr/lib/fs/samfs/sam-fsd
  681   sam-archiverd
    931   sam-arfind samfs2
    952   sam-arfind samfs1
  683   sam-stagealld
  682   sam-ftpd
  684   sam-stagerd
  685   sam-amld
    687   sam-catserverd 1 2
    689   sam-scannerd 1 2
    690   sam-robotsd 1 2
      691   sam-stkd 1 2 30
        692   /opt/SUNWsamfs/sbin/ssi_so 692 50014 23
        694   sam-stk_helper 1 30
skeeball #
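
The manual check shown in CODE EXAMPLE 1-1 can also be scripted. The following is a minimal sketch under stated assumptions, not a tool shipped with the software: the function name check_daemons is hypothetical, and the list of expected daemons is one you supply for your own configuration. It reads process names on standard input and reports each expected daemon as ok, MISSING, or DUPLICATE. On Solaris systems, substitute nawk(1) for awk, because /usr/bin/awk does not accept the -v option.

```shell
# check_daemons: hypothetical helper, not shipped with Sun StorEdge SAM-FS.
# Arguments: names of the daemons expected for this configuration.
# Stdin:     one process name per line, for example from: ps -e -o comm
check_daemons() {
    awk -v want="$*" '
        BEGIN { n = split(want, names, " ") }
        NF > 0 {
            name = $1
            sub(/.*\//, "", name)      # strip any leading directory path
            seen[name]++
        }
        END {
            for (i = 1; i <= n; i++) {
                count = seen[names[i]] + 0
                if (count == 0) {
                    print names[i] ": MISSING"
                } else if (count > 1) {
                    print names[i] ": DUPLICATE (" count ")"
                } else {
                    print names[i] ": ok"
                }
            }
        }'
}

# Typical use on a live system (not run here):
#   ps -e -o comm | check_daemons sam-fsd sam-archiverd sam-stagerd sam-amld
```

Because sam-arfind runs once for each mounted file system, as CODE EXAMPLE 1-1 shows, more than one instance of it is normal, so a DUPLICATE report for that process is not by itself a fault.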

Checking ps(1) Output and Related Factors

Check the ps(1) command's output for missing or duplicate daemon processes and defunct processes. There should be only one of each of these processes, with few exceptions, as follows:

The sam-fsd(1M) daemon reads the following configuration files: mcf(4), defaults.conf(4), diskvols.conf(4), and samfs.cmd(4). Verify that these configuration files are error free by issuing the sam-fsd(1M) command manually and watching for error messages. As CODE EXAMPLE 1-2 shows, if sam-fsd(1M) encounters errors when processing these files, it exits without starting up the Sun StorEdge SAM-FS environment.


CODE EXAMPLE 1-2 sam-fsd(1M) Output
skeeball # sam-fsd
6: /dev/dsk/c1t2d0s0    10   md    samfs1     on     /dev/rdsk/c1t2d0s0
 *** Error in line 6: Equipment ordinal 10 already in use
1 error in '/etc/opt/SUNWsamfs/mcf'
sam-fsd: Read mcf /etc/opt/SUNWsamfs/mcf failed.
skeeball #
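
When the configuration files are long, it can help to reduce the sam-fsd(1M) output to just the reported errors. The sketch below assumes that every error line has the same shape as the single sample in CODE EXAMPLE 1-2; that format, and the function name mcf_errors, are assumptions made for illustration.

```shell
# mcf_errors: hypothetical helper.
# Stdin:  captured sam-fsd output.
# Stdout: "<line-number> <message>" for each "*** Error in line N:" entry.
mcf_errors() {
    sed -n 's/^ *\*\*\* Error in line \([0-9][0-9]*\): \(.*\)$/\1 \2/p'
}

# Typical use on a live system (not run here):
#   sam-fsd 2>&1 | mcf_errors
```

An empty result means only that no line matched the assumed format, so the full sam-fsd(1M) output should still be read before the configuration is trusted.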

Many of these files are described in the following sections:

Log and Trace Files

Using the appropriate log and trace files can greatly facilitate the diagnosis of Sun StorEdge SAM-FS problems. TABLE 1-1 shows the relevant files.


TABLE 1-1 Log and Trace File Summary

File                           Default Location
-----------------------------  ---------------------------------------------
Sun StorEdge SAM-FS log file   Configurable. Defined in /etc/syslog.conf.
System messages file           /var/adm/messages
Device logs                    /var/opt/SUNWsamfs/devlog/eq
Daemon trace files             Configurable. Defined in defaults.conf(4).
                               Default: /var/opt/SUNWsamfs/trace/daemon-name
Archiver log file              Configurable. Defined in archiver.cmd(4).
Releaser log file              Configurable. Defined in releaser.cmd(4).
Stager log file                Configurable. Defined in stager.cmd(4).
Recycler log file              Configurable. Defined in recycler.cmd(4).


The following sections describe how to use the log and trace files when troubleshooting:

Enabling System Logging

The Sun StorEdge SAM-FS software makes log entries using the standard Solaris log file interface (see syslogd(1M), syslog.conf(4), and syslog(3C)). All logging is done based on a level and a facility. The level describes the severity of the reported condition. The facility describes the component of the system that is sharing information with the syslogd(1M) daemon. The Sun StorEdge SAM-FS software uses facility local7 by default.


To Enable System Logging

To enable the syslogd(1M) daemon to receive information from the Sun StorEdge SAM-FS software for system logging, perform the following steps:

1. Add a line to the /etc/syslog.conf file to enable logging.

For example, add a line similar to the following:


local7.debug /var/adm/sam-log

You can copy this line from /opt/SUNWsamfs/examples/syslog.conf_changes. This entry is all one line, and it has a TAB character (not a space) between the fields.

2. Use touch(1) to create an empty /var/adm/sam-log file.

For example:


skeeball # touch /var/adm/sam-log

3. Send the syslogd(1M) process a SIGHUP signal.

For example:


skeeball # ps -ef | grep syslogd | grep -v grep
    root   216     1  0   Jun 20 ?        0:00 /usr/sbin/syslogd
skeeball # kill -HUP 216

4. (Optional) Use vi(1) or another editor to open the defaults.conf file and add the debugging level.

Perform this step only if you want to increase the logging level.

You can use the debug keyword in the defaults.conf file to set the default level for the debug flags. These flags are used by the Sun StorEdge SAM-FS daemons for logging system messages. The syntax for this line is as follows:


debug = option-list

The default debug level is logging, so debug=logging is the default specification. For option-list, specify a space-separated list of debug options. For more information on the options available, see the samset(1M) and defaults.conf(4) man pages.
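
A common mistake in step 1 is to separate the fields with spaces instead of a TAB character. The following sketch checks for that. It assumes the local7.debug selector used in the example line; the function name check_syslog_entry is hypothetical, and a site that logs at a different level would need to adjust the pattern.

```shell
# check_syslog_entry: hypothetical helper.
# $1 = path to a syslog.conf-style file (normally /etc/syslog.conf).
# Prints "ok", "bad-separator", or "missing" for the local7.debug entry.
check_syslog_entry() {
    tab=$(printf '\t')
    if grep "^local7\.debug${tab}${tab}*/" "$1" >/dev/null 2>&1; then
        echo "ok"                 # selector, one or more TABs, then a path
    elif grep "^local7\.debug" "$1" >/dev/null 2>&1; then
        echo "bad-separator"      # entry exists, but not TAB separated
    else
        echo "missing"
    fi
}

# Typical use on a live system (not run here):
#   check_syslog_entry /etc/syslog.conf
```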

Enabling Device Down Notification

The robot daemon, sam-robotsd(1M), starts and monitors the execution of the media changer control daemons in Sun StorEdge SAM-FS systems. The sam-amld(1M) daemon automatically starts the sam-robotsd(1M) daemon if there are any media changers defined in the mcf file. For more information, see the sam-robotsd(1M) man page.

The sam-robotsd(1M) daemon executes the /opt/SUNWsamfs/sbin/dev_down.sh notification script when any removable media device is marked down or off. By default, it sends email to root with the relevant information. It can be tailored to use syslogd(1M) or to interface with the system management software in use at a site. For more information, see the dev_down.sh(4) man page.

Enabling Daemon Tracing

You can enable daemon tracing by configuring settings in the defaults.conf(4) file. CODE EXAMPLE 1-3 shows the syntax to use in the defaults.conf(4) file to enable daemon tracing for all daemons.


CODE EXAMPLE 1-3 Syntax to Enable Daemon Tracing for all Daemons
trace
all = on
endtrace

The system writes trace files for each daemon to the following default location:


/var/opt/SUNWsamfs/trace/daemon-name

Alternatively, trace files can be turned on individually for the sam-archiverd(1M), sam-catserverd(1M), sam-fsd(1M), sam-ftpd(1M), sam-recycler(1M), and sam-stagerd(1M) processes. CODE EXAMPLE 1-4 enables daemon tracing for the archiver in /var/opt/SUNWsamfs/trace/sam-archiverd, sets the name of the archiver trace file to filename, and defines a list of optional trace events or elements to be included in the trace file as defined in option-list.


CODE EXAMPLE 1-4 Syntax to Enable sam-archiverd(1M) Tracing
trace
sam-archiverd = on
sam-archiverd.file = filename
sam-archiverd.options = option-list
sam-archiverd.size = 10M
endtrace

Daemon trace files are not automatically rotated by default. As a result, they can become very large, and they might eventually fill the /var file system. You can enable automatic trace file rotation in the defaults.conf(4) file by using the daemon-name.size parameter.

The sam-fsd(1M) daemon invokes the trace_rotate.sh(1M) script when a trace file reaches the specified size. The current trace file is renamed filename.1, the next newest is renamed filename.2, and so on, for up to seven generations. CODE EXAMPLE 1-4 specifies that the archiver trace file is to be rotated when its size reaches 10 megabytes.

For detailed information on the events that can be selected for inclusion in a trace file, see the defaults.conf(4) man page.
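
Because unrotated trace files can fill /var, it is reasonable to set a size limit for every daemon you trace. The following sketch only prints a trace stanza in the syntax of CODE EXAMPLE 1-4; it does not edit defaults.conf. The function name trace_stanza is hypothetical, and the size value is passed through unchecked, so confirm the accepted values in the defaults.conf(4) man page.

```shell
# trace_stanza: hypothetical helper that prints a defaults.conf trace block.
# $1       = rotation size to apply to every listed daemon (for example 10M)
# $2 ...   = daemon names, such as sam-archiverd or sam-stagerd
trace_stanza() {
    size=$1
    shift
    echo "trace"
    for daemon in "$@"; do
        echo "$daemon = on"
        echo "$daemon.size = $size"
    done
    echo "endtrace"
}

# Typical use (review the output before merging it into defaults.conf):
#   trace_stanza 10M sam-archiverd sam-stagerd
```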

Enabling Device Logging

Sun StorEdge SAM-FS systems write messages for archiving devices (automated libraries and tape drives) in log files stored in /var/opt/SUNWsamfs/devlog. This directory contains one log file for each device, and each file contains device-specific information. Each removable-media device has its own device log, which is named after its Equipment Ordinal (eq) as defined in the mcf file. There is also a device log for the Historian (Equipment Type hy) with a file name equal to the highest eq value defined in the mcf file incremented by one.

You can use the devlog keyword in the defaults.conf(4) file to set up device logging using the following syntax:


devlog eq [option-list]

If eq is set to all, the event flags specified in option-list are set for all devices.

For option-list, specify a space-separated list of devlog event options. If option-list is omitted, the default event options are err, retry, syserr, and date. For information on the list of possible event options, see the samset(1M) man page.

You can use the samset(1M) command to turn on device logging from the command line. Note that the device logs are not maintained by the system, so you must implement a policy at your site to ensure that the log files are routinely rolled over.
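
One possible site policy for rolling over device logs is sketched below. It is an illustration only: the function name rotate_devlog and the number of generations are hypothetical, and the copy-then-truncate approach assumes that the daemon writing the log tolerates the file being emptied in place. Verify that assumption on a test system before using anything like this in production. Messages written between the copy and the truncation would be lost.

```shell
# rotate_devlog: hypothetical site-policy helper, not part of SAM-FS.
# $1 = device log file (for example /var/opt/SUNWsamfs/devlog/31)
# $2 = number of old generations to keep
rotate_devlog() {
    log=$1
    keep=$2
    [ -f "$log" ] || return 1
    i=$keep
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        # Shift generation prev to generation i, if it exists.
        if [ -f "$log.$prev" ]; then
            mv "$log.$prev" "$log.$i"
        fi
        i=$prev
    done
    # Copy, then truncate in place so the log keeps its inode.
    cp "$log" "$log.1" && : > "$log"
}

# Typical use from a cron job (not run here):
#   rotate_devlog /var/opt/SUNWsamfs/devlog/31 7
```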

CODE EXAMPLE 1-5 shows sample device log output using the default output settings. It shows the first initialization of a 9840A tape drive. The drive is specified as Equipment Ordinal 31 in the mcf file.


CODE EXAMPLE 1-5 Device Log Output Example
skeeball # cat mcf
#
# Equipment             Eq   Eq    Family   Device   Additional
# Identifier            ORD Type    Set     State    Parameters
#-----------            --- ----   ------   ------   ----------
samfs1                  10   ms    samfs1     on
/dev/dsk/c1t2d0s0       11   md    samfs1     on     /dev/rdsk/c1t2d0s0
#
samfs2                  20   ms    samfs2     on
/dev/dsk/c1t2d0s1       21   md    samfs2     on     /dev/rdsk/c1t2d0s1
#
#
#        ---------- STK ACSLS Tape Library -----------
#
# Equipment                   Eq   Eq  Family Device Additional
# Identifier                  Ord Type  Set   State  Parameters
#-----------                  --- ---- ------ ------ ----------
/etc/opt/SUNWsamfs/stk30       30  sk  stk30    on     -
/dev/rmt/0cbn                 31  sg  stk30    on     -
/dev/rmt/1cbn                  32  sg  stk30    on     -
skeeball #
skeeball # ls /var/opt/SUNWsamfs/devlog
30  31  32  33
skeeball # more /var/opt/SUNWsamfs/devlog/31
2003/06/11 11:33:31*0000 Initialized. tp
2003/06/11 11:33:31*1002 Device is STK     , 9840
2003/06/11 11:33:31*1004 Rev 1.28
2003/06/11 11:33:31*1005 Known as STK 9840 Tape(sg)
2003/06/11 11:33:37 0000 Attached to process 691
2003/06/11 14:31:29 1006 Slot 0
2003/06/11 14:31:29 0000 cdb - 08 00 00 00 50 00
2003/06/11 14:31:29 0000       00 00 00 00 00 00
2003/06/11 14:31:29 0000 sense - f0 00 80 00 00 00 50 12 00 00
2003/06/11 14:31:29 0000         00 00 00 01 00 00 00 00 00 00
2003/06/11 14:31:30 0000 cdb - 08 00 00 00 50 00
2003/06/11 14:31:30 0000       00 00 00 00 00 00
2003/06/11 14:31:30 0000 sense - f0 00 80 00 00 00 50 12 00 00
2003/06/11 14:31:30 0000         00 00 00 01 00 00 00 00 00 00
2003/06/11 14:31:31 0000 cdb - 08 00 00 00 50 00
2003/06/11 14:31:31 0000       00 00 00 00 00 00
2003/06/11 14:31:31 0000 sense - f0 00 80 00 00 00 50 12 00 00
2003/06/11 14:31:31 0000         00 00 00 01 00 00 00 00 00 00
2003/06/11 14:31:31 3021 Writing labels
2003/06/11 14:31:32 1006 Slot 0
2003/06/11 14:31:32 3003 Label 700181 2003/06/11 14:31:31 blocksize = 262144
.
.

CODE EXAMPLE 1-5 shows how, about three hours after the 9840A device is initialized, a tape from slot 0 is loaded into the tape drive for archiving. The tape is checked three times for its VSN label, and each time the system reports that the media is blank. After three checks, the system concludes that the tape is blank, labels it, and then reports the VSN label (700181), the date, the time, and the media block size.
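
To find labeling events such as this one in a long device log, the label lines can be filtered out. The sketch below assumes that label messages always carry code 3003 and the layout seen in CODE EXAMPLE 1-5; both are inferred from that single sample rather than from a documented format, and the function name labeled_vsns is hypothetical.

```shell
# labeled_vsns: hypothetical helper.
# Stdin:  contents of a device log file.
# Stdout: "<VSN> <date> <time>" for each "3003 Label" entry.
labeled_vsns() {
    sed -n 's|^\([0-9/]* [0-9:]*\).3003 Label \([^ ]*\) .*$|\2 \1|p'
}

# Typical use on a live system (not run here):
#   labeled_vsns < /var/opt/SUNWsamfs/devlog/31
```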

Troubleshooting Utilities

TABLE 1-2 lists the utilities that are helpful in diagnosing Sun StorEdge SAM-FS configuration problems.


TABLE 1-2 Troubleshooting Utilities

Utility           Description
----------------  ------------------------------------------------------------
sam-fsd(1M)       Initializes the environment. Debugs basic configuration
                  problems, particularly with new installations.
samu(1M)          Provides a comprehensive display that shows the status of
                  Sun StorEdge SAM-FS file systems and devices. Allows the
                  operator to control file systems and removable media devices.
sls(1)            An extended version of the GNU ls(1) command. The -D option
                  displays extended Sun StorEdge SAM-FS attributes.
samset(1M)        Sets parameters within the Sun StorEdge SAM-FS environment.
samexplorer(1M)   Generates Sun StorEdge SAM-FS diagnostic reports. For more
                  information, see The samexplorer(1M) Script.


For more information about these utilities, consult the relevant man pages and the Sun StorEdge SAM-FS documentation, particularly the Sun StorEdge QFS Configuration and Administration Guide and the Sun StorEdge SAM-FS Storage and Archive Management Guide.

The samexplorer(1M) Script

The samexplorer(1M) script (called info.sh(1M) in versions prior to 4U1) collates information from a Sun StorEdge SAM-FS environment and writes it to the file /tmp/SAMreport. The script output, called the SAMreport, is an important aid to diagnosing complex Sun StorEdge SAM-FS problems, and an engineer needs it in the event of an escalation.

The SAMreport includes the following information:

If log files are not routinely collected, an important source of diagnostic information is missing from the SAMreport. It is important to ensure that sites implement a comprehensive logging policy as part of their standard system administration procedures.

It is recommended that the SAMreport be generated in the following circumstances:

Run the samexplorer script and save the SAMreport file before attempting recovery. Ensure that SAMreport is moved from /tmp before rebooting. The functionality of samexplorer has been fully incorporated into the Sun Explorer Data Collector, release 4U0. However, samexplorer provides a focused set of data tuned to the Sun StorEdge SAM-FS environment that can be quickly and simply collected and sent to escalation engineers for rapid diagnosis.
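
Because files in /tmp do not survive a reboot, the report is easy to lose. The following sketch copies it to a directory of your choice under a timestamped name. The function name save_samreport and the naming scheme are hypothetical; only the /tmp/SAMreport location comes from this section.

```shell
# save_samreport: hypothetical helper.
# $1 = destination directory outside /tmp
# $2 = report file (optional, defaults to /tmp/SAMreport)
# Prints the path of the saved copy.
save_samreport() {
    dest=$1
    src=${2:-/tmp/SAMreport}
    if [ ! -f "$src" ]; then
        echo "no report at $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    copy="$dest/SAMreport.$(date +%Y%m%d%H%M%S)"
    cp "$src" "$copy" && echo "$copy"
}

# Typical use after running samexplorer (not run here):
#   save_samreport /var/tmp/samreports
```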


Troubleshooting Common Problems

The following sections describe common system configuration problems and their solutions:

Hardware Configuration Problems

Sun StorEdge SAM-FS problems can turn out to be hardware related. Before embarking on an extensive troubleshooting exercise, ascertain the following:


To Verify Hardware

It is easiest to verify the hardware configuration by performing the following procedure. However, this procedure requires you to shut down the system. If the system cannot be shut down, consult the /var/adm/messages file for the device check-in messages from the last reboot.

To verify that the Solaris OS can communicate with the devices attached to the server, perform the following steps:

1. Shut down the system.

2. Issue the probe-scsi-all command at the ok prompt.

3. Monitor the boot-up sequence messages.

While monitoring the messages, identify the check-in of the expected devices.

CODE EXAMPLE 1-6 shows the st tape devices checking in.


CODE EXAMPLE 1-6 Check In of st Tape Devices
Jun  9 13:29:39 skeeball scsi: [ID 365881 kern.info] /pci@1f,0/pci@1/scsi@3/st@4,0 (st18):
Jun  9 13:29:39 skeeball     <StorageTek 9840>
Jun  9 13:29:39 skeeball scsi: [ID 193665 kern.info] st18 at glm2: target 4 lun 0
Jun  9 13:29:39 skeeball genunix: [ID 936769 kern.info] st18 is /pci@1f,0/pci@1/scsi@3/st@4,0
Jun  9 13:29:39 skeeball scsi: [ID 365881 kern.info] /pci@1f,0/pci@1/scsi@3/st@5,0 (st19):
Jun  9 13:29:39 skeeball     <StorageTek 9840>
Jun  9 13:29:39 skeeball scsi: [ID 193665 kern.info] st19 at glm2: target 5 lun 0
Jun  9 13:29:39 skeeball genunix: [ID 936769 kern.info] st19 is /pci@1f,0/pci@1/scsi@3/st@5,0
.
.

If devices do not respond, consult your Solaris documentation for information on configuring the devices for the Solaris OS.
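
When the system cannot be shut down, the check-in messages from the last reboot can be pulled out of /var/adm/messages instead. The sketch below assumes the message layout shown in CODE EXAMPLE 1-6; other host bus adapter drivers might log a different format. The function name list_st_checkins is hypothetical.

```shell
# list_st_checkins: hypothetical helper.
# Stdin:  system messages, for example the contents of /var/adm/messages.
# Stdout: "<st-instance> <controller> target=<n> lun=<n>" per tape device.
list_st_checkins() {
    sed -n 's/.*\] \(st[0-9][0-9]*\) at \([^:]*\): target \([0-9a-fA-F]*\) lun \([0-9]*\).*/\1 \2 target=\3 lun=\4/p'
}

# Typical use on a live system (not run here):
#   list_st_checkins < /var/adm/messages
```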

If you have verified that the hardware has been installed and configured correctly and that no hardware faults are present, the next step in diagnosing an installation or configuration problem is to check that the expected Sun StorEdge SAM-FS daemons are running. For more information on the daemons, see Daemons.

SAN Attached Device Configuration Problems

SAN attached devices, such as Fibre Channel drives and automated libraries, should be checked to ensure that they are configured and that they are visible to the Solaris OS through the cfgadm(1M) command. CODE EXAMPLE 1-7 illustrates this for a fabric attached library controller and drives.


CODE EXAMPLE 1-7 cfgadm(1M) Command Output
# cfgadm -al
Ap_Id                   Type            Receptacle      Occupant        Condition
c0                      scsi-bus        connected       configured      unknown
c0::dsk/c0t0d0          disk            connected       configured      unknown
c0::dsk/c0t6d0          CD-ROM          connected       configured      unknown
c1                      scsi-bus        connected       configured      unknown
c2                      scsi-bus        connected       unconfigured    unknown
c4                      fc-fabric       connected       configured      unknown
c4::210000e08b0645c1    unknown         connected       unconfigured    unknown
.
.
c4::500104f00041182b    med-changer     connected       configured      unknown
c4::500104f00043abfc    tape            connected       configured      unknown
c4::500104f00045eeaf    tape            connected       configured      unknown
c4::5005076300416303    tape            connected       configured      unknown
.

If devices are in an unconfigured state, use the cfgadm(1M) command with its -c configure option to configure the devices into the Solaris environment. It is important to understand the SAN configuration rules for Fibre Channel tape devices and libraries. For more information, see the latest Sun StorEdge open SAN architecture or the SAN Foundation software documentation.
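
On a large fabric, the unconfigured entries can be hard to spot by eye. The following sketch filters cfgadm -al output for rows whose Occupant column reads unconfigured. It assumes the five-column layout shown in CODE EXAMPLE 1-7, and the function name unconfigured_devices is hypothetical.

```shell
# unconfigured_devices: hypothetical helper.
# Stdin:  output of cfgadm -al.
# Stdout: "<Ap_Id> <Type>" for each row whose Occupant is "unconfigured".
unconfigured_devices() {
    awk '$4 == "unconfigured" { print $1, $2 }'
}

# Typical use on a live system (not run here):
#   cfgadm -al | unconfigured_devices
```

Not every entry reported this way should be configured. In CODE EXAMPLE 1-7, for instance, one unconfigured entry is of type unknown, so review each attachment point before passing it to cfgadm -c configure.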


Troubleshooting Configuration Files

After the software packages have been installed, you need to tailor the Sun StorEdge SAM-FS configuration files to the site installation in order to bring the system into an operational state. Syntactical and typographical errors in these configuration files manifest themselves in unexpected behavior.

This section describes specific troubleshooting procedures for identifying issues with the Sun StorEdge SAM-FS and Sun StorEdge QFS configuration files. TABLE 1-3 lists these configuration files and their default locations.


TABLE 1-3 Configuration Files and Their Locations

Configuration File Purpose   Default Location
---------------------------  --------------------------------
Master configuration file    /etc/opt/SUNWsamfs/mcf
st device file               /kernel/drv/st.conf
samst(7) device file         /kernel/drv/samst.conf
Device mapping               /etc/opt/SUNWsamfs/inquiry.conf
Default settings file        /etc/opt/SUNWsamfs/defaults.conf


The /etc/opt/SUNWsamfs/mcf File

The mcf(4) file defines the Sun StorEdge SAM-FS devices and device family sets.

The mcf file is read when sam-fsd(1M) is started. It can be changed at any time, even while sam-fsd is running, but sam-fsd(1M) recognizes mcf file changes only when the daemon is restarted. CODE EXAMPLE 1-8 shows an mcf file for a Sun StorEdge SAM-FS environment.


CODE EXAMPLE 1-8 Example Sun StorEdge SAM-FS mcf File
#
# Sun StorEdge SAM-FS file system configuration example
#
# Equipment       Eq Eq Family Dev Additional
# Identifier      Or Tp Set    St  Parameters
# --------------- -- -- ------ --- ----------
samfs1            60 ms samfs1
/dev/dsk/c1t1d0s6 61 md samfs1 on
/dev/dsk/c2t1d0s6 62 md samfs1 on
/dev/dsk/c3t1d0s6 63 md samfs1 on
/dev/dsk/c4t1d0s6 64 md samfs1 on
/dev/dsk/c5t1d0s6 65 md samfs1 on
#
samfs2             2 ms samfs2
/dev/dsk/c1t1d0s0 15 md samfs2 on
/dev/dsk/c1t0d0s1 16 md samfs2 on
#
/dev/samst/c0t2d0 20 od -      on
#
/dev/samst/c1t2u0 30 rb hp30   on   /var/opt/SUNWsamfs/catalog/hp30_cat
/dev/samst/c1t5u0 31 od hp30   on
/dev/samst/c1t6u0 32 od hp30   on
#
/dev/rmt/0cbn     40 od -      on
#
/dev/samst/c1t3u1 50 rb ml50   on   /var/opt/SUNWsamfs/catalog/ml50_cat
/dev/rmt/2cbn     51 tp ml50   on

The Sun StorEdge QFS Configuration and Administration Guide describes the format of the mcf file in detail.

The most common problems with the mcf file are syntactical and typographical errors. The sam-fsd(1M) command is a useful tool in debugging the mcf file. If sam-fsd(1M) encounters an error as it processes the mcf file, it writes error messages to the Sun StorEdge SAM-FS log file (if configured). It also reports errors detected in the following other files, if present:

For a newly created or modified mcf file, run the sam-fsd(1M) command and check for error messages. If necessary, correct the mcf file and rerun the sam-fsd(1M) command to ensure that the errors have been corrected. Repeat this process until all errors have been eliminated. When the mcf file is error free, reinitialize the sam-fsd(1M) daemon by sending it the SIGHUP signal. CODE EXAMPLE 1-9 shows this process.


CODE EXAMPLE 1-9 Checking the mcf File
skeeball # sam-fsd
6: /dev/dsk/c1t2d0s0    10   md    samfs1     on     /dev/rdsk/c1t2d0s0
 *** Error in line 6: Equipment ordinal 10 already in use
1 error in '/etc/opt/SUNWsamfs/mcf'
sam-fsd: Read mcf /etc/opt/SUNWsamfs/mcf failed.
skeeball #
skeeball # cat mcf
#
# Equipment             Eq   Eq    Family   Device   Additional
# Identifier            ORD Type    Set     State    Parameters
#-----------            --- ----   ------   ------   ----------
samfs1                  10   ms    samfs1     on
/dev/dsk/c1t2d0s0      10   md    samfs1     on
#
samfs2                  20   ms    samfs2     on
/dev/dsk/c1t2d0s1       21   md    samfs2     on
#
#
#        ---------- STK ACSLS Tape Library -----------
#
# Equipment                   Eq   Eq  Family Device Additional
# Identifier                  Ord Type  Set   State  Parameters
#-----------                  --- ---- ------ ------ ----------
/etc/opt/SUNWsamfs/stk30       30  sk  stk30    on
/dev/rmt/0cbn                  31  sg  stk30    on    
/dev/rmt/1cbn                  32  sg  stk30    on    
skeeball #
<correct error>
skeeball #
skeeball # sam-fsd
Trace file controls:
sam-archiverd /var/opt/SUNWsamfs/trace/sam-archiverd
              cust err fatal misc proc date
              size    0    age 0
sam-catserverd /var/opt/SUNWsamfs/trace/sam-catserverd
              cust err fatal misc proc date
              size    0    age 0
sam-fsd       /var/opt/SUNWsamfs/trace/sam-fsd
              cust err fatal misc proc date
              size    0    age 0
sam-ftpd      /var/opt/SUNWsamfs/trace/sam-ftpd
              cust err fatal misc proc date
              size    0    age 0
sam-recycler  /var/opt/SUNWsamfs/trace/sam-recycler
              cust err fatal misc proc date
              size    0    age 0
sam-sharefsd  /var/opt/SUNWsamfs/trace/sam-sharefsd
              cust err fatal misc proc date
              size    0    age 0
sam-stagerd   /var/opt/SUNWsamfs/trace/sam-stagerd
              cust err fatal misc proc date
              size    0    age 0
Would stop sam-archiverd()
Would stop sam-ftpd()
Would stop sam-stagealld()
Would stop sam-stagerd()
Would stop sam-amld()
skeeball #
skeeball # samd config
skeeball #

Enable the changes to the mcf file for a running system by running the samd(1M) command with its config option (as shown at the end of CODE EXAMPLE 1-9) or by sending the SIGHUP signal to sam-fsd(1M). The procedure for reinitializing sam-fsd(1M) to make it recognize mcf file modifications varies, depending on the nature of the changes implemented in the mcf file. For the procedures to be followed in specific circumstances, see the Sun StorEdge QFS Configuration and Administration Guide.
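
The error in CODE EXAMPLE 1-9, a reused Equipment Ordinal, can also be caught before sam-fsd(1M) is run. The following sketch checks only that one rule and is no substitute for sam-fsd(1M), which validates the whole file. The function name find_duplicate_eq is hypothetical, and the check assumes that the Equipment Ordinal is always the second whitespace-separated field of a non-comment line.

```shell
# find_duplicate_eq: hypothetical helper.
# $1 = path to an mcf file.
# Prints one line for every Equipment Ordinal that is used more than once.
find_duplicate_eq() {
    awk '
        NF < 2 || $1 ~ /^#/ { next }     # skip blank and comment lines
        {
            if ($2 in first) {
                print "line " NR ": Equipment Ordinal " $2 " already used on line " first[$2]
            } else {
                first[$2] = NR
            }
        }' "$1"
}

# Typical use on a live system (not run here):
#   find_duplicate_eq /etc/opt/SUNWsamfs/mcf
```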

Verifying mcf Drive Order Matching

For direct attached libraries with more than a single drive, the order in which drive entries appear in the mcf file must match the order in which they are identified by the library controller. The drive that the library controller identifies as the first drive must be the first drive entry for that library in the mcf, and so on. To check the drive order for a direct attached library, follow the instructions in the "Checking the Drive Order" section of the Sun StorEdge SAM-FS Installation and Upgrade Guide.

Network attached libraries use different procedures from direct attached libraries, because the drive order for a network attached library is defined by the library control software.

For example, for a network attached StorageTek library, the drive mapping in the ACSLS parameters file must match the drives as presented by the ACSLS interface. In this case, the procedure is similar to that for a library without a front panel, except that an additional check is necessary to ensure that the ACSLS parameters file mapping is correct.

The /kernel/drv/st.conf File

Some tape devices that are compatible with Sun StorEdge SAM-FS software are not supported by default in the Solaris operating system (OS) kernel. The file /kernel/drv/st.conf is the Solaris st(7D) tape driver configuration file for all supported tape drives. This file can be modified to enable operation of normally unsupported drives with a Sun StorEdge SAM-FS system. Attempting to use any such device in the Sun StorEdge SAM-FS environment without updating the st.conf file, or with an incorrectly modified file, causes the system to write messages such as the following to the device log file:


Aug  3  19:43:36 samfs2 scanner[242]: Tape device 92 is default type. Update /kernel/drv/st.conf

If your configuration is to include devices not supported by the Solaris OS, consult the following file for instructions on how to modify the st.conf file:

/opt/SUNWsamfs/examples/st.conf_changes

For example, the IBM LTO drive is not supported by default in the Solaris kernel. CODE EXAMPLE 1-10 shows the lines you need to add to the st.conf file in order to include IBM LTO drives in a Sun StorEdge SAM-FS environment.


CODE EXAMPLE 1-10 Lines to be Added to st.conf
"IBM    ULTRIUM-TD1",           "IBM Ultrium",  "CLASS_3580",
CLASS_3580      =       1,0x24,0,0x418679,2,0x00,0x01,0;

The st.conf file is read only when the st driver is loaded, so if the /kernel/drv/st.conf file is modified, perform one of the following actions in order to direct the system to recognize the changes:

The /kernel/drv/samst.conf File

The samst(7) driver for SCSI media changers and optical drives is used for direct attached SCSI or Fibre Channel tape libraries and for magneto-optical drives and libraries.

As part of the installation process, the Sun StorEdge SAM-FS software creates entries in the /dev/samst directory for all devices that were attached and recognized by the system before the pkgadd(1M) command was entered to begin the installation.

If you add devices after running the pkgadd(1M) command, you must use the devfsadm(1M) command, as follows, to create the appropriate device entries in /dev/samst:


# /usr/sbin/devfsadm -i samst

After the command is issued, verify that the device entries have been created in /dev/samst. If they have not, perform a reconfiguration reboot and attempt to create the entries again.

If the /dev/samst device is not present for the automated library controller, the samst.conf file might need to be updated. In general, Fibre Channel libraries, libraries with targets greater than 7, and libraries with LUNs greater than 0 require the samst.conf file to be updated. To add support for such libraries, add a line similar to the following to the /kernel/drv/samst.conf file:


name="samst" parent="fp" lun=0 fc-port-wwn="500104f00041182b";

In the previous example line, 500104f00041182b is the World Wide Name (WWN) port number of the fibre attached automated library. If you need to, you can obtain the WWN port number from the cfgadm(1M) command's output. CODE EXAMPLE 1-11 shows this command.


CODE EXAMPLE 1-11 Using cfgadm(1M) to Obtain the WWN
# cfgadm -al
Ap_Id                   Type            Receptacle      Occupant        Condition
c0                      scsi-bus        connected       configured      unknown
c0::dsk/c0t0d0          disk            connected       configured      unknown
c0::dsk/c0t6d0          CD-ROM          connected       configured      unknown
c1                      scsi-bus        connected       configured      unknown
c2                      scsi-bus        connected       unconfigured    unknown
c4                      fc-fabric       connected       configured      unknown
c4::210000e08b0645c1    unknown         connected       unconfigured    unknown
.
.
c4::500104f00041182b    med-changer     connected       configured      unknown
c4::500104f00043abfc    tape            connected       configured      unknown
c4::500104f00045eeaf    tape            connected       configured      unknown
c4::5005076300416303    tape            connected       configured      unknown
.
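
The samst.conf line can be derived from this output instead of being typed by hand. The following sketch prints one candidate line for each med-changer entry. The function name samst_conf_lines is hypothetical, and the parent="fp" and lun=0 values are copied from the example line above, so a library that presents a different LUN needs the output adjusted before it is added to /kernel/drv/samst.conf.

```shell
# samst_conf_lines: hypothetical helper.
# Stdin:  output of cfgadm -al.
# Stdout: a candidate samst.conf line for each med-changer entry.
samst_conf_lines() {
    awk '$2 == "med-changer" {
        split($1, part, "::")            # part[2] is the port WWN
        printf "name=\"samst\" parent=\"fp\" lun=0 fc-port-wwn=\"%s\";\n", part[2]
    }'
}

# Typical use on a live system (not run here):
#   cfgadm -al | samst_conf_lines
```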

For network attached tape libraries such as a StorageTek library controlled by ACSLS, the samst driver is not used, and no /dev/samst device entries are created.

The /etc/opt/SUNWsamfs/inquiry.conf File

The /etc/opt/SUNWsamfs/inquiry.conf file defines vendor and product identification strings for recognized SCSI or fibre devices and matches these with Sun StorEdge SAM-FS product strings. If you have devices that are not defined in inquiry.conf, you need to update the file with the appropriate device entries. This is rarely necessary because the great majority of devices are defined in the file. CODE EXAMPLE 1-12 shows a fragment of the inquiry.conf file.


CODE EXAMPLE 1-12 Part of the inquiry.conf File
"ATL",          "ACL2640",      "acl2640"       # ACL 2640 tape library
"HP",           "C1160A",       "hpoplib"       # HP optical library
"IBM"           "03590",        "ibm3590"       # IBM3590 Tape
"MTNGATE"       "V-48"          "metd28"        # metrum v-48 tape library
"OVERLAND",     "LXB",          "ex210"         # Overland LXB2210 robot
"Quantum"       "DLT2000",      "dlt2000"       # digital linear tape
"STK",          "9490",         "stk9490"       # STK 9490 tape drive
"STK",          "97",           "stk97xx"       # STK 9700 series SCSI
"STK",          "SD-3"          "stkd3"         # STK D3 tape drive

If changes to this file are required, you must make them and then reinitialize your Sun StorEdge SAM-FS software by issuing the following commands:


# samd stop
# samd config

If the system detects errors in the inquiry.conf file during reinitialization, it writes messages to the Sun StorEdge SAM-FS log file. Check for error messages similar to those shown in CODE EXAMPLE 1-13 after making changes to inquiry.conf and reinitializing the Sun StorEdge SAM-FS software.


CODE EXAMPLE 1-13 Messages Regarding inquiry.conf Problems
.
May 22 16:11:49 ultra1 samfs[15517]: Unknown device, eq 30 ("/dev/samst/c0t2u0"), dtype (0x8)
May 22 16:11:49 ultra1 samfs[15517]: Vender/product OVERLAND LXB.
May 22 16:11:49 ultra1 samfs[15517]: Update /etc/opt/SUNWsamfs/inquiry.conf (see inquiry.conf(4)).
May 22 16:11:49 ultra1 samfs[15517]: Device being offed eq 30.
.
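
After reinitializing, the log can be scanned for devices that the software did not recognize. The sketch below assumes the "Unknown device" message layout shown in CODE EXAMPLE 1-13 and a log file location such as the /var/adm/sam-log file configured earlier in this chapter. The function name unknown_devices is hypothetical.

```shell
# unknown_devices: hypothetical helper.
# Stdin:  contents of the Sun StorEdge SAM-FS log file.
# Stdout: "<eq> <device-path>" for each "Unknown device" message.
unknown_devices() {
    sed -n 's/.*Unknown device, eq \([0-9][0-9]*\) ("\([^"]*\)").*/\1 \2/p'
}

# Typical use on a live system (not run here):
#   unknown_devices < /var/adm/sam-log
```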

The /etc/opt/SUNWsamfs/defaults.conf File

The defaults.conf configuration file allows you to establish certain default parameter values for a Sun StorEdge SAM-FS environment. The system reads the defaults.conf file when sam-fsd(1M) is started or reconfigured. The file can be changed at any time while the sam-fsd(1M) daemon is running. The changes take effect when the sam-fsd(1M) daemon is restarted or when it is sent the SIGHUP signal. Temporary changes to many values can be made using the samset(1M) command.

The sam-fsd(1M) command is also useful for debugging the defaults.conf(4) file. If the sam-fsd(1M) daemon encounters an error as it processes the defaults.conf(4) file, it writes error messages to the Sun StorEdge SAM-FS log file.

For a newly created or modified defaults.conf(4) file, run the sam-fsd(1M) command and check for error messages. If necessary, correct the file and rerun the sam-fsd(1M) command to ensure that the errors have been corrected. Repeat this process until all errors have been eliminated.

If you modify the defaults.conf(4) file on a running system, you need to reinitialize it by restarting the sam-fsd(1M) daemon. You can use the samd(1M) command with its config option to restart sam-fsd(1M). See the Sun StorEdge QFS Configuration and Administration Guide for the procedures to be followed in specific circumstances.


Planning for Disaster Recovery

It is essential that you back up your data and establish disaster recovery processes so that data can be retrieved if any of the following occur:

Chapter 4 provides the information you need to know about backing up metadata and other important configuration data. The remaining chapters in this manual describe how to use the data you back up to recover from various types of disasters.

Setting up processes for doing backups and system dumps is only part of preparing to recover from a disaster. The following tasks are also necessary:

Recovering From Failure of the Operating Environment Disk

When a disk containing the operating environment for a system fails, after you replace the defective disk, you need to perform bare metal recovery before you can do anything else. Two bare metal recovery approaches are available:

This process is slower than restoring a system image backup.

Image backups need to be made only when system configuration changes are made. The downside to this approach is that it is difficult to safely transport hard disks to off site storage.

Testing Backup and Recovery Methods

After you have set up data recovery processes, you should do the testing described in the following sections:

Testing Backup Scripts and cron Jobs

Always test backup scripts and cron(1) jobs on a development or test system before rolling them out to all systems.

Testing the Disaster Recovery Process

Use the information in the other chapters in this manual to do the following tests in order to verify how well your disaster recovery process works. Do these tests periodically and whenever you make changes to the software.