Sun StorageTek Storage Archive Manager Troubleshooting Guide
Troubleshooting Sun StorageTek SAM Software
This chapter describes how to troubleshoot basic Sun StorageTek SAM functions. It contains the following sections:
- Troubleshooting the Archiver
- Troubleshooting the Releaser
- Troubleshooting the Recycler
Troubleshooting the Archiver
The archiver automatically writes Sun StorageTek SAM files to archive media. Operator intervention is not required to archive and stage the files. The archiver starts automatically when a SAM-QFS file system is mounted. You can customize the archiver's operations for your site by inserting archiving directives into the following file:
/etc/opt/SUNWsamfs/archiver.cmd
During initial setup, the archiver might not behave as intended. Use the following tools to monitor the archiving activity of the system:
- The File System Manager software - To display archiving activity, go to the Servers page and click the name of the server for which you want to display file system information. Click the System Administration node in the navigation tree, and then select Monitoring Console to display system information such as active daemons, file systems, libraries and drives, and archiving activity.
For complete information on using File System Manager to monitor jobs, see the File System Manager online help file.
- samu(1M) utility's a display - This display shows archiver activity for each file system. It also displays archiver errors and warning messages, such as the following:
Errors in archiver commands - no archiving will be done
The samu(1M) utility's a display includes messages for each file system. It shows when the archiver will next scan the .inodes file and which files are currently being archived.
- Archive logs - You can define these logs in the archiver.cmd file. Monitor them regularly to ensure that files are being archived to volumes. Archive logs can grow very large, so trim them regularly, either manually or through a cron(1) job. Before trimming, save a copy of each log for safekeeping, because the information in the logs enables data recovery.
- sfind(1) command - Use this command to check periodically for unarchived files. If you have unarchived files, make sure you know why they are not being archived.
- sls(1) command - Files are not considered for release unless a valid archive copy exists. The sls -D command displays inode information for a file, including copy information.
Note - Output from the sls -D command might show the word archdone on a file. This does not indicate that the file has an archive copy. It indicates only that the archiver has scanned the file and that all the work associated with the archiver itself has been completed. An archive copy exists only when the sls -D output displays copy information for the file.
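The archive-log guidance above calls for trimming logs regularly while keeping a copy for data recovery. The following Python sketch shows one way a cron(1) job could do this. It is a minimal sketch under stated assumptions: the log path, the archive directory, and the size threshold are hypothetical values you supply (the real log location is whatever your archiver.cmd file defines), and it assumes the archiver appends to its log, so truncating the file in place is safe. Lines written between the copy and the truncate are lost.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def rotate_archive_log(log_path: str, archive_dir: str, max_bytes: int) -> Optional[Path]:
    """Save a timestamped copy of the log and truncate it once it exceeds max_bytes.

    Returns the path of the saved copy, or None if no rotation was needed.
    log_path, archive_dir, and max_bytes are site-specific (hypothetical) values.
    """
    log = Path(log_path)
    if not log.exists() or log.stat().st_size <= max_bytes:
        return None  # log is missing or still small enough; nothing to do

    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{log.name}.{stamp}"

    # Keep a copy first: the log contents are needed for data recovery.
    shutil.copy2(log, dest)

    # ASSUMPTION: the archiver appends to the log, so truncating in place
    # (instead of deleting the file) lets it keep writing to the same file.
    # Anything written between the copy above and this truncate is lost.
    with open(log, "r+b") as fh:
        fh.truncate(0)
    return dest
```

A crontab entry could call rotate_archive_log nightly with the log path from archiver.cmd. Verify on a test system that the archiver keeps logging normally after the file is truncated.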
Occasionally, you might see messages indicating that the archiver either has run out of space on cartridges or has no cartridges. These messages are as follows:
- When the archiver has no cartridges assigned to an archive set:
No volumes available for Archive Set setname
- When the archiver has no space on the cartridges assigned to an archive set:
No space available on Archive Set setname
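If you capture these messages as text (for example, from the syslog file, assuming they are logged there), a short script can report which archive sets are affected. The following Python sketch matches the two message formats shown above. It assumes each message appears verbatim on a single line; the archive set names used when testing it are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Message formats as documented above; the captured group is the archive set name.
NO_VOLUMES = re.compile(r"No volumes available for Archive Set (\S+)")
NO_SPACE = re.compile(r"No space available on Archive Set (\S+)")


def archive_set_problems(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each archive set named in the messages to the problems reported for it."""
    problems: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        m = NO_VOLUMES.search(line)
        if m:
            problems[m.group(1)].add("no volumes assigned")
            continue
        m = NO_SPACE.search(line)
        if m:
            problems[m.group(1)].add("no space on assigned volumes")
    return dict(problems)
```

Because the function accepts any iterable of lines, you can pass it an open log file directly. Repeated messages for the same archive set are collapsed into a single entry.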
Why Files Are Not Archiving
Reasons your Sun StorageTek SAM environment might not be archiving files include the following:
- The archiver.cmd file has a syntax error. Run the archiver -lv command to identify the error, then correct the flagged lines.
- The archiver.cmd file has a wait directive in it. Either remove the wait directive or override it by using the samu(1M) utility's :arrun command.
- No volumes are available. You can check this in the archiver(1M) -lv command output. Add more volumes as needed. You might have to export existing cartridges to free up slots in the automated library.
- The volumes for an archive set are full. You can export cartridges and replace them with new cartridges (make sure that the new cartridges are labeled), or you can recycle the cartridges. For more information on recycling, see the Sun StorageTek Storage Archive Manager Archive Configuration and Administration Guide.
- The VSN section of the archiver.cmd file does not list correct media. Check your regular expressions and VSN pools to ensure that they are correctly defined.
- There is not enough space on the available volumes to archive any file. If you have large files and the volumes appear to be nearly full, the cartridges might be as full as the Sun StorageTek QFS environment allows. If this is the case, add cartridges or recycle.
If you have specified the -join path parameter and no single volume has enough space to hold all the files in the directory, no archiving occurs. Add cartridges, recycle, or use the -sort path or -rsort path parameter instead.
- The archiver.cmd file has the no_archive directive set for directories or file systems that contain large files.
- The archive(1) -n (archive never) command has been used to specify too many directories, and the files are never archived.
- Large files are continually busy, so they never reach their archive age and are never archived.
- Hardware or configuration problems exist with the automated library.
- Network connection problems exist between client and server. Ensure that the client and the server have established communications.
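The -join path case in the list above is easy to miss, because each file might fit on some volume even though the directory as a whole does not. The following Python sketch models that difference. It is a simplification, not the archiver's actual volume-selection logic: it treats a joined directory as one unit that must fit on a single volume, and it places unjoined files with a plain first-fit-decreasing heuristic. The function names and sizes are hypothetical.

```python
from typing import List, Sequence


def can_archive_joined(file_sizes: Sequence[int], volume_free: Sequence[int]) -> bool:
    """Model of -join path: all files in the directory must fit on ONE volume."""
    total = sum(file_sizes)
    return any(free >= total for free in volume_free)


def unplaced_without_join(file_sizes: Sequence[int], volume_free: Sequence[int]) -> List[int]:
    """Place files one at a time (first-fit decreasing) and return the sizes
    that fit on no volume. A simplification, not the archiver's algorithm."""
    free = list(volume_free)
    unplaced: List[int] = []
    for size in sorted(file_sizes, reverse=True):
        for i, room in enumerate(free):
            if room >= size:
                free[i] = room - size  # consume space on the chosen volume
                break
        else:
            unplaced.append(size)  # no volume had enough room for this file
    return unplaced


# Three files (sizes in GB, hypothetical) and two volumes with 60 GB and 45 GB free.
files = [40, 30, 20]
volumes = [60, 45]
print(can_archive_joined(files, volumes))     # False: 90 GB fits on neither volume
print(unplaced_without_join(files, volumes))  # []: each file fits somewhere
```

In this example the joined directory needs 90 GB on one volume, so nothing is archived, while the same three files fit when they are placed individually. This is the situation in which switching to -sort path or -rsort path helps.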
Additional Archiver Diagnostics
In addition to examining the items on the previous list, you should check the following when troubleshooting the archiver:
- The syslog file (by default, /var/adm/sam-log). This file can contain archiver messages that indicate the source of a problem.
- Volume capacity. Ensure that all required volumes are available and have sufficient space on them for archiving.
- The trace files. If the archiver appears to cause excessive, unexplainable cartridge activity or appears to be doing nothing, turn on the trace facility and examine the trace file. For information on trace files, see the defaults.conf(4) man page.
- The truss(1) -p pid command. You can use this command on the archiver process (sam-archiverd) to identify the system call that is not responding. For more information on the truss(1) command, see the truss(1) man page.
- The showqueue(1M) command. This command displays the content of the archiver queue files and displays the progress of archiving. You can use it to observe the state of archiver requests that are being scheduled or archived. Any archive request that cannot be scheduled generates a message indicating the reason.
Troubleshooting the Releaser
Reasons that the releaser might not release a file include the following:
- Files can be released only after they are archived. There might not be an archive copy. For more information about this, see Why Files Are Not Archiving.
- The archiver requested that a file not be released. This can occur under the following conditions:
- The archiver has just staged an offline file to make an additional copy.
- The -norelease directive in the archiver.cmd file was set, and not all the copies flagged -norelease have been archived. The releaser summary output displays the total number of files with the archnodrop flag set.
- The file is set for partial release, and the file size is less than or equal to the partial size rounded up to the disk allocation unit (DAU) size (block size).
- The file changed residence in the last min-residence-age minutes.
- The release -n command has been used to prevent directories and files from being released.
- The archiver.cmd file has the -release n option set for too many directories and files.
- The releaser high-water mark or low-water mark is set too high, so automatic releasing starts too late or stops too soon. Check these values in the samu(1M) utility's m display or with File System Manager, and lower them as needed.
- Large files are continually busy. They never reach their archive age, so they are never archived and never released.
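The conditions above can be read as a checklist. The following Python sketch is a hypothetical model, not the releaser's implementation: the FileState fields stand in for information you would gather from sls -D output, the archiver.cmd file, and the releaser log, and the DAU size and min-residence-age value are inputs you supply. It also shows the partial-release arithmetic, in which the partial size is rounded up to a whole number of DAUs before it is compared with the file size.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileState:
    """Hypothetical summary of one file's release-relevant state."""
    archive_copies: int            # valid archive copies (copy information in sls -D)
    archiver_hold: bool            # archiver asked that the file not be released
    release_never: bool            # release -n applies to the file or its directory
    partial_release: bool          # file is set for partial release
    size: int                      # file size in bytes
    partial_size: int              # partial-release size in bytes
    minutes_since_residence_change: float


def round_up_to_dau(nbytes: int, dau: int) -> int:
    """Round nbytes up to a whole number of disk allocation units."""
    return -(-nbytes // dau) * dau


def release_blockers(f: FileState, dau: int, min_residence_age: float) -> List[str]:
    """Return the reasons (from the list above) that would keep f from being released."""
    reasons = []
    if f.archive_copies == 0:
        reasons.append("no archive copy")
    if f.archiver_hold:
        reasons.append("archiver requested no release")
    if f.release_never:
        reasons.append("release -n is set")
    if f.partial_release and f.size <= round_up_to_dau(f.partial_size, dau):
        reasons.append("size <= partial size rounded up to DAU")
    if f.minutes_since_residence_change < min_residence_age:
        reasons.append("residence changed within min-residence-age")
    return reasons


# Hypothetical values: 64-KB DAU, 8-KB partial size, 64-KB file.
f = FileState(1, False, False, True, 65536, 8192, 120.0)
print(release_blockers(f, dau=65536, min_residence_age=10.0))
# ['size <= partial size rounded up to DAU']
```

In the example, the 8-KB partial size rounds up to one 64-KB DAU, so a 64-KB file gains nothing from being released and is skipped even though it has an archive copy.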
Troubleshooting the Recycler
The most frequent problem encountered with the recycler occurs when the recycler is invoked and generates a message similar to the following:
Waiting for VSN mo:OPT000 to drain, it still has 123 active archive copies.
One of the following conditions can cause the recycler to generate this message:
- The archiver has failed to rearchive the archive copies on the volume.
- The archive copies referred to in the message are not files in the file system. Rather, they are metadata archive copies.
The first condition can exist for one of the following reasons:
- Files that need to be rearchived are marked no_archive.
- Files that need to be rearchived are in the no_archive archive set.
- Files cannot be archived because there are no available volume serial numbers (VSNs).
- The archiver.cmd file contains a wait directive.
To determine which condition is in effect, run the recycler with the -v option. As CODE EXAMPLE 2-1 shows, this option writes the path names of the files associated with the archive copies to the recycler log file.
CODE EXAMPLE 2-1 Recycler Messages
Archive copy 2 of /sam/fast/testA resides on VSN LSDAT1
Archive copy 1 of /sam3/tmp/dir2/filex resides on VSN LSDAT1
Archive copy 1 of Cannot find pathname for file system /sam3 inum/gen 30/1 resides on VSN LSDAT1
Archive copy 1 of /sam7/hgm/gunk/tstfilA00 resides on VSN LSDAT1
Archive copy 1 of /sam7/hgm/gunk/tstfilF82 resides on VSN LSDAT1
Archive copy 1 of /sam7/hgm/gunk/tstfilV03 resides on VSN LSDAT1
Archive copy 1 of /sam7/hgm/gink/tstfilA06 resides on VSN LSDAT1
Archive copy 1 of /sam7/hgm/gink/tstfilA33 resides on VSN LSDAT1
Waiting for VSN dt:LSDAT1 to drain, it still has 8 active archive copies.
In this example output, seven messages contain path names, and one message contains the text "Cannot find pathname...". That message appears only after a system crash has partially corrupted the .inodes file. Determine why the seven files cannot be rearchived, resolve the problem(s), and then rearchive the seven files. Only the remaining archive copy is not associated with a file.
To resolve the archive copy that has no path name, run samfsck(1M) to reclaim orphan inodes. If you choose not to run samfsck(1M), or if you cannot unmount the file system to run it, you can manually relabel the cartridge after verifying that the recycler -v output shows no valid archive copies. However, because the invalid inode remains in the .inodes file, the recycler continues to encounter it, and the same problem might recur the next time the VSN is a recycle candidate.
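When the recycler log is long, counting these messages by hand is error-prone. The following Python sketch parses lines in the format shown in CODE EXAMPLE 2-1, separates archive copies that have a path name from those that do not, and reads the count from the "Waiting for VSN ... to drain" message. It assumes the log lines match the example format exactly; lines in any other format are ignored.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Line formats taken from CODE EXAMPLE 2-1.
COPY_RE = re.compile(r"^Archive copy (\d+) of (.+) resides on VSN (\S+)$")
DRAIN_RE = re.compile(
    r"^Waiting for VSN (\w+):(\S+) to drain, it still has (\d+) active archive copies\.$"
)
NO_PATH_PREFIX = "Cannot find pathname"

Copy = Tuple[int, str, str]  # (copy number, path name or message text, VSN)


def summarize_recycler_log(lines: Iterable[str]):
    """Return (copies with a path, copies without a path, drain info or None)."""
    with_path: List[Copy] = []
    without_path: List[Copy] = []
    drain: Optional[dict] = None
    for raw in lines:
        line = raw.strip()
        m = COPY_RE.match(line)
        if m:
            copy = (int(m.group(1)), m.group(2), m.group(3))
            # A copy whose inode has no path name, as after .inodes damage.
            if m.group(2).startswith(NO_PATH_PREFIX):
                without_path.append(copy)
            else:
                with_path.append(copy)
            continue
        m = DRAIN_RE.match(line)
        if m:
            # Media-type prefix, VSN, and remaining active archive copies.
            drain = {"media": m.group(1), "vsn": m.group(2), "active": int(m.group(3))}
    return with_path, without_path, drain
```

For CODE EXAMPLE 2-1, the sketch finds seven copies with path names and one without, which together account for the 8 active archive copies reported in the drain message. A mismatch between those totals would suggest that the log contains messages in a format the sketch does not recognize.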
Another recycler problem occurs when the recycler fails to select any VSNs for recycling. To determine why each VSN was rejected, you can run the recycler with the -d option. This displays information on how the recycler selects VSNs for recycling.
819-7933-10
Copyright © 2007, Sun Microsystems, Inc. All Rights Reserved.