This section explains how to troubleshoot various problems you might encounter with backup and recover operations.
If you have trouble starting Backup, the daemons may not be running properly. To determine whether the required daemons are running, enter one of the following commands at the shell prompt:
# ps -aux | grep nsr
or:
# ps -ef | grep nsr
You should receive a response similar to the following:
12217 ? S 0:09 /usr/sbin/nsrexecd -s jupiter
12221 ? S 2:23 /usr/sbin/nsrd
12230 ? S 0:00 /usr/sbin/nsrmmdbd
12231 ? S 0:01 /usr/sbin/nsrindexd
12232 ? S 0:00 /usr/sbin/nsrmmd -n 1
12234 ? S 0:00 /usr/sbin/nsrmmd -n 2
12235 ? S 0:00 /usr/sbin/nsrmmd -n 3
12236 ? S 0:00 /usr/sbin/nsrmmd -n 4
12410 pts/8 S 0:00 grep nsr
If the response indicates that the daemons are not present, start the Backup daemons:
# /etc/init.d/networker start
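The check described above can be scripted. The following is a minimal sketch, not part of the product: the function name missing_daemons is hypothetical, and the list of daemon names is taken only from the sample output shown earlier, so adjust it if your installation runs a different set.

```shell
#!/bin/sh
# Print the name of each expected Backup daemon that does NOT appear in
# the ps output supplied on standard input.
# Assumption: the daemon list below mirrors the sample ps output above.
missing_daemons() {
    ps_output=$(cat)
    for daemon in nsrexecd nsrd nsrmmdbd nsrindexd nsrmmd; do
        # Require the name to follow a space or "/" and to be followed by
        # a space or end of line, so "nsrmmd" does not match "nsrmmdbd".
        if ! printf '%s\n' "$ps_output" | grep -E "(^|[ /])$daemon( |\$)" >/dev/null; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Example (commented out so that sourcing this file runs nothing):
# ps -ef | missing_daemons
```

If the function prints nothing, every daemon in the list was found. Any name it prints is a candidate reason to restart the daemons as described above.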
During a backup, you attempt to stop the process by clicking Stop in the Group Control window. This should stop the process for all clients in the selected group, but sometimes a client is missed. You then see messages that indicate the server is still busy.
To resolve the problem, determine which clients still have a save process running by entering one of the following commands on each client machine:
# ps -aux | grep save
or:
# ps -ef | grep save
This command returns a process identification number (pid) for each process associated with save. Enter the following command to stop the save process for each pid:
# kill -9 pid
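Picking the pids out of the ps listing by eye is error-prone, because grep save also matches savefs, savegrp, and the grep command itself. The sketch below assumes a ps that supports the POSIX -o option, which is a substitution for the ps -ef form used above. The function name save_pids is hypothetical.

```shell
#!/bin/sh
# Read "PID COMMAND" pairs on standard input and print the PID of every
# process whose command name is exactly "save", with or without a
# leading directory path.
save_pids() {
    awk '{
        n = split($2, parts, "/")      # parts[n] is the base name
        if (parts[n] == "save") print $1
    }'
}

# Example (commented out; run as root on the client):
# for pid in $(ps -e -o pid= -o comm= | save_pids); do
#     kill -9 "$pid"
# done
```

Review the list that save_pids prints before you pass it to kill, since the command cannot be undone.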
Backup does not notify you when a client file index is getting too large. You should monitor the system regularly to check the size of client file indexes. See "Index Management" for information on how to manage the Backup client file indexes.
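One way to do this monitoring is to run du from a scheduled job. This is only a sketch: the function name large_indexes and the 500000 KB threshold in the example are hypothetical, and /nsr/index is assumed to be the index location, as it is elsewhere in this section.

```shell
#!/bin/sh
# Read "du -sk" output (size in kilobytes, then path) on standard input
# and print the path of every index directory larger than the limit.
# Assumption: directory names contain no whitespace.
large_indexes() {
    limit_kb=$1
    awk -v limit="$limit_kb" '$1 + 0 > limit + 0 { print $2 }'
}

# Example (commented out so that sourcing this file runs nothing):
# du -sk /nsr/index/* | large_indexes 500000
```

Choose the limit according to the free space on the filesystem that holds the indexes.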
When you enable Auto Media Verify for a pool, Backup verifies the data written to volumes from the pool while saving. This is done by reading a record of data written to the media and comparing it to the original record. Media is verified after Backup finishes writing to the volume, which might occur when a volume becomes full or when Backup no longer needs the volume for saving data.
To verify media, nsrmmd must reposition the volume to read previously written data. It does not always succeed on the first attempt. When that happens, warning messages like the following appear in the message display in the Backup administration program (nwadmin):
media warning: /dev/rmt2.1 moving: fsr 15: I/O error
media emergency: could not position jupiter.007 to file 44, record 16
No action is required. Backup continues to attempt to find the proper position. If Backup can find the correct position, media verification succeeds and a successful completion message appears.
media info: verification of volume "jupiter.007" volid 30052 succeeded.
In this case, ignore the earlier messages because they only indicate that Backup had problems finding the desired position on the media. If the problem is serious, media verification fails and a subsequent message gives the reason for the failure.
When your server is waiting for a tape to be mounted or is in the process of changing an autochanger volume, you see the PACKET RECEIVE BUFFER and NO ECB counters increase on a NetWare client.
To resolve this problem, use the nsr_shutdown command to shut down the Backup server. Then, restart Backup manually. See "Checking the Backup Daemons" for commands to restart manually.
On Solaris, Backup executables are installed by default in /usr/sbin/nsr. If you start a group backup on a Backup server that does not have /usr/sbin/nsr in the search path for root, the backup fails on a client that has its Backup executables in /usr/sbin/nsr. This is because the savefs command is not in the search path.
The best solution is to set the Executable Path hidden attribute for a client that has this problem. To set the Executable Path, display the Clients attribute in details view and enter the path of the executables, /usr/sbin/nsr, in the Executable Path attribute.
Another solution is to modify the search path for root on the Backup server to include /usr/sbin/nsr even if it does not exist locally.
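For the second solution, the directory can be added to the search path without duplicating it on repeated logins. This is a sketch only: the function name path_with_dir is hypothetical, and which startup file root's shell reads depends on your system.

```shell
#!/bin/sh
# Print the given search path with the directory appended, unless the
# directory is already one of its colon-separated entries.
path_with_dir() {
    dir=$1
    path=$2
    case ":$path:" in
        *":$dir:"*) printf '%s\n' "$path" ;;        # already present
        *)          printf '%s\n' "$path:$dir" ;;   # append at the end
    esac
}

# Example for root's shell startup file on the Backup server
# (commented out so that sourcing this file changes nothing):
# PATH=$(path_with_dir /usr/sbin/nsr "$PATH"); export PATH
```

The directory does not have to exist on the server for the entry to take effect on the client, as noted above.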
When you use the scanner program to rebuild the index of a backup volume, the scanner program marks the volume read-only.
This is a safety feature that prevents the last save set on the backup volume from being overwritten. To write to the media without marking it read-only, use the nsrmm -o command:
# nsrmm -o notreadonly volume-name
Suppose you attempt to recover indexes to a directory other than the one where they were originally located, then receive the following error message:
WARNING: The on-line index for `client-name' was NOT fully recovered. There may have been a media error. You can retry the recover, or attempt to recover another version of the `client-name' index.
Do not attempt to recover the indexes to a different directory. After the indexes have been recovered to their original location, you can move them to another directory.
Because the indexes are holey (sparse) files, using the UNIX cp command creates a file that consumes more disk space than the original file. To move the indexes, invoke the following command as root from within the /nsr/index directory:
# uasm -s -i client-index-directory-name | (cd target-directory; uasm -r)
If you encounter any of the following situations, a client alias problem might be the cause.
You receive the following error messages: "No client resource for..." or "Client xxx cannot back up client yyy files."
A client machine always performs full backups, regardless of the level of the scheduled backup.
It appears that automatic index management according to the browse and retention policies does not occur. This is indicated by the filesystem containing the indexes continuously increasing in size.
In /nsr/index, the directory that contains the indexes, there are two directories for the same client using two different client names.
A client alias change is needed for the following situations:
Machines that have two or more network interfaces
Sites that mix short and "fully qualified" hostnames for the same machines; for example, jupiter and jupiter.oak.com
Sites using both YP (NIS) and DNS
Use the Backup administration program or nsradmin to edit the client resource for clients with this problem. Add all network names for this host to the Aliases attribute.
Do not put aliases that are shared by other hosts on this line.
When you upgrade from earlier versions of Backup, the configuration names of label templates, directives, groups, policies, and schedules that include the following special characters are no longer allowed:
/\*?[]()$!^;'"`~><&|{}
This change was made because volume labels, directives, groups, policies, and schedules are often passed as command line options to various Backup programs.
During installation of Backup, these characters in your current configuration names are replaced with an underscore (_) in the resources where they were originally created.
However, in the Clients resource where these configurations are applied, Backup automatically replaces the selected configuration with the preexisting Default configuration.
You need to reselect the configurations whose names have changed and reapply them to the individual clients.
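To predict what an existing configuration name will become, you can apply the same substitution yourself before you upgrade. The sketch below only mirrors the rule stated above (each listed character becomes an underscore). It is not the installer's own code, and the function name sanitize_name is hypothetical.

```shell
#!/bin/sh
# Print the given name with every disallowed character replaced by "_".
# The bracket expression lists the characters / \ * ? [ ] ( ) $ ! ^ ; ' "
# ` ~ > < & | { } with "]" first so that sed treats it as a literal.
# A comma is used as the sed delimiter because "/" is in the set.
sanitize_name() {
    printf '%s\n' "$1" | sed "s,[]/*?[()\$!^;'\"\`~><&|{}\\\\],_,g"
}

# Example (commented out so that sourcing this file prints nothing):
# sanitize_name 'full/weekly*'
```

Comparing the output with the original name shows which configurations you will need to reselect in the Clients resource.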
If you use the scanner program with the -s option but without an -i or -m option, you might receive the following message:
please enter record size for this volume ('q' to quit) [xx]
The number in brackets [xx] is the entry from the last query.
The scanner command always rewinds the tape and reads the volume label to determine the block size. If the volume label is corrupted or unreadable, you see a message prompting you to enter the block size (in kilobytes).
Type in the block size; it must be an integer equal to or greater than 32. If you enter an integer that is less than 32, you receive the following message:
illegal record size (must be an integer >=32)
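If you drive scanner from a script, you can check a candidate block size against the rule above before you supply it. This is a sketch under that one stated rule (an integer of 32 or more). The function name valid_record_size is hypothetical and is not part of scanner.

```shell
#!/bin/sh
# Succeed (exit status 0) only if the argument is a plain non-negative
# integer that is greater than or equal to 32.
valid_record_size() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -ge 32 ]
}

# Example (commented out so that sourcing this file prints nothing):
# valid_record_size 64 && echo "64 is acceptable"
```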
If you attempt to start the nwrecover program immediately after installing Backup for the first time on your system, you receive the error message nwrecover: Program not found.
To save disk space, Backup delays the creation of the client index until the first backup is completed. The nwrecover program cannot recover data until the client index has entries for browsing. To avoid the problem, back up the client at least once before you start nwrecover.
If you terminate a backup by killing the Backup daemons, you cannot recover the files because the media database is not updated when the daemons die. Consequently, Backup does not know which volumes the requested files reside on.
The first time you back up a new client, you receive the following message:
mars:/usr no cycles found in media db; doing full save.
In this example, the /usr filesystem on the mars client has no full saves listed in the media database. Therefore, regardless of the backup level selected for the client's schedule, Backup performs a full backup. This feature is important because it enables you to perform disaster recoveries for the client.
You may also receive this message if the server and client clocks are not synchronized. To avoid this, make sure that the Backup server and the client are in the same time zone and have their clocks synchronized.
Backup maintains a client file index for every client it backs up. If you change the name of the client, the index for that client is not associated with the client's new name and you cannot recover files backed up under the old client name. To recover previous backup data for a newly renamed client, see "How To Recover Previous Backup Data Under a New Client Name".
Delete the Client resource configured for the old client name.
Create a new Client resource for the new client name.
Shut down the Backup daemons.
# nsr_shutdown
Delete the index directory that was automatically created for the new client.
If you simply copy the new client index over the old client index directory, the result is a nesting of the new client index inside the old client index directory.
Use the mv command to rename the old client's file index directory.
# mv /nsr/index/old-client /nsr/index/new-client
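The shell portion of the steps above can be reviewed as a dry run before anything is changed. The sketch below only prints commands and executes none of them. The function name rename_index_plan is hypothetical, and the use of rm -r to delete the automatically created directory is an assumption, because the steps do not name a command for that.

```shell
#!/bin/sh
# Print, without running them, the shell commands that correspond to the
# shutdown, delete, and rename steps for a renamed client.
# Arguments: old client name, new client name, and optionally the index
# root (assumed to be /nsr/index when omitted).
rename_index_plan() {
    old=$1
    new=$2
    index_root=${3:-/nsr/index}
    printf '%s\n' "nsr_shutdown"
    printf '%s\n' "rm -r $index_root/$new"      # assumption: rm -r
    printf '%s\n' "mv $index_root/$old $index_root/$new"
}

# Example (commented out so that sourcing this file prints nothing):
# rename_index_plan mars venus
```

Deleting and re-creating the Client resources is still done in the administration program. The printed commands cover only the index directory.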
If you receive the error message "No disk label," you may have incorrectly configured a nonoptical device as an optical device within Backup. Verify that the Media Type attribute in the Devices resource matches the expected media for your device, and make corrections if needed.
Certain Hewlett-Packard tape drives can only read 4 mm tapes of a specific length. Some, for example, read only 60-meter tapes and do not support the use of 90- or 120-meter tapes. To determine the type of tape supported by your HP drive, consult the hardware manual provided with the drive.
If you attempt to use unsupported media in an HP tape drive, you may encounter the following types of error messages:
When you use the nsrmm or nsrjb command to label the tape:
nsrmm: error, label write, No more processes (5)
When you attempt to use the scanner -i command:
scanner: error, tape label read, No more processes (11)
scanning for valid records ...
read: 0 bytes
read: 0 bytes
read: 0 bytes
If your server bootstraps are not printed, you may need to enter your printer's name as a hidden attribute in the Groups resource. Access the hidden attributes by selecting Details from the View menu in the graphical administration program (nwadmin) or by selecting the Hidden choice from the Options menu in the cursor-based administration program (nsradmin).
Enter the name of the printer where you want the bootstraps to be printed in the Printer attribute of the Groups resource.
If your Backup server belongs to a group that is not enabled or does not belong to any group, Backup automatically saves the server's bootstrap information with each group that is backed up. If this is the case, you receive the following message in the savegroup completion report:
jupiter: index Saving server index because server is not in an active group
This is a safety measure to help avoid a long recovery process in the event of a system disaster. You should, as soon as possible, configure the Client resource for the server to include it in an active backup group.
If you installed Backup on more than one server and used the same Backup enabler code for them all, you receive messages similar to the following in your savegroup completion mail:
--- Unsuccessful Save Sets ---
* mars:/var save: error, copy violation - servers `jupiter' and `pluto' have the same software enabler code, `a1b2c3d4f5g6h7j8' (13)
* mars:/var save: cannot start a save for /var with NSR server `jupiter'
* mars:index save: cannot start a save for /usr/nsr/index/mars with NSR server `jupiter'
* mars:index save: cannot start a save for bootstrap with NSR server `jupiter'
* mars:index save: bootstrap save of server's index and volume databases failed
To successfully rerun the backup, you must issue the nsr_shutdown command on each server, remove the Backup software from the extra servers, and then restart the Backup daemons on the server where you want the backups to go.
If you receive the following error message when you attempt to start the graphical administration interface with nwadmin from a client machine, it means that the client is not authorized to display Backup:
Xlib: connection to "mars:0.0" refused by server
Xlib: Client is not authorized to connect to Server
Xview error: Cannot open display on window server: mars:0.0 (Server package)
To correct the situation, perform the following steps: