Solstice Backup 5.1 Administration Guide

Appendix C Troubleshooting

If you have a problem with Backup, or if the product does not work the way you expect, use the information in this appendix to diagnose the problem.

Information to Gather Before You Call Technical Support

If the solutions in this appendix do not solve the problem, be prepared to provide the following information when you call SunSoft Technical Support:

The software version of Backup.
The version of the operating system that you are running. For Solaris, you can determine this with the uname -a command.
Your hardware configuration.
Information on your devices and other SCSI IDs. For Solaris, use the /etc/LGTOuscsi/inquire command as root to obtain the required information.
If you are using an autochanger, the type of connection (SCSI or RS-232). Also, provide the version of the autochanger driver you are using. For Solaris, you can determine this from the output of pkginfo -x SUNWsbus2.

You should also be able to relate the following:

How to reproduce the problem.
The exact error messages.
How many times you have seen the problem.
Whether the Backup command was successful before you made any changes and, if so, the changes you made.

Backup and Recover

This section explains how to troubleshoot various problems you might encounter with backup and recover operations.

Checking the Backup Daemons

If you have trouble starting Backup, the daemons may not be running properly. To determine whether the required daemons are running, enter one of the following commands at the shell prompt:

# ps -aux | grep nsr

or:

# ps -ef | grep nsr

You should receive a response similar to the following:

12217 ?        S  0:09 /usr/sbin/nsrexecd -s jupiter
12221 ?        S  2:23 /usr/sbin/nsrd
12230 ?        S  0:00 /usr/sbin/nsrmmdbd
12231 ?        S  0:01 /usr/sbin/nsrindexd
12232 ?        S  0:00 /usr/sbin/nsrmmd -n 1
12234 ?        S  0:00 /usr/sbin/nsrmmd -n 2
12235 ?        S  0:00 /usr/sbin/nsrmmd -n 3
12236 ?        S  0:00 /usr/sbin/nsrmmd -n 4
12410 pts/8    S  0:00 grep nsr

If the response indicates that the daemons are not present, start the Backup daemons:

# /etc/init.d/networker start
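
The check above can also be scripted. The following is a minimal sketch, assuming a POSIX shell and the daemon names shown in the sample response; the function name missing_nsr_daemons is hypothetical and is not part of Backup. It reads ps output on standard input and prints the name of each daemon that does not appear in it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Backup): reads ps output on stdin
# and prints each expected daemon that is not listed.
missing_nsr_daemons() {
    ps_output=$(cat)
    for daemon in nsrd nsrexecd nsrindexd nsrmmd nsrmmdbd; do
        # Match the daemon name as a whole word at the end of a path or after
        # a space, so that nsrmmd does not match the nsrmmdbd line.
        if ! printf '%s\n' "$ps_output" |
            grep -E "(^|[ /])$daemon( |\$)" >/dev/null
        then
            echo "$daemon"
        fi
    done
}

# Example use on a live system:
#   ps -ef | missing_nsr_daemons
```

If the function prints nothing, every daemon in the list was found. Any name it prints is a candidate reason to restart the daemons as described above.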

Backup of Clients Fails to Stop

During a backup, you attempt to stop the process by clicking Stop in the Group Control window. This should stop the process for all clients in the selected group, but sometimes a client is missed. You then see messages that indicate the server is still busy.

To resolve the problem, determine which client machines still have a save process running by entering one of the following commands on each client:

# ps -aux | grep save

or:

# ps -ef | grep save

This command returns a process identification number (pid) for each process associated with save. Enter the following command to stop the save process for each pid:

# kill -9 pid
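
As a minimal sketch of the same procedure, assuming a POSIX shell and awk, the following hypothetical helper (save_pids is not a Backup command) extracts the pid column from the ps output so that you can review the list before you stop anything.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -ef` or `ps -aux` output on stdin and prints
# the pid (the second column in both formats) of every line that mentions
# "save", skipping the grep process itself.
save_pids() {
    awk '/save/ && !/grep/ { print $2 }'
}

# Example use on the client (review the list before killing anything):
#   ps -ef | save_pids
#   for pid in $(ps -ef | save_pids); do kill -9 "$pid"; done
```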

No Notification of Client File Index Size Growth

Backup does not notify you when a client file index is getting too large. You should monitor the system regularly to check the size of client file indexes. See "Index Management" for information on how to manage the Backup client file indexes.
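
One way to monitor index sizes is a small script run at regular intervals. The following is a minimal sketch, assuming a POSIX shell, an awk that supports the -v option, and indexes stored under /nsr/index; the function name large_indexes and the size limit are hypothetical and should be adapted to your site.

```shell
#!/bin/sh
# Hypothetical helper: reads `du -sk` output (size in KB, then path) on stdin
# and prints the entries whose size exceeds the limit given in KB.
large_indexes() {
    limit_kb=$1
    awk -v limit="$limit_kb" '$1 + 0 > limit + 0 { print $2 " " $1 " KB" }'
}

# Example use on the server, flagging indexes larger than about 500 MB:
#   du -sk /nsr/index/* | large_indexes 512000
```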

Media Position Errors Encountered When Auto Media Verify Is Enabled

When you enable Auto Media Verify for a pool, Backup verifies the data written to volumes from the pool while saving. This is done by reading a record of data written to the media and comparing it to the original record. Media is verified after Backup finishes writing to the volume, which might occur when a volume becomes full or when Backup no longer needs the volume for saving data.

To verify media, nsrmmd must reposition the volume to read previously written data. It does not always succeed on the first attempt. When repositioning fails, warning messages like the following appear in the message display of the Backup administration program (nwadmin):

media warning: /dev/rmt2.1 moving: fsr 15: I/O error
media emergency: could not position jupiter.007 to file 44, record 16

No action is required. Backup continues to attempt to find the proper position. If Backup can find the correct position, media verification succeeds and a successful completion message appears.

media info: verification of volume "jupiter.007" volid 30052 succeeded.

In this case, ignore the earlier messages because they only indicate that Backup had problems finding the desired position on the media. If the problem is serious, media verification fails and a subsequent message gives the reason for the failure.

`PACKET RECEIVE BUFFER` and `NO ECB` Counters Increase

When your server is waiting for a tape to be mounted or is in the process of changing an autochanger volume, you see the PACKET RECEIVE BUFFER and NO ECB counters increase on a NetWare client.

To resolve this problem, use the nsr_shutdown command to shut down the Backup server. Then, restart Backup manually. See "Checking the Backup Daemons" for commands to restart manually.

Backup Not Found in Expected Location for Solaris Client

On Solaris, Backup executables are installed by default in /usr/sbin/nsr. If you start a group backup on a Backup server that does not have /usr/sbin/nsr in the search path for root, the backup fails on a client that has its Backup executables in /usr/sbin/nsr. This is because the savefs command is not in the search path.

The best solution is to set the Executable Path hidden attribute for a client that has this problem. To set the Executable Path, display the Clients resource in details view and enter the path of the executables, /usr/sbin/nsr, in the Executable Path attribute.

Another solution is to modify the search path for root on the Backup server to include /usr/sbin/nsr even if it does not exist locally.

The `scanner` Program Marks a Volume Read-Only

When you use the scanner program to rebuild the index of a backup volume, the scanner program marks the volume read-only.

This is a safety feature that prevents the last save set on the backup volume from being overwritten. To write to the media without marking it read-only, use the nsrmm -o command:

# nsrmm -o notreadonly volume-name

Index Recovery to a Different Location Fails

Suppose you attempt to recover indexes to a directory other than the one where they were originally located, then receive the following error message:

WARNING: The on-line index for `client-name' was NOT fully recovered.
There may have been a media error. You can retry the recover, or
attempt to recover another version of the `client-name' index.

Do not attempt to recover the indexes to a different directory. After the indexes have been recovered to their original location, you can move them to another directory.

Because the indexes are holey (sparse) files, copying them with the UNIX cp command creates files that consume more disk space than the originals. To move the indexes, invoke the following command as root from within the /nsr/index directory:

# uasm -s -i client-index-directory-name | (cd target-directory; uasm -r)

Potential Cause for Client Alias Problems

If you encounter any of the following situations, a client alias problem might be the cause.

You receive the following error messages: "No client resource for..." or "Client xxx cannot back up client yyy files."
A client machine always performs full backups, regardless of the level of the scheduled backup.
It appears that automatic index management according to the browse and retention policies does not occur. This is indicated by the filesystem containing the indexes continuously increasing in size.
In /nsr/index, the directory that contains the indexes, there are two directories for the same client using two different client names.

A client alias change is needed for the following situations:

Machines that have two or more network interfaces
Sites that mix short and "fully qualified" hostnames for the same machines; for example, jupiter and jupiter.oak.com
Sites using both YP (NIS) and DNS

Use the Backup administration program or nsradmin to edit the client resource for clients with this problem. Add all network names for this host to the Aliases attribute.

Caution - Do not put aliases that are shared by other hosts on this line.

Illegal Characters to Avoid in Configurations

When you upgrade from earlier versions of Backup, the configuration names of label templates, directives, groups, policies, and schedules that include the following special characters are no longer allowed:

/\*?[]()$!^;'"`~><&|{}

This change was made because volume labels, directives, groups, policies, and schedules are often passed as command line options to various Backup programs.

During installation of Backup, these characters in your current configuration names are replaced with an underscore (_) in the resources where they were originally created.

However, in the Clients resource where these configurations are applied, Backup automatically replaces the selected configuration with the preexisting Default configuration.

You need to reselect the configurations whose names have changed and reapply them to the individual clients.

The `scanner` Program Requests an Entry for Record Size

If you use the scanner program with the -s option but without an -i or -m option, you might receive the following message:

please enter record size for this volume ('q' to quit) [xx]

The number in brackets [xx] is the entry from the last query.

The scanner command always rewinds the tape and reads the volume label to determine the block size. If the volume label is corrupted or unreadable, you see a message prompting you to enter the block size (in kilobytes).

Type in the block size; it must be an integer equal to or greater than 32. If you enter an integer that is less than 32, you receive the following message:

illegal record size (must be an integer >=32)

Failed Recover Operation Directly after New Installation

If you attempt to start the nwrecover program immediately after installing Backup for the first time on your system, you receive the error message nwrecover: Program not found.

To save disk space, Backup delays the creation of the client index until the first backup is completed. The nwrecover program cannot recover data until the client index has entries for browsing. To avoid the problem, back up the client at least once before you run nwrecover.

Recovering Files from an Interrupted Backup

If you terminate a backup by killing the Backup daemons, you cannot recover the files because the media database is not updated when the daemons die. Consequently, Backup does not know which volumes the requested files reside on.

Backup of a New Client Defaults to a Level Full

The first time you back up a new client, you receive the following message:

mars:/usr no cycles found in media db; doing full save.

In this example, the /usr filesystem on the mars client has no full saves listed in the media database. Therefore, regardless of the backup level selected for the client's schedule, Backup performs a full backup. This feature is important because it enables you to perform disaster recoveries for the client.

You may also receive this message if the server and client clocks are not synchronized. To avoid this, make sure that the Backup server and the client are in the same time zone and have their clocks synchronized.

Renamed Clients Cannot Recover Old Backups

Backup maintains a client file index for every client it backs up. If you change the name of the client, the index for that client is not associated with the client's new name and you cannot recover files backed up under the old client name. To recover previous backup data for a newly renamed client, see "How To Recover Previous Backup Data Under a New Client Name".

How To Recover Previous Backup Data Under a New Client Name

Delete the Client resource configured for the old client name.

Create a new Client resource for the new client name.

Shut down the Backup daemons.
# nsr_shutdown

Delete the index directory that was automatically created for the new client.

If you simply copy the new client index over the old client index directory, the result is a nesting of the new client index inside the old client index directory.

Use the mv command to rename the old client's file index directory.
# mv /nsr/index/old-client /nsr/index/new-client

Disk Label Errors

If you receive the error message "No disk label," you may have incorrectly configured a nonoptical device as an optical device within Backup. Verify that the Media Type attribute in the Devices resource matches the expected media for your device, and make corrections if needed.

Errors from Unsupported Media in HP Tape Drives

Certain Hewlett-Packard tape drives can only read 4 mm tapes of a specific length. Some, for example, read only 60-meter tapes and do not support the use of 90- or 120-meter tapes. To determine the type of tape supported by your HP drive, consult the hardware manual provided with the drive.

If you attempt to use unsupported media in an HP tape drive, you may encounter the following types of error messages:

When you use the nsrmm or nsrjb command to label the tape:

nsrmm: error, label write, No more processes (5)

When you attempt to use the scanner -i command:

scanner: error, tape label read, No more processes (11)
scanning for valid records ...
read: 0 bytes
read: 0 bytes
read: 0 bytes

Cannot Print Bootstrap Information

If your server bootstraps are not printed, you may need to enter your printer's name as a hidden attribute in the Groups resource. Access the hidden attributes by selecting Details from the View menu in the graphical administration program (nwadmin) or by selecting the Hidden choice from the Options menu in the cursor-based administration program (nsradmin).

Enter the name of the printer where you want the bootstraps to be printed in the Printer attribute of the Groups resource.

Server Index Saved

If your Backup server belongs to a group that is not enabled or does not belong to any group, Backup automatically saves the server's bootstrap information with each group that is backed up. If this is the case, you receive the following message in the savegroup completion report:

jupiter: index Saving server index because server is not in an active group

This is a safety measure to help avoid a long recovery process in the event of a system disaster. You should, as soon as possible, configure the Client resource for the server to include it in an active backup group.

Copy Violation

If you installed Backup on more than one server and used the same Backup enabler code for them all, you receive messages similar to the following in your savegroup completion mail:

--- Unsuccessful Save Sets ---
* mars:/var save: error, copy violation - servers `jupiter' and `pluto'
have the same software enabler code, `a1b2c3d4f5g6h7j8' (13)
* mars:/var save: cannot start a save for /var with NSR server `jupiter'
* mars:index save: cannot start a save for /usr/nsr/index/mars with NSR
server `jupiter'
* mars:index save: cannot start a save for bootstrap with NSR server
`jupiter'
* mars:index save: bootstrap save of server's index and volume
databases failed

To successfully rerun the backup, you must issue the nsr_shutdown command on each server, remove the Backup software from the extra servers, and then restart the Backup daemons on the server where you want the backups to go.

Xview Errors

If you receive the following error message when you attempt to start the graphical administration interface with nwadmin from a client machine, it means that the client is not authorized to display Backup:

Xlib: connection to "mars:0.0" refused by server
Xlib: Client is not authorized to connect to Server
Xview error: Cannot open display on window server: mars:0.0 (Server package)

To correct the situation, perform the following steps:

From the client machine, invoke the xhost command:
# xhost server-name

Remotely log in to the Backup server and issue the setenv command at the shell prompt.
# setenv DISPLAY client-name:0.0
For command shells other than /usr/bin/csh, enter:
# DISPLAY=client-name:0.0
# export DISPLAY

Client-Server Communications

Many of the problems that Backup users report when they set up and configure Backup are problems with the communications in their networks. This section contains a procedure for testing the communications in a network.

How to Troubleshoot IP Errors

To troubleshoot IP errors, follow these steps:

Document the steps you take and the results, especially error messages, in case you need to contact SunSoft Technical Support.

This enables you to email or fax exact steps and error message text directly to SunSoft.

Set up host tables for Backup clients and Backup servers.

See "How to Set Up Host Tables".

Disable other name servers to simplify testing.

See "How to Disable Name Servers for Troubleshooting".

Use the ping command to establish basic connectivity.

See "How to Use ping to Verify Network Connections".

Use the rpcinfo command to verify that sessions can be established and that portmapping is correct.

See "How to Use rpcinfo to Verify That Sessions Can Be Established".

How to Set Up Host Tables

We recommend that you troubleshoot IP problems using only host tables. Troubleshooting using only host tables does not mean you cannot use your name service, for example, DNS, with Backup. Test using only host tables to determine whether you have Backup installed correctly. After you know Backup works with host tables, you can enable whatever name server you are using.

To configure host tables on a server or client, follow these steps:

On the Backup client, list the client and the Backup servers to which it connects.

For example:

127.0.0.1 localhost loopback
123.456.789.111 client client.domain.com
123.456.789.222 server server.domain.com

On the Backup server, list the Backup server itself and all of its clients.

For example:

127.0.0.1 localhost loopback
123.456.789.111 server server.domain.com
123.456.789.222 client client.domain.com

Use the guidelines in "How to Use ping to Verify Network Connections" to ensure the highest success rate for host table parsing within any operating system.

Notes for host table configuration include:
- Do not use blank lines in the body of your host tables.
- The end of the host table should always contain a blank line.
- The first unremarked entry should always be the loopback line in the exact order and format shown in Steps 1 and 2.
- The last character of each unremarked line should be a space, not a carriage return.

On UNIX platforms, the host tables reside in /etc/hosts.

You can use host tables in addition to DNS where necessary, but it is simplest to temporarily disable DNS for troubleshooting.
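
Some of the host table notes above can be checked mechanically. The following is a minimal sketch, assuming a POSIX shell and awk; the function name check_host_table is hypothetical. It verifies only two of the notes: that the first unremarked entry is the loopback line, and that no blank line appears in the body of the table. It does not check the trailing character of each line.

```shell
#!/bin/sh
# Hypothetical helper: reads a host table on stdin and succeeds only if the
# first unremarked entry is the loopback line and the body contains no blank
# lines (blank lines at the very end are allowed).
check_host_table() {
    awk '
        /^[ \t]*$/ { blank = NR; next }   # remember that a blank line was seen
        blank      { bad = 1 }            # something follows a blank line
        /^[ \t]*#/ { next }               # skip remarks
        !seen {
            seen = 1
            if ($1 != "127.0.0.1" || $2 != "localhost") bad = 1
        }
        END { exit (bad || !seen) }
    '
}

# Example use:
#   check_host_table < /etc/hosts && echo "host table looks consistent"
```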

How to Disable Name Servers for Troubleshooting

To simplify the troubleshooting of name resolution problems, we recommend disabling services like DNS, WINS, and DHCP. If you have name resolution problems, first configure only the host tables for your machines, then test your backups.

Some common problems you may encounter with DNS, WINS, and DHCP services include:

The DNS is not configured with a reverse lookup table.
The clients are configured with the wrong IP addresses for DNS or WINS servers.
The DHCP services do not properly update the WINS server with new addresses.

You do not need to disable DNS for your entire network, just for the initial setup of the Backup clients and the Backup server you want to test. Only disable the ability of a client to obtain IP naming information from a DNS server. Typically, you do not need to disable the DNS server itself.

To disable DNS lookups on a machine running most UNIX platforms, rename the file /etc/resolv.conf on that machine and reboot.

For a Solaris system, instead of renaming resolv.conf, you can set up the IP name search order so that the host table is searched before DNS.

To set up the IP name search order, follow these steps:

Edit the /etc/nsswitch.conf file and verify that the /etc/resolv.conf file exists.

Set the host table (files) to be first in the search order, with DNS second and NIS last.

For example:
hosts: files [NOTFOUND=continue] dns [NOTFOUND=continue] nis
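
To confirm the search order after editing, a check along the following lines may help. This is a minimal sketch, assuming a POSIX shell and awk; the function name hosts_files_first is hypothetical. It compares source names without regard to case and succeeds when the hosts entry lists files and, if dns is present, lists files before it.

```shell
#!/bin/sh
# Hypothetical helper: reads nsswitch.conf content on stdin and succeeds only
# if the hosts entry names "files", and names it before "dns" when both are
# present.
hosts_files_first() {
    awk '
        $1 == "hosts:" {
            found = 1
            for (i = 2; i <= NF; i++) {
                w = tolower($i)
                if (w == "files" && !f) f = i
                if (w == "dns" && !d) d = i
            }
        }
        END { exit !(found && f && (!d || f < d)) }
    '
}

# Example use:
#   hosts_files_first < /etc/nsswitch.conf || echo "host table is not searched first"
```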

How to Use `ping` to Verify Network Connections

After you create the host tables, test them with ping.

On the Backup client:

ping the client short name (hostname) from the client
ping the client long name (hostname plus domain information) from the client
ping the client IP address from the client
ping the server short name from the client
ping the server long name from the client
ping the server IP address from the client

The following example shows pinging the client short name and client long name from a Backup client called mars in the oak domain:

# ping mars
# ping mars.oak.com

On the Backup server (use just the steps marked with an asterisk (*) if the server is the only client):

ping the server short name from the server *
ping the server long name from the server *
ping the server IP address from the server *
ping the client short name from the server
ping the client long name from the server
ping the client IP address from the server
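
The ping tests listed above are repetitive, so a small loop can help you run and record them consistently. The following is a minimal sketch, assuming a POSIX shell; the function name ping_all and the PING variable are hypothetical. On Solaris, ping with only a hostname returns after it reports whether the host is alive. On systems where ping runs until interrupted, set PING to a command that sends a limited number of packets.

```shell
#!/bin/sh
# Hypothetical helper: pings every name or address given as an argument and
# reports the result for each one. Set PING to override the ping command.
ping_all() {
    failures=0
    for target in "$@"; do
        if ${PING:-ping} "$target" >/dev/null 2>&1; then
            echo "ok $target"
        else
            echo "FAILED $target"
            failures=$((failures + 1))
        fi
    done
    # Succeed only if every target answered.
    [ "$failures" -eq 0 ]
}

# Example use from the client mars with a server named jupiter; the names
# and addresses are placeholders for your own:
#   ping_all mars mars.oak.com 192.0.2.11 jupiter jupiter.oak.com 192.0.2.22
```

Keep the output with your troubleshooting notes, because the exact targets that fail indicate which host table entry to inspect.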

How to Use `rpcinfo` to Verify That Sessions Can Be Established

If ping is successful and backup problems still exist, you can also test with rpcinfo. Because Backup relies heavily on mapping of ports, use rpcinfo to test the operation of the portmapper. Using ping tests the connection up to the network layer in the OSI model; rpcinfo checks for communication up to the session layer.

Use the same tests with rpcinfo as with ping. Run just the steps marked with an asterisk (*) if the server is the only client.

For rpcinfo to be used successfully, the machine whose hostname you enter on the command line must have a portmapper running. In most cases, SunSoft portmappers are compatible with fully functional portmappers from other vendors (called third-party portmappers). If you are using a product that provides its own portmapper, we recommend not loading the third-party portmapper until you have verified that Backup works with the rest of your environment. This process lets you test portmapper compatibility without adding other unknowns.

On Solaris, the rpcbind daemon must be running. The rpcinfo utility is part of the operating system.

The syntax for using rpcinfo to list the RPC programs registered with the portmapper on a host, along with their protocols and ports, is:

rpcinfo -p hostname

Substitute the long name and the short name for the variable hostname, just as you did for ping.

You can view other rpcinfo command line options by typing rpcinfo at the command line. Notes on the rpcinfo command and its error messages are available in the UNIX man page for rpcinfo. Repeat rpcinfo using all the locations and all the iterations listed in this document for ping.

When rpcinfo runs successfully, the output is a list of port numbers and names. For troubleshooting, we are only interested in the exact text of any error messages. Typical successful responses have the following format:

rpcinfo for mars
program vers proto   port
100000    2   tcp    111  portmapper
100000    2   udp    111  portmapper
390103    2   tcp    760
390109    2   tcp    760
390110    1   tcp    760
390103    2   udp    764
390109    2   udp    764
390110    1   udp    764
390113    1   tcp   7937
390105    5   tcp    821
390107    4   tcp    819
390107    5   tcp    819
390104  105   tcp    822
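
When you repeat rpcinfo from several locations, it can help to check the listing for specific program numbers instead of reading it by eye. The following is a minimal sketch, assuming a POSIX shell and an awk that supports the -v option; the function name rpc_registered is hypothetical. It reads rpcinfo -p output on standard input and reports any program number that is not registered over TCP.

```shell
#!/bin/sh
# Hypothetical helper: reads `rpcinfo -p` output on stdin and succeeds only
# if every program number given as an argument is registered over TCP.
# Program numbers that are missing are printed.
rpc_registered() {
    listing=$(cat)
    status=0
    for prog in "$@"; do
        if ! printf '%s\n' "$listing" |
            awk -v p="$prog" '$1 == p && $3 == "tcp" { ok = 1 } END { exit !ok }'
        then
            echo "not registered over tcp: $prog"
            status=1
        fi
    done
    return $status
}

# Example use, checking the portmapper (100000) and two of the program
# numbers that appear in the sample response above:
#   rpcinfo -p mars | rpc_registered 100000 390103 390113
```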

Verifying Firmware for Switches and Routers

If you are using switches or routers from any vendor, make sure that the firmware of every switch and router on your network is dated after August 1995 to ensure that RPC (Remote Procedure Call) traffic is handled properly. Most of the switch and router vendors with whom we have worked have significantly improved their handling of RPC traffic since August 1995.

Naming Requirements

Backup UNIX clients, release 4.2 and later, use the servers file in the /nsr/res subdirectory to determine whether a Backup server is authorized to back up the client's data. If you don't have the servers file, you can create it in /nsr/res using your preferred editor.

Make sure the servers file on a client contains both the short name and long name of the server you want to use to back up that client's data. For example, the servers file on a Backup client would contain the following names for a Backup server named mars in the oak.com domain:

mars
mars.oak.com
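
To confirm that a client's servers file meets this requirement, you could use a check like the following. It is a minimal sketch, assuming a POSIX shell and grep; the function name servers_file_lists is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the servers file lists both the short
# name and the long name of the server, each on a line of its own.
servers_file_lists() {
    file=$1
    short=$2
    long=$3
    grep -x -F -- "$short" "$file" >/dev/null &&
        grep -x -F -- "$long" "$file" >/dev/null
}

# Example use on a client:
#   servers_file_lists /nsr/res/servers mars mars.oak.com ||
#       echo "add the missing server name to /nsr/res/servers"
```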

In the Clients resource, list both the short name and the long name, plus any other applicable aliases for each client, in the Aliases attribute.

Binding to Server Errors

Backup follows the client/server model, where servers provide services to clients through RPC. These services reside inside long-lived processes, known as daemons.

For clients to find these daemons, you must register the daemons with a registration service. When the daemons start up, they register themselves with the registration service provided by the portmapper.

Backup servers provide a backup and recovery service. They receive data from clients, store the data on backup media, and retrieve it on demand. If the Backup daemons are not running and a Backup service is requested, you receive the following messages in your savegroup completion mail:

Server not available
RPC error, remote program is not registered

These messages indicate that the Backup daemons nsrd, nsrexecd, nsrindexd, nsrmmd, and nsrmmdbd may not be running. To restart the daemons, become root and enter the following command at the shell prompt:

# /etc/init.d/networker start

Saving Remote Filesystems

You may receive the following error messages in your savegroup completion mail when a backup for a remote client fails:

All: host hostname cannot request command execution
All: sh: permission denied

The first message means that the nsrexecd daemon on the client is not configured to allow the server to back up its files. The second message means that the nsrexecd daemon is not currently running on the client.

To resolve these problems, make sure that the nsrexecd daemon is running on the client, and that the server's hostname is listed in the boot-time file. The installation script generates the boot-time file automatically from your responses to its query for the names of all the servers, in order of precedence, that can contact the client for backups. Table C-1 lists the location of the boot-time file.

Refer to the nsrexecd(1m) man page for detailed information about the nsrexecd daemon.

Table C-1 Boot-time File Locations


Operating System	Boot-time file
Solaris	`/etc/rc2.d/S95networker`
SunOS 4.1.x	`/etc/rc.local`

Remote Recover Access Rights

You can control client recover access by configuring the Client resource. The Remote Access list displays the usernames that have recover access to the client's save sets. You can add or remove usernames depending on the level of security the files require.

The following users have permission to recover any files on any client, regardless of the contents of the Remote Access list:

Root
Operator
Member of the operator group

Other users can only recover files for which they have read permission, relative to the file mode and ownership at the time that the file was backed up. Files recovered by a user other than root, operator, or the operator group are owned by that user.

Autochanger Operations

This section explains how to resolve problems encountered with the use of an autochanger with Backup.

Testing the Device Driver Installation

After you install the Backup device driver software, use the lusdebug program to verify the server connection and the jbexercise program to test the autochanger. Use the value of the control port assigned to your autochanger (for example, scsidev@0.6.0) for control-port in the following commands:

# lusdebug control-port 0
# jbexercise -c control-port -m model

If these commands fail, or if you receive error messages, see the following sections for information on the possible cause and solution.

The `lusdebug` Command Fails

If the lusdebug command fails, review these suggestions to identify the potential problems and their solutions:

Issue the sjiinq command as root, and provide the control-port as an argument. You should receive a message similar to the following:

scsidev@0.6.0:<EXABYTE EXB-10i EXB-10i >

Verify that the information supplied by the message is correct.

If the vendor and model names are incorrect, you supplied the wrong SCSI ID as the device ID during the driver installation. The installation script asks for the SCSI ID of the robot mechanism, not the tape drive.

Deinstall the device driver and then reinstall it, and supply the correct address for the autochanger (robotic arm). Make sure that each device on the SCSI bus has a different SCSI ID address.
Inspect the following items to verify that the autochanger is properly connected:
- Make sure all the connectors on the SCSI bus are firmly connected.
- Make sure none of the SCSI cables are defective.
- Verify that the SCSI bus is properly terminated and is within the length specified by ANSI SCSI-II specifications (ANSI X3.131-1994).
  
  Both ends of the SCSI bus must be terminated with the appropriate resistors. Single-ended SCSI bus terminators are 220 ohms to +5 VDC, 330 ohms to ground. Differential terminators have a 122-ohm characteristic impedance (-5 VDC to +5 VDC). The ends of the SCSI bus are considered to be the last SCSI device at either end of the bus, where both peripheral devices and systems are considered as peer SCSI devices.
  
  Additional termination (termination placed on devices not at either end of the SCSI bus) is ill-advised. Additional termination causes the hardware bus drivers on each device on the bus to have to work harder (for example, out of the range of their nominal specification) to effect signal transitions. As a result, they may not be able to meet the timing requirements for some signal transitions.
  
  SCSI bus length affects the quality of the signals and thus the likelihood of transmission errors on the bus. For single-ended SCSI buses (the most prevalent), the maximum length is 6 meters, unless FAST SCSI devices are attached and in use, in which case the limit is 3 meters. This length includes the length of the bus as it is within a device as well as the length of external cables. A reasonable rule of thumb for internal length is to assume 1 meter of internal bus length for the workstation chassis and about 0.25 meters per device for external peripheral boxes.
  
  Differential option SCSI buses can be much longer (due to the electrical differences from Single-Ended). Allow for a maximum of 25 meters. Never mix Differential and Single-Ended devices.
Check to see whether an old autochanger driver is still installed. This can be the AAP driver shipped with earlier versions of Backup, or release 1.1 or earlier of the Parity driver, which only supported SCSI bus 0.
Check the SCSI IDs on all devices attached to the same bus; make sure that none are the same. If two devices have the same target ID, you may experience the following symptoms: SCSI bus reset errors appear in system log files, the machine does not boot, and the probe-scsi boot prompt command on SPARC systems hangs.
If the sensor that verifies whether the tape drive door is open is out of place, follow the instructions provided with your autochanger hardware to determine the problem, or contact your hardware vendor.
If the autochanger is in sequential mode, change the setting to random mode.
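
The single-ended bus length rule of thumb described in the list above can be worked through with a short calculation. The following is a minimal sketch, assuming a POSIX shell and awk; the function name scsi_bus_length is hypothetical. It adds 1 meter for the workstation chassis, 0.25 meters for each external device, and the external cable lengths that you supply.

```shell
#!/bin/sh
# Hypothetical helper: estimates total single-ended SCSI bus length in meters
# from the rule of thumb in this section: 1 m inside the workstation chassis,
# 0.25 m inside each external peripheral box, plus the external cables.
# Usage: scsi_bus_length DEVICE_COUNT CABLE_METERS...
scsi_bus_length() {
    devices=$1
    shift
    echo "$devices $*" | awk '{
        total = 1 + 0.25 * $1
        for (i = 2; i <= NF; i++) total += $i
        printf "%.2f\n", total
    }'
}

# Example: three external devices joined by cables of 1 m, 1 m, and 0.5 m.
#   scsi_bus_length 3 1 1 0.5    # prints 4.25
```

In this example the estimate of 4.25 meters is within the 6-meter limit for a single-ended bus but exceeds the 3-meter limit that applies when FAST SCSI devices are in use.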

If none of these suggestions resolve the problem, contact SunSoft Technical Support. You need to provide the information described in "Information to Gather Before You Call Technical Support" and the captured output of the sjiinq and sjirjc programs. See Appendix B, Command Line Reference Utilities for information on the jbexercise, sjiinq, and sjirjc programs, or refer to the associated man pages for each program.

The `jbexercise` Command Fails

If the jbexercise command fails, review the following list of suggestions to identify potential problems and their solutions:

The jbexercise program prompts you for a no-rewind device name (for example, on Solaris, /dev/rmt/0mbn). Verify that you have supplied the correct device pathname for the tape drive. The device name must belong to a tape drive in the autochanger, not the autochanger itself. If you receive the following error message, you did not enter a no-rewind device name:
device not ready
Make sure that the tape drive for which you enter the pathname works. Insert a volume into the drive and perform the following tests:
- Use the tar command to copy a small file to the volume.
- Verify more extensive operations by issuing the tapeexercise command.
If these tests fail, the tape drive is not functioning. Contact your hardware vendor for further information on how to configure your tape drive to work with your system.

If none of these suggestions resolves the problem, contact SunSoft Technical Support. You need to provide the information described in "Information to Gather Before You Call Technical Support" and the captured output of the jbexercise, sjiinq, and sjirjc programs. See Appendix B, Command Line Reference Utilities, for information on the jbexercise, sjiinq, and sjirjc programs, or refer to the associated man pages for each program.

Autodetected SCSI Jukebox Option Causes Server to Hang

If you install an Autodetected SCSI jukebox using jb_config and the server hangs, the following workaround is recommended:

Select the jb_config option that installs an SJI jukebox. A list of jukeboxes is displayed.

Enter the number that corresponds to the type of jukebox you are installing.

Proceed with jb_config until you receive the following message:
Jukebox has been added successfully.

Autochanger Inventory Problems

If any of the following situations occur, the autochanger inventory becomes outdated and Backup cannot use the autochanger:

The media is manually ejected from the autochanger drive.
The media is removed from the autochanger.
The autochanger door is opened.

To make the autochanger usable again, perform the following steps:

Verify that the media cartridge is correctly installed in the autochanger and that the autochanger door is closed.

Become root on the Backup server.

Reset the autochanger.
# nsrjb -Hv

Perform an inventory.
# nsrjb -Iv
After the inventory operation is finished, Backup can once again use the autochanger.

For complete information on the use of the nsrjb command, refer to the nsrjb(8) man page or see Chapter 7, Autochanger Module.
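
If you perform this recovery often, the reset and inventory steps above can be wrapped in a small script. The following Python sketch runs the two nsrjb commands in order and stops if the first one fails. It assumes that nsrjb is on the search path, that the script is run as root on the Backup server, and that nsrjb returns a nonzero exit status on failure; the function name and the run parameter are illustrative.

```python
import subprocess

# The two nsrjb invocations from the procedure above, in order.
RECOVERY_COMMANDS = [
    ["nsrjb", "-Hv"],  # reset the autochanger
    ["nsrjb", "-Iv"],  # perform an inventory
]


def refresh_autochanger(run=subprocess.run):
    """Run the reset, then the inventory.

    Returns None if both commands succeed, or the command that failed.
    Assumes a nonzero exit status means failure.
    """
    for command in RECOVERY_COMMANDS:
        result = run(command)
        if result.returncode != 0:
            return command
    return None
```

The run parameter exists only so that the sequence can be exercised without an autochanger attached; in normal use, call refresh_autochanger() with no arguments.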

Destination Component Full Messages

The message "Destination component full" usually is the result of a manual operation performed on the autochanger, for example, physically unloading the tape drive by means of the buttons on the autochanger rather than using Backup to unmount the volume. This operation causes Backup to lose track of the status of the media in the autochanger.

To resolve the problem, use the Backup command nsrjb -H to reset the autochanger.

Tapes Are Not Filled to Capacity

You might encounter situations where Backup does not fill tapes to capacity. For example, a tape with an advertised capacity of 4000 MB can be marked full by Backup after only 3000 MB of data have been written to it.

To enable Backup to use the tape capacity to its fullest, select the highest density device driver appropriate for your device. When a tape is labeled, Backup writes to it at the highest density supported by your device.

There are several reasons for situations in which Backup appears to fill tapes prematurely:

Write errors occur during a backup.

Most tape drives try to read after a write operation to verify that the tape was written correctly, and retry if it was not. A write error indicates either end of tape or a read error. At any tape error, Backup marks the tape full.

To prevent tape write errors, clean your tape drive regularly and use only data-quality tapes. If cleaning the drive does not seem to help, make sure that the device driver is properly configured, any necessary switch settings on the tape drive are set to the manufacturer's specifications, all cabling is secure, and other potential SCSI problems have been addressed.

Backup filemarks take up space on the tape.

Backup periodically writes filemarks to facilitate rapid recovery of data. These filemarks consume varying amounts of tape depending on the type of tape drive; on some drives, filemarks can consume several MB. The number of filemarks Backup writes to tape is a function of how many save sets are on the tape. Many small save sets require more filemarks than a few larger ones.

Tape capacities vary from tape to tape.

Tape capacities are not constant from tape to tape. Two apparently identical tapes from the same vendor can vary significantly in capacities. This can cause problems if you copy one very full tape to another, especially if the destination tape holds less data than the source tape.

Data compression affects the tape capacity.

If you use compression on your tape drive, you cannot predict the effect on tape capacities. A compressing drive can provide twice the capacity of a non-compressing drive, but the actual capacity could be far less or far more, depending on the kind of data being backed up. For example, if a non-compressing drive writes 2 GB of data to a specific tape, the compressing drive could write 10 GB, 2 GB, 5 GB, or some other unpredictable amount of data.

Length of tape.

Be sure to verify tape lengths. A 120-meter DAT tape holds more data than a 90-meter DAT tape, and without examining the printed information on the tape cassette carefully, the two tapes can appear identical.
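
To see how filemarks and compression interact, it can help to work through the arithmetic. The following Python sketch uses a deliberately simplified model: filemarks consume physical tape, and the remaining tape holds data at whatever compression ratio the drive achieves. The function name, the filemark count, and the per-filemark size are hypothetical values chosen for illustration; they are not measurements of any particular drive.

```python
def estimated_data_mb(native_capacity_mb, compression_ratio=1.0,
                      filemark_count=0, filemark_mb=0.0):
    """Estimate how much data fits on a tape under a simplified model.

    native_capacity_mb: uncompressed capacity of the tape
    compression_ratio:  1.0 means no compression; 2.0 means data halves in size
    filemark_count:     hypothetical number of filemarks written
    filemark_mb:        hypothetical tape consumed per filemark
    """
    usable_mb = native_capacity_mb - filemark_count * filemark_mb
    return max(usable_mb, 0.0) * compression_ratio


if __name__ == "__main__":
    # A 2 GB (2000 MB) tape at three compression ratios.
    for ratio in (5.0, 1.0, 2.5):
        print(ratio, estimated_data_mb(2000, compression_ratio=ratio))

    # A 4000 MB tape holding many small save sets, assuming
    # 200 filemarks at 5 MB each (both values are hypothetical).
    print(estimated_data_mb(4000, filemark_count=200, filemark_mb=5.0))
```

The model ignores write errors and tape-to-tape variation, so treat the result as a rough estimate rather than a prediction.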

For Solaris, if your tape devices are not directly supported by Sun Microsystems, you will need to recreate your entries in the st.conf file. If you need assistance with this, contact Sun Technical Support.

Server Cannot Access Autochanger Control Port

The control port controls the autochanger loading mechanism. Your autochanger's hardware installation manual should have instructions on how to verify whether the control port is properly connected. If you cannot determine whether the control port is working, contact the autochanger vendor for assistance.

Backup Archive and Retrieve

This section explains how to troubleshoot various problems you may encounter with Backup archive and retrieve.

Remote Archive Request From Server Fails

If you cannot perform a remote archive request of a workstation from the Backup server, the archive client's user name (for example, root) may not be listed in that client's Archive Users attribute in the Clients resource.

You can also grant Backup administrator privileges for root@client-system in the Administrator attribute in the Server resource. Granting administrator privileges creates a potential security issue, since Backup administrators can recover and retrieve data owned by other users on other clients.

Multiple Save Sets Appear as a Single Archive Save Set

When you combine multiple save sets in an archive, such as /home and /usr, they end up in a single archive save set, which appears as "/" in the Archives list in the Backup Retrieve program (nwretrieve).

If you want save sets to appear separately during retrieve, archive them separately.

Cloned Archives Do Not Appear in Backup Retrieve Window

When you search for an annotation in the Backup Retrieve program (nwretrieve), the Archives attribute does not display archive clones.

To locate the clones, start the query without specifying a Search Annotation attribute. If that query returns too many archives, you can use mminfo to locate the archive clone with the same save set ID (ssid) as the archive you want.

Wrong Archive Pool Is Selected

If you create multiple archive pools, Backup does not select the default archive pool for archiving. Instead, it selects the archive pool that was created last.

Second Archive Request Does Not Execute

If you create two archive requests with the same name, only the first request executes. To avoid the problem, do not create two archive requests with the same name; the newer one will never execute.

Command Line Archive Does Not Start Immediately

If you run nsrarchive from the command line, the archive does not start immediately after you type the annotation and press Control-D. There is a delay before the archive starts, so wait a while. Do not press Control-D multiple times.

Empty Annotations in Retrieve List

You may encounter empty annotations in the retrieve list when you search for annotations using a search string.

The Backup Archive program does not allow you to enter a null annotation string. By contrast, older versions of the Backup Archive software installed on DOS, Windows, and NetWare lack an annotation feature. As a consequence, the annotations for save sets archived with the older software are empty strings in the retrieve list.

Diagnostic Tools

A variety of diagnostic tools are available as operating system services and as part of the Backup product. This section describes some diagnostic tools that are useful with Backup.

Diagnostic Report

Backup includes a script called nsr_support that generates an exhaustive diagnostic report. Typically, you run nsr_support only at the request of SunSoft Technical Support. Redirect the output of the script to a file and then e-mail the file for analysis. To run the script and redirect the output, become root on the system and enter the nsr_support command at the shell prompt:

# nsr_support > /tmp/filename

For further information about the nsr_support command, refer to the nsr_support man page.
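
If you collect the report from a script rather than an interactive shell, the same redirection can be done programmatically. The following Python sketch runs a command and writes its standard output to a file. It assumes that nsr_support is on the search path and that the script is run as root; the function name and the command parameter are illustrative.

```python
import subprocess


def capture_report(output_path, command=("nsr_support",)):
    """Run the command and write its standard output to output_path.

    Returns the exit status of the command.
    """
    with open(output_path, "wb") as report:
        completed = subprocess.run(list(command), stdout=report)
    return completed.returncode
```

The command parameter defaults to nsr_support and can be replaced with any other command when you want to check the redirection itself.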

Communications Tests

To verify that communications sessions can be established, test with ping and rpcinfo, which are tools provided with the operating system software.

Because Backup relies heavily on mapping of ports, use rpcinfo to test the operation of the portmapper. The ping command tests the connection up to the network layer in the OSI model; rpcinfo checks for communication up to the session layer. For instructions on using ping and rpcinfo, see "Client-Server Communications".
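
The two checks can be run back to back against a given host. The following Python sketch runs ping and then rpcinfo -p, which queries the portmapper on the named host, and records whether each command exited successfully. It assumes the Solaris form of ping, which returns after a single probe or a timeout; on systems where ping runs until interrupted, add the appropriate count option. The function names and the host name jupiter are illustrative.

```python
import subprocess


def communication_checks(host):
    """Return the commands used to test communication with host."""
    return [
        ["ping", host],           # reachability up to the network layer
        ["rpcinfo", "-p", host],  # list RPC services registered with the portmapper
    ]


def run_checks(host, run=subprocess.run):
    """Run each check and map the command name to True on a zero exit status."""
    results = {}
    for command in communication_checks(host):
        results[command[0]] = run(command).returncode == 0
    return results


if __name__ == "__main__":
    print(communication_checks("jupiter"))  # jupiter is an example host name
```

A successful ping with a failing rpcinfo suggests that the host is reachable but that the portmapper is not responding.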

Contact SunSoft Technical Support for additional tools for testing communications.