System Administration Guide

Part XVI Troubleshooting Solaris 2.6 Software Problems

This part provides instructions for troubleshooting Solaris 2.x software problems. It contains the following chapters.

Chapter 68, Troubleshooting Software Problems (Overview)

Provides overview information about troubleshooting common software problems and instructions for troubleshooting a system crash.  

Chapter 69, Generating and Saving System Crash Information

Provides step-by-step instructions for saving crash dumps and customizing system error logging.  

Chapter 70, Troubleshooting Miscellaneous Software Problems

Provides problem scenarios and possible solutions for general software problems such as a hung system or a system that won't boot.  

Chapter 71, Troubleshooting File Access Problems

Provides solutions for solving common file access problems such as incorrect command search paths and file permissions.  

Chapter 72, Troubleshooting Printing Problems

Provides solutions for solving common printer problems such as no output or incorrect output.  

Chapter 73, Troubleshooting File System Problems

Provides specific fsck error messages and corresponding solutions for solving file system-related problems.

Chapter 74, Troubleshooting Software Administration Problems

Provides specific error messages and possible solutions for problems encountered when adding or removing software packages.  

Chapter 68 Troubleshooting Software Problems (Overview)

This chapter provides a general overview of troubleshooting software problems, including information on troubleshooting system crashes and viewing system messages.


Where to Find Software Troubleshooting Tasks

Use these references to find step-by-step instructions for troubleshooting software problems.

Troubleshooting a System Crash

If a system running the Solaris operating environment crashes, provide your service provider with as much information as possible--including core files.

What to Do If the System Crashes

If the system crashes, do the following, in this order:

  1. Write down the system console messages.

    If a system crashes, making it run again may seem like your most pressing concern. However, before you reboot the system, examine the console screen for messages. These messages may provide some insight about what caused the crash. Even if the system reboots automatically and the console messages have disappeared from the screen, you may be able to check these messages by viewing the system error log file that is generated automatically in /var/adm/messages (or /usr/adm/messages). See "How to View System Messages" for more information about viewing system error log files.

    If you have frequent crashes and can't determine their cause, gather all the information you can from the system console or the /var/adm/messages files, and have it ready for a customer service representative to examine. See "Troubleshooting a System Crash" for a complete list of troubleshooting information to gather for your service provider.


  2. Synchronize the disks and reboot.


    ok sync
    

    See Chapter 70, Troubleshooting Miscellaneous Software Problems if the system fails to reboot successfully after a system crash.

  3. Attempt to save the crash information written onto the swap area by running the savecore command.


    # savecore
    

See Chapter 69, Generating and Saving System Crash Information for information about saving crash dumps automatically.

Gathering Troubleshooting Data

Answer the following questions to help isolate the system problem. Use "Troubleshooting a System Crash Checklist" for gathering troubleshooting data for a crashed system.

Table 68-1 Identifying System Crash Data

Question 

Description 

Can you reproduce the problem?

This is important because a reproducible test case is often essential for debugging difficult problems. By reproducing the problem, the service provider can build kernels with special instrumentation to trigger, diagnose, and fix the bug. 

Are you using any third-party drivers?

Drivers run in the same address space as the kernel, with all the same privileges, so they can cause system crashes if they have bugs. 

What was the system doing just before it crashed?

If the system was doing anything unusual like running a new stress test or experiencing higher-than-usual load, that may have led to the crash. 

Were there any unusual console messages right before the crash?

Sometimes the system will show signs of distress before it actually crashes; this information is often useful. 

Did you add any tuning parameters to the /etc/system file?

Sometimes tuning parameters, such as increasing shared memory segments so that the system tries to allocate more than it has, can cause the system to crash. 

Did the problem start recently?

If so, did the onset of problems coincide with any changes to the system, for example, new drivers, new software, a different workload, a CPU upgrade, or a memory upgrade?
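
Some of the data that a service provider asks for (the operating system release, the installed patches, and any tuning lines in /etc/system) can be captured in a single report file. The following is a minimal sketch, not a supported tool: the function name collect_crash_info and the report layout are assumptions made for illustration, and the showrev command is run only if it is present on the system.

```shell
#!/bin/sh
# Sketch: gather basic troubleshooting data into one report file.
# collect_crash_info is a hypothetical helper name, not a Solaris command.
collect_crash_info() {
    report=$1
    {
        echo "== Operating system release (uname -a) =="
        uname -a

        # showrev is Solaris-specific, so skip it where it is missing.
        if command -v showrev >/dev/null 2>&1; then
            echo "== Installed patches (showrev -p) =="
            showrev -p
        fi

        echo "== Tuning lines in /etc/system =="
        if [ -r /etc/system ]; then
            # Lines that begin with * are comments in /etc/system.
            grep -v '^\*' /etc/system | grep -v '^$' || true
        else
            echo "(no readable /etc/system file)"
        fi
    } > "$report" 2>&1
}

# Usage: collect_crash_info /var/tmp/crash-report.txt
```

Console messages and a description of what the system was doing before the crash still have to be recorded by hand.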

Troubleshooting a System Crash Checklist

Use this checklist when gathering system data for a crashed system.

Item 

Your Data 

Is a core file available? 

 

Identify the operating system release and appropriate software application release levels. 

 

Identify system hardware. 

Include prtdiag output from sun4d systems.

 

Are patches installed? If so, include showrev -p output.

 

Is the problem reproducible? 

 

Does the system have any third-party drivers? 

 

What was the system doing before it crashed? 

 

Were there any unusual console messages right before the system crashed? 

 

Did you add any parameters to the /etc/system file?

 

Did the problem start recently? 

 

Viewing System Messages

When a system crashes, it may display a message on the system console like this:


panic: error message

where error message is one of the panic error messages described in crash(1M).

Less frequently, this message may be displayed instead of the panic message:


Watchdog reset !

The error logging daemon, syslogd, automatically records various system warnings and errors in message files. By default, many of these system messages are displayed on the system console and are stored in the /var/adm (or /usr/adm) directory. You can direct where these messages are stored by setting up system logging. See "How to Customize System Message Logging" for more information. These messages can alert you to system problems, such as a device that is about to fail.

The /var/adm directory contains several message files. The most recent messages are in /var/adm/messages (and in messages.0), and the oldest are in messages.3. After a period of time (usually every ten days), a new messages file is created. The current messages file is renamed messages.0, messages.0 is renamed messages.1, messages.1 is renamed messages.2, and messages.2 is renamed messages.3. The previous /var/adm/messages.3 is deleted.
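
The rotation described above can be sketched as a short shell function. This is an illustration only, not the script that the system actually runs: the function name rotate_messages is hypothetical, the directory is passed as an argument so that the sketch can be tried on a scratch copy, and the step that moves the current messages file to messages.0 is included on the assumption that the rotation starts a fresh messages file each time.

```shell
#!/bin/sh
# Sketch: rotate messages, messages.0 ... messages.3 in the directory $1.
# rotate_messages is a hypothetical name used for illustration.
rotate_messages() {
    dir=$1

    # The oldest copy is deleted first.
    rm -f "$dir/messages.3"

    # Shift each remaining copy up by one, oldest first.
    for n in 2 1 0; do
        next=$((n + 1))
        if [ -f "$dir/messages.$n" ]; then
            mv "$dir/messages.$n" "$dir/messages.$next"
        fi
    done

    # The current file becomes messages.0 and a new, empty file is started.
    if [ -f "$dir/messages" ]; then
        mv "$dir/messages" "$dir/messages.0"
    fi
    : > "$dir/messages"
}

# Usage (on a scratch copy, not on the live /var/adm directory):
#   rotate_messages /var/tmp/adm-copy
```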

Because /var/adm stores large files containing messages, crash dumps, and other data, this directory can consume lots of disk space. To keep the /var/adm directory from growing too large, and to ensure that future crash dumps can be saved, you should remove unneeded files periodically. You can automate this task by using crontab. See "How to Delete Crash Dump Files" and Chapter 59, Scheduling System Events (Tasks) for more information on automating this task.

How to View System Messages

Display recent messages generated by a system crash or reboot by using the dmesg command.


$ dmesg

Or use the more command to display one screen of messages at a time.


$ more /var/adm/messages

For more information, refer to dmesg(1M).

Example--Viewing System Messages

The following example shows output from the dmesg command.


$ dmesg
Nov 12 16:53
SunOS Release 5.6 Version A [UNIX(R) System V Release 4.0]
copyright (c) 1983-1997, Sun Microsystems, Inc.
DEBUG enabled
WARNING: cannot load psm xpcimach
mem = 32376K (0x1f9e000)
avail mem = 25247744
root nexus = i86pc
Unable to install/attach drive `isa'
eisa0 at root
NOTICE: eisa: DMA buffer-chaining not enabled
NOTICE: IN i8042_acquire
NOTICE: out i8042_acquire
NOTICE: IN i8042_release
NOTICE: about to enable keyboard
NOTICE: out i8042_release
          .
          .
          .
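
When a messages file is long, it can help to display only the lines that point to trouble. The following sketch filters a messages file for panic, warning, and watchdog lines. The function name show_problem_messages and the choice of keywords are assumptions made for illustration, and the -E option assumes a POSIX-conforming grep.

```shell
#!/bin/sh
# Sketch: print only the lines of a messages file that mention a
# panic, a warning, or a watchdog reset (case-insensitive match).
# show_problem_messages is a hypothetical helper name.
show_problem_messages() {
    grep -i -E 'panic|warning|watchdog' "$1"
}

# Usage: show_problem_messages /var/adm/messages
```

Run against the dmesg output shown above, this filter would keep the WARNING line and drop the NOTICE lines.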

Customizing System Message Logging

You can capture additional error messages that are generated by various system processes by modifying the /etc/syslog.conf file. By default, /etc/syslog.conf directs many system process messages to the /var/adm message files. Crash and boot messages are stored here as well. To view /var/adm messages, see "How to View System Messages".

The /etc/syslog.conf file has two columns separated by tabs:

facility.level ...
action

facility.level

A facility or system source of the message or condition. May be a comma-separated list of facilities. Facility values are listed in Table 68-2. A level indicates the severity or priority of the condition being logged. Priority levels are listed in Table 68-3.

action

The action field indicates where the messages are forwarded. 

The following example shows sample lines from a default /etc/syslog.conf file.


user.err		/dev/console
user.err		/var/adm/messages
user.alert		`root, operator'
user.emerg		*
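
Because the two columns must be separated by tabs, a line that uses spaces instead is easy to overlook. The following sketch reports such lines. It is a rough check under stated assumptions: the function name check_syslog_tabs is hypothetical, comment lines and blank lines are skipped, and the check does not understand m4 macro lines, so it may flag those even when they are valid.

```shell
#!/bin/sh
# Sketch: report syslog.conf lines whose selector and action fields
# are not separated by tabs only. check_syslog_tabs is a hypothetical name.
check_syslog_tabs() {
    awk '
        /^#/ || /^[ \t]*$/ { next }     # skip comments and blank lines
        !/^[^ \t]+\t+[^ \t]/ {          # expect: selector, tab(s), action
            printf "line %d: fields are not separated by tabs only\n", NR
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Usage: check_syslog_tabs /etc/syslog.conf
```

The function prints one line for each suspect entry and returns a nonzero exit status if it found any.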

The most common error condition sources are shown in Table 68-2. The most common priorities are shown in Table 68-3 in order of severity.

Table 68-2 Source Facilities for syslog.conf Messages

Source 

Description 

kern

The kernel 

auth

Authentication 

daemon

All daemons 

mail

Mail system 

lp

Spooling system 

user

User processes 


Note -

Starting in the Solaris 2.6 release, the number of syslog facilities that can be activated in the /etc/syslog.conf file is unlimited. In previous releases, the number of facilities was limited to 20.


Table 68-3 Priority Levels for syslog.conf Messages

Priority 

Description 

emerg

System emergencies 

alert

Errors requiring immediate correction 

crit

Critical errors 

err

Other errors 

info

Informational messages 

debug

Output used for debugging 

none

This setting doesn't log output  

How to Customize System Message Logging

  1. Become superuser.

  2. Using the editor of your choice, edit the /etc/syslog.conf file, adding or changing message sources, priorities, and message locations according to the syntax described in syslog.conf(4).

  3. Exit the file, saving the changes.

Example--Customizing System Message Logging

The following /etc/syslog.conf lines are provided by default during the Solaris installation process.


user.err		/dev/console
user.err		/var/adm/messages
user.alert		`root, operator'
user.emerg		*
 

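This means the following user messages are automatically logged:

  • User errors are printed to the console and are also logged to the /var/adm/messages file.

  • User messages that require immediate action (alert) are sent to the root and operator users.

  • User emergency messages are sent to all logged-in users.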

Chapter 69 Generating and Saving System Crash Information

This chapter contains information about enabling and disabling crash dumps, and how to view and collect system messages.


System Crashes

System crashes can occur due to hardware malfunctions, power failures, I/O (input/output) problems, and software errors. If a software glitch, such as a fatal kernel error caused by an operating system bug, causes a system to crash, the system writes an image of its physical memory into a core file at the end of the swap slice of the disk. This file is a snapshot of the state of the kernel, including its program text, data, and control structures, captured at the time of the crash.

Crash Dump (or Core) Files

The crash dump or core file written when a UNIX system crashes can provide clues about what caused the crash if it is examined by an experienced kernel debugger. However, when a UNIX system reboots after a crash, it generally overwrites any core file that may have been produced--unless you have enabled the system to save the core file in a crash dump file.

See "Using Crash Dumps Task Map" for detailed instructions on how to enable a system to save crash dump files. Crash dump files can be very big, so do not retain them longer than necessary.

Saving Crash Dumps

You can examine the control structures, active tables, memory images of a live or crashed system kernel, and other information about the operation of the kernel by using the crash utility. Using crash to its full potential requires a detailed knowledge of the kernel, and is beyond the scope of this manual. See crash(1M) for more details on the operation of the crash utility.

Additionally, crash dumps saved by savecore can be useful to send to a customer service representative for analysis of why the system is crashing. If you will be sending crash dump files to a customer service representative, perform the first three tasks listed in "Using Crash Dumps Task Map".

Using Crash Dumps Task Map

Table 69-1 Task Map: Using Crash Dumps
 

Task

Description

For Instructions, Go To

Create a Crash Dump Directory

Create the /var/crash/system-name directory to store crash dump files.

"How to Create a Crash Dump Directory"

Reserve Space for Crash Dump Files

Define how much disk space to allow for a crash dump file.

"How to Reserve Space for Crash Dump Files"

Enable Crash Dump Files

Edit the /etc/init.d/sysetup file to activate the saving of crash dump files.

"How to Enable Crash Dump Files"

Examine a Crash Dump File

Use the crash command to view crash dump files.

"How to Examine a Crash Dump"

Disable Crash Dump Files

Optional. Edit the /etc/init.d/sysetup file to deactivate the saving of crash dump files.

"How to Disable Crash Dump Files"

 
   

Enabling and Disabling Crash Dumps

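Enabling a system to save crash dumps involves:

  • Creating a directory to store the crash dump files

  • Reserving space for the crash dump files

  • Editing the /etc/init.d/sysetup file to activate the saving of crash dump files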

Disabling your system from saving crash dumps involves reversing these procedures.

How to Create a Crash Dump Directory

  1. Become superuser.

  2. Create the /var/crash directory.


    # mkdir /var/crash
    
  3. Change to the /var/crash directory.


    # cd /var/crash
    
  4. Create a directory with the name of the system.


    # mkdir system-name
    

    system-name

    The system for which you want to save crash dump files. 

  5. Verify the directory has been created.


    # ls
    

Example--Creating a Directory to Save Crash Dump Files

The following example shows how to create a directory to save crash dump files for the system saturn.


# mkdir /var/crash
# cd /var/crash
# mkdir saturn
# ls
 saturn
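
The same steps can be combined by using the -p option of mkdir, which creates any missing parent directories. The following is a minimal sketch: the function name make_crash_dir is hypothetical, and the base directory is a parameter (defaulting to /var/crash) only so that the sketch can be tried in a scratch location.

```shell
#!/bin/sh
# Sketch: create the crash dump directory for a system in one step
# and confirm that it exists. make_crash_dir is a hypothetical name.
make_crash_dir() {
    system_name=$1
    base=${2:-/var/crash}
    mkdir -p "$base/$system_name" && ls -d "$base/$system_name"
}

# Usage (as superuser): make_crash_dir saturn
# To use the name of the local system: make_crash_dir "`uname -n`"
```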

How to Reserve Space for Crash Dump Files

  1. Be sure you have completed any required tasks identified in Table 69-1.

  2. Become superuser.

  3. Change to the /var/crash/system-name directory.


    # cd /var/crash/system-name
    

    system-name

    The system for which you want to save crash dump files. 

  4. Using the editor of your choice, create a file named minfree that contains a number specifying the minimum amount of free space (in kilobytes) that must remain in the file system after a crash dump is saved.

  5. Exit the file, saving changes.

Example--Reserving Space for Crash Dump Files

The following example shows the contents of a minfree file that requires 500 Kbytes of free space to remain in the file system where crash dump files for the system saturn are saved.


$ more /var/crash/saturn/minfree
500
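
The minfree file can also be written without an editor. The following sketch checks that the value is a whole number before writing the file. The function name write_minfree and the validation step are assumptions added for illustration.

```shell
#!/bin/sh
# Sketch: write a minfree file into the given crash dump directory.
# write_minfree is a hypothetical helper name.
write_minfree() {
    dir=$1
    kbytes=$2

    # Accept only a non-empty string of digits.
    case $kbytes in
        ''|*[!0-9]*)
            echo "minfree must be a whole number of kilobytes" >&2
            return 1
            ;;
    esac

    echo "$kbytes" > "$dir/minfree"
}

# Usage (as superuser): write_minfree /var/crash/saturn 500
```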

How to Enable Crash Dump Files

  1. Be sure you have completed any required tasks identified in Table 69-1.

  2. Become superuser.

  3. Using the editor of your choice, edit the /etc/init.d/sysetup file, activating the lines that enable the crash dumps by deleting the comment marks (#) from the beginning of those lines.

  4. Exit the file, saving the changes.

Example--Enabling Crash Dump Files

The following example shows the appropriate section of the /etc/init.d/sysetup file that has been edited to enable crash dumps.


##
## Default is to not do a savecore
##
if [ ! -d /var/crash/`uname -n` ]
then mkdir -m 0700 -p /var/crash/`uname -n`
fi
     echo 'checking for crash dump...\c '
savecore /var/crash/`uname -n`
     echo ''

How to Examine a Crash Dump

  1. Become superuser.

  2. Examine a crash dump by using the crash utility.


    # /usr/sbin/crash [-d crashdump-file] [-n name-list] [-w output-file]

    -d crashdump-file

    Specifies a file to contain the system memory image. The default crash dump file is /dev/mem.

    -n name-list

    Specifies a text file to contain symbol table information if you want to examine symbolic access to the system memory image. The default file name is /dev/ksyms.

    -w output-file

    Specifies a file to contain output from a crash session. The default is standard output. 

  3. Display crash status information.


    # /usr/sbin/crash
    dumpfile = /dev/mem, namelist = /dev/ksyms, outfile = stdout
    > status
       .
       .
       .
    > size buf proc queue
       .
       .
       .

Example--Examining a Crash Dump

The following example shows sample output from the crash utility. Information about status, and about the buffer, process, and queue size is displayed.


# /usr/sbin/crash
dumpfile = /dev/mem, namelist = /dev/ksyms, outfile = stdout
> status
system name:    SunOS
release:        5.6
node name:      saturn
version:        Generic
machine name:   sun4m
time of crash:  Fri Jan 10 14:14:39 1997
age of system:  60 day, 5 hr., 24 min.
panicstr:
panic registers:
        eip: 0     esp: 0
> size buf proc queue
120
1552
88
 

How to Disable Crash Dump Files

  1. Become superuser.

  2. Edit the /etc/init.d/sysetup file, inserting a comment mark (#) at the beginning of each of the lines shown below.


    #if [ ! -d /var/crash/`uname -n` ]
    #then mkdir -p /var/crash/`uname -n`
    #fi
    #                echo 'checking for crash dump...\c '
    #savecore /var/crash/`uname -n`
    #                echo ''
  3. Save the changes.

  4. Remove the directory set up for crash dumps from the /var/crash directory.


    # rm -rf /var/crash/system-name
    

    system-name

    Name of the system which will no longer save crash dump files.

Chapter 70 Troubleshooting Miscellaneous Software Problems

This chapter describes miscellaneous software problems that may occur occasionally and are relatively easy to fix. Troubleshooting miscellaneous software problems includes solving problems that aren't related to a specific software application or topic, such as unsuccessful reboots and full file systems. How to resolve these problems is described in the following sections.


What to Do If Rebooting Fails

If the system does not reboot completely, or if it reboots and then crashes again, there may be a software or hardware problem that is preventing the system from booting successfully.

Problem -- A System Won't Boot Because ... 

How to Fix the Problem 

The system can't find /platform/`uname -m`/kernel/unix.

You may need to change the boot-device setting in the PROM on a SPARC system. See "SPARC: How to Change the Default Boot Device".

There is no default boot device on an x86 system. The message displayed is: 

Not a UFS filesystem.

Boot the system using the Configuration Assistant/Boot diskette and select the disk from which to boot. 

There's an invalid entry in the /etc/passwd file.

See Part III for information on recovering from an invalid passwd file.

There's a hardware problem with a disk or another device. 

Check the hardware connections: 

  • Make sure the equipment is plugged in.

  • Make sure all the switches are set properly.

  • Look at all the connectors and cables, including the Ethernet cables.

  • If all this fails, turn off the power to the system, wait 10 to 20 seconds, and then turn on the power again.

If none of the above suggestions solve the problem, contact your local service provider.

What to Do If a System Hangs

A system may freeze or hang rather than crash completely if some software process is stuck. Follow these steps to recover from a hung system.

  1. Determine whether the system is running a window environment and follow the suggestions listed below. If these suggestions don't solve the problem, go to step 2.

    • Make sure the pointer is in the window where you are typing the commands.

    • Press Control-q in case the user accidentally pressed Control-s, which freezes the screen. Control-s freezes only the window, not the entire screen. If a window is frozen, try using another window.

    • If possible, log in remotely from another system on the network. Use the ps command to look for the hung process. If it looks like the window system is hung, identify the process and kill it.

  2. Press Control-\ to force a "quit" in the running program and (probably) write out a core file.

  3. Press Control-c to interrupt the program that may be running.

  4. Log in remotely and attempt to identify and kill the process that is hanging the system.

  5. Log in remotely, become superuser and reboot the system.

  6. If the system still does not respond, force a crash dump and reboot. See Chapter 69, Generating and Saving System Crash Information for information on forcing a crash dump and booting.

  7. If the system still does not respond, turn the power off, wait a minute or so, then turn the power back on.

  8. If you cannot get the system to respond at all, contact your local service provider for help.

What to Do If a File System Fills Up

When the root (/) file system or any other file system fills up, you will see the following message in the console window:


.... file system full

There are several reasons why a file system fills up. The following sections describe several scenarios for recovering from a full file system. See Chapter 57, Managing Disk Use (Tasks) for information on routinely cleaning out old and unused files to prevent full file systems.

A File System Fills Up Because a Large File or Directory Was Created

Reason Error Occurred 

How to Fix the Problem 

Someone accidentally copied a file or directory to the wrong location. This also happens when an application crashes and writes a large core file into the file system.

Log in as superuser and use the ls -tl command in the specific file system to identify which large file is newly created and remove it. See "How to Find and Delete core Files" to remove core files.
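
As an alternative to scanning ls -tl output by eye, the find command can list only the files above a given size. The following is a sketch under stated assumptions: the function name find_large_files is hypothetical, the size is given in 512-byte blocks (the unit that find uses for a plain number with -size), and -xdev keeps the search within one file system.

```shell
#!/bin/sh
# Sketch: list files larger than a given number of 512-byte blocks,
# without crossing into other mounted file systems.
# find_large_files is a hypothetical helper name.
find_large_files() {
    dir=$1
    blocks=$2
    find "$dir" -xdev -type f -size +"$blocks" -exec ls -l {} \;
}

# Usage (as superuser): list files over about 10 Mbytes under /var
#   find_large_files /var 20480
```

Review the list before removing anything, because a large file is not necessarily an unneeded one.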

The tmpfs File System Is Full Because the System Ran Out of Memory

Reason Error Occurred 

How to Fix the Problem 

This can occur if tmpfs is trying to write more than it is allowed or some current processes are using a lot of memory.

See tmpfs(7FS) for information on recovering from tmpfs-related error messages.

What to Do If File ACLs Are Lost After Copy or Restore

Reason Error Occurred 

How to Fix the Problem 

If files or directories with ACLs are copied or restored into the /tmp directory, the ACL attributes are lost. The /tmp directory is usually mounted as a temporary file system, which doesn't support UFS file system attributes such as ACLs.

Copy or restore files into the /var/tmp directory instead.

Troubleshooting Backup Problems

This section describes some basic troubleshooting techniques to use when backing up and restoring data.

The root (/) File System Fills Up After You Back Up a File System

You back up a file system, and the root (/) file system fills up. Nothing is written to the media, and the ufsdump command prompts you to insert the second volume of media.

Reason Error Occurred 

How to Fix the Problem 

If you used an invalid destination device name with the -f option, the ufsdump command wrote to a file in the /dev directory of the root (/) file system, filling it up. For example, if you typed /dev/rmt/st0 instead of /dev/rmt/0, the backup file /dev/rmt/st0 was created on the disk rather than being sent to the tape drive.

Use the ls -tl command in the /dev directory to identify which file is newly created and abnormally large, and remove it.
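
One way to catch a mistyped device name before starting a backup is to confirm that the destination is a character special device rather than an ordinary or missing file. The following sketch uses the shell's -c file test. The function name check_dump_device is hypothetical, and the check is only a precaution: a backup to an ordinary file can be intentional.

```shell
#!/bin/sh
# Sketch: confirm that a backup destination is a character special
# device before it is passed to ufsdump with the f option.
# check_dump_device is a hypothetical helper name.
check_dump_device() {
    if [ -c "$1" ]; then
        echo "$1 is a character special device"
    else
        echo "$1 is not a character special device; a backup would be written to an ordinary file" >&2
        return 1
    fi
}

# Usage: check_dump_device /dev/rmt/0 && ufsdump 0uf /dev/rmt/0 /usr
```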

Make Sure the Backup and Restore Commands Match

You can only use ufsrestore to restore files backed up with ufsdump. If you back up with tar, restore with tar. If you use the ufsrestore command to restore a tape that was written with another command, an error message tells you that the tape is not in ufsdump format.

Check to Make Sure You Have the Right Current Directory

It is easy to restore files to the wrong location. Because the ufsdump command always copies files with full path names relative to the root of the file system, you should usually change to the root directory of the file system before running ufsrestore. If you change to a lower-level directory, after you restore the files you will see a complete file tree created under that directory.

Use the Old restore Command to Restore Multivolume Diskette Backups

You cannot use the ufsrestore command to restore files from a multivolume backup set of diskettes made with the dump command. You must restore the files on a SunOS 4.x system.

Interactive Commands

When you run ufsrestore with the interactive (i) option, a ufsrestore> prompt is displayed, as shown in this example:


# ufsrestore ivf /dev/rmt/0
Verify volume and initialize maps
Media block size is 126
Dump date: Wed Nov 06 15:21:10 1996
Dumped from: the epoch
Level 0 dump of /usr on venus:/dev/dsk/c0t1d0s6
Label:none
Extract directories from tape
Initialize symbol table.
ufsrestore>

At the ufsrestore> prompt, you can use the commands listed in "Commands for Interactive Restore" to find files, create a list of files to be restored, and restore them.

Chapter 71 Troubleshooting File Access Problems


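Users frequently experience problems--and call on a system administrator for help--because they cannot access a program, a file, or a directory that they could previously use. Whenever you encounter such a problem, investigate one of three areas:

  • The user's search path may have been changed, or the directories in the search path may not be in the proper order.

  • The file or directory may not have the proper permissions or ownership.

  • The configuration of a system accessed over the network may have changed.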

This chapter briefly describes how to recognize problems in each of these three areas and suggests possible solutions.

Solving Problems With Search Paths (Command not found)

A message of Command not found indicates one of the following:

  • The command is not available on the system.

  • The command directory is not in the search path.

To fix a search path problem, you need to know the pathname of the directory where the command is stored.

If the wrong version of the command is found, a directory that has a command of the same name is in the search path. In this case, the proper directory may be later in the search path or may not be present at all.

You can display your current search path by using the echo $PATH command.


$ echo $PATH 
/home/kryten/bin:/sbin:/usr/sbin:/usr/openwin/bin:/usr/openwin/bin/xview:
/usr/dist/local/exe:/usr/dist/exe

Use the which command to determine whether you are running the wrong version of the command.


$ which maker 
/usr/doctools/frame5.1/bin/maker

Note -

The which command looks in the .cshrc file for path information. The which command may give misleading results if you execute it from the Bourne or Korn shell and you have a .cshrc file that contains aliases for the which command. To ensure accurate results, use the which command in a C shell, or, in the Korn shell, use the whence command.


How to Diagnose and Correct Search Path Problems

  1. Display the current search path to determine whether the directory for the command is missing from your path or is misspelled.


    $ echo $PATH 
    
  2. Check the following:

    • Is the search path correct?

    • Is the search path listed before other search paths where another version of the command is found?

    • Is the command in one of the search paths?

    If the path needs correction, go to step 3. Otherwise, go to step 4.

  3. Add the path to the appropriate file, as shown in this table.

    Shell 

    File 

    Syntax 

    Notes 

    Bourne and Korn 

    $HOME/.profile

    $ PATH=$HOME/bin:/sbin:/usr/local/bin ...
    $ export PATH

    A colon separates path names. 

    C

    $HOME/.cshrc

    or 

    $HOME/.login

    hostname% set path=(~/bin /sbin /usr/local/bin ...)

    A blank space separates path names. 

  4. Activate the new path as follows:

    Shell 

    File Where Path Is Located 

    Activate The Path With ... 

    Bourne and Korn 

    .profile

    $ . ./.profile

    C

    .cshrc

    hostname% source .cshrc

    C

    .login

    hostname% source .login

  5. Verify the path using the command shown below.


    $ which command
    

Example--Diagnosing and Correcting Search Path Problems

This example uses the which command to show that the OpenWindows executable is not in any of the directories in the search path.


venus% openwin
Command not found
venus% which openwin
no openwin in . /home/ignatz /sbin /usr/sbin /usr/bin /etc 
/home/ignatz/bin /bin /home/bin /usr/etc
venus% vi ~/.cshrc
(Add appropriate command directory to the search path)
venus% source .cshrc
venus% openwin

If you cannot find a command, look at the man page for its directory path. For example, if you cannot find the lpsched command (the lp printer daemon), lpsched(1M) tells you the path is /usr/lib/lp/lpsched.
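
In the Bourne or Korn shell, where which may give misleading results, the search path can be walked directly. The following sketch prints every executable file with the given name, in search path order, so that a wrong version that appears earlier in the path is easy to spot. The function name find_in_path is hypothetical.

```shell
#!/bin/sh
# Sketch: print every match for a command name, in $PATH order.
# The first line printed is the version that the shell would run.
# find_in_path is a hypothetical helper name.
find_in_path() {
    cmd=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty entry in the path means the current directory.
        if [ -z "$dir" ]; then
            dir=.
        fi
        if [ -x "$dir/$cmd" ] && [ ! -d "$dir/$cmd" ]; then
            echo "$dir/$cmd"
        fi
    done
    IFS=$old_ifs
}

# Usage: find_in_path openwin
```

If the function prints nothing, the command is not in any directory in the search path.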

Solving File Access Problems

When users cannot access files or directories that they previously could access, the permissions or ownership of the files or directories probably has changed.

Changing File Permissions

Table 71-1 shows the octal values for setting file and directory permissions. You use these numbers in sets of three to set permissions for owner, group, and other (in that order). For example, the value 644 sets read/write permissions for owner, and read-only permissions for group and other.

Table 71-1 Octal Values for File Permissions

Value 

Description 

0

No permissions 

1

Execute-only 

2

Write-only 

3

Write, execute 

4

Read-only 

5

Read, execute 

6

Read, write 

7

Read, write, execute 
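
To see how the three octal digits map to the permission string that ls displays, it can help to apply a value to a scratch file and print only the permission field. The following is a small sketch: the function name show_mode is hypothetical, and it prints the first ten characters of the long listing (the file type followed by the owner, group, and other permissions).

```shell
#!/bin/sh
# Sketch: print only the type and permission characters of a file,
# for example -rw-r--r-- for a regular file with mode 644.
# show_mode is a hypothetical helper name.
show_mode() {
    ls -ld "$1" | cut -c1-10
}

# Usage:
#   chmod 644 scratch_file
#   show_mode scratch_file
```

For the value 644, the digit 6 (read, write) applies to the owner, and each digit 4 (read-only) applies to group and other.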

Changing File and Group Ownerships

Frequently, file and directory ownerships change because someone edited the files as superuser. When you create home directories for new users, be sure to make the user the owner of the dot (.) file in the home directory. When users do not own "." they cannot create files in their own home directory.

Access problems can also arise when the group ownership changes or when a group of which a user is a member is deleted from the /etc/group database.

Use the chown command to change file ownership.


# chown new-owner filename

new-owner

Is the specified user-name or UID of the new file owner. There must be an entry for the specified user-name in the passwd file.

filename

Is the specified file or directory. 

Use the chgrp command to change group ownership.


# chgrp new-group filename

new-group

Is the group name or GID of the new group owner. There must be an entry for the specified group name in the group file.

filename

Is the specified file or directory. 

How to Change File Permissions

  1. List the file permissions.


    # ls -l filename
    

    -l

    Displays the long listing, which includes current permissions for the file. 

    filename

    Is the specified file or directory. 

  2. Change the file permissions.


    # chmod nnn filename
    

    nnn

    Are numbers representing the permissions you are assigning to the file owner, the group owner, and all others, in that order. 

    filename

    Is the specified file or directory. 

    Permissions are changed using the numbers you specify.


    Note -

    You can change permissions on groups of files or on all files in a directory using meta characters such as (*) in place of file names or in combination with them.


  3. Verify that the permissions have been changed by using the ls -l command.


    $ ls -l filename
    

    The long listing shows the current permissions for the file.

Example--Changing File Permissions

This example shows changing the permissions of a public directory from 744 (read/write/execute, read-only, and read-only) to 755 (read/write/execute, read/execute, and read/execute).


$ ls -ld public_dir
drwxr--r--  1 ignatz   staff    6023 Aug  5 12:06 public_dir
$ chmod 755 public_dir
$ ls -ld public_dir
drwxr-xr-x  1 ignatz   staff    6023 Aug  5 12:06 public_dir

This example shows changing the permissions of an executable shell script from read/write to read/write/execute.


$ ls -l my_script
-rw------- 1 ignatz   staff    6023 Aug  5 12:06 my_script
$ chmod 700 my_script
$ ls -l my_script
-rwx------ 1 ignatz   staff    6023 Aug  5 12:06 my_script
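The numeric modes used in these examples map digit by digit onto the permission string that ls -l displays. The following sketch shows that mapping. The function name mode_to_symbolic is an illustrative assumption, and the sketch handles only the three basic octal digits, not setuid, setgid, or sticky bits.

```shell
# mode_to_symbolic MODE
# Translate each octal digit of MODE (for example, 755) into the
# corresponding rwx triplet for owner, group, and others.
mode_to_symbolic() {
    mode=$1
    out=""
    for digit in $(echo "$mode" | sed 's/./& /g'); do
        case $digit in
            0) out="${out}---" ;;
            1) out="${out}--x" ;;
            2) out="${out}-w-" ;;
            3) out="${out}-wx" ;;
            4) out="${out}r--" ;;
            5) out="${out}r-x" ;;
            6) out="${out}rw-" ;;
            7) out="${out}rwx" ;;
            *) echo "invalid octal digit: $digit" >&2; return 1 ;;
        esac
    done
    echo "$out"
}

mode_to_symbolic 744    # the public_dir mode before the change
mode_to_symbolic 755    # the public_dir mode after the change
mode_to_symbolic 700    # the my_script mode after the change
```

Each digit is the sum of 4 (read), 2 (write), and 1 (execute), which is why changing 744 to 755 adds execute permission for the group and for others.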

How to Change File Ownership


Note -

You must own a file or directory (or have root permission) to be able to change its owner.


  1. Become superuser.

  2. List the file permissions.


    # ls -l filename
    

    -l

    Displays the long listing, which includes the owner of the file, displayed in the third column. 

    filename

    Is the specified file or directory. 

  3. Change the file owner.


    # chown new-owner filename
    

    Ownership is assigned to the new owner you specify.

  4. Verify the file ownership change.


    # ls -l filename
    

Example--Changing File Ownership


# ls -l quest
-rw-r--r--  1 fred   staff    6023 Aug  5 12:06 quest
# chown ignatz quest
# ls -l quest
-rw-r--r--  1 ignatz   staff    6023 Aug  5 12:06 quest

How to Change Group Ownership

  1. List the file permissions.


    # ls -l filename
    
  2. Change the group that owns the file or directory.


    # chgrp GID filename
    

    The group ID for the file or directory you specify is changed.

  3. Verify the group ownership change.


    # ls -l filename
    

Example--Changing Group Ownership


$ ls -l junk
-rw-r--r-- 1 kryten other 3138 Oct 31 14:49 junk
$ chgrp staff junk
$ ls -l junk
-rw-r--r-- 1 kryten staff 3138 Oct 31 14:49 junk

See Chapter 51, Securing Files (Tasks) for information about how to edit group accounts.

Recognizing Problems With Network Access

If users have problems using the rcp remote copy command to copy files over the network, the permissions on the directories and files on the remote system may be restricting access. Another possible source of trouble is that the remote system and the local system are not configured to allow access.

See NFS Administration Guide for information about problems with network access and problems with accessing systems through AutoFS.

Chapter 72 Troubleshooting Printing Problems

This chapter explains how to troubleshoot printing problems that may occur when you set up or maintain printing services.

This is a list of step-by-step instructions in this chapter.

See for information about printing and the LP print service.

Tips on Troubleshooting

Sometimes after setting up a printer, you find that nothing prints. Or, you may get a little farther in the process: something prints, but it is not what you expect--the output is incorrect or illegible. Then, when you get past these problems, other problems may occur, such as:


Note -

Although many of the suggestions in this chapter are relevant to parallel printers, they are geared toward the more common serial printers.


Troubleshooting Adding a Printer

If you use Admintool to add access to a remote printer after installing the Solaris 2.6 release, you may get the following message:


Admintool: Error
add remote printer failed

It is possible that the SunSoft Print Client software is installed in your network and the remote printer is already available to you. Use the lpstat -t command before adding a printer to see if the printer is available.

Troubleshooting No Output (Nothing Prints)

When nothing prints, there are three general areas to check:

If you get a banner page, but nothing else, this is a special case of incorrect output. See "Troubleshooting Incorrect Output".

Check the Hardware

The hardware is the first area to check. As obvious as it sounds, you should make sure that the printer is plugged in and turned on. In addition, you should refer to the manufacturer's documentation for information about hardware settings. Some computers use hardware switches that change the characteristics of a printer port.

The printer hardware includes the printer, the cable that connects it to the computer, and the ports into which the cable plugs at each end. As a general approach, you should work your way from the printer to the computer. Check the printer. Check where the cable connects to the printer. Check the cable. Check where the cable connects to the computer.

Check the Network

Problems are more common with remote print requests--those going from a print client to a print server. You should make sure that network access between the print server and print clients is enabled.

If the network is running the Network Information Service Plus (NIS+), see the NIS+ and FNS Administration Guide in the Solaris 2.6 System Administrator AnswerBook for instructions to enable access between systems. If the network is not running the Network Information Service (NIS) or NIS+, before you set up print servers and print clients, include the Internet address and system name for each client system in the /etc/hosts file on the print server. Also, the Internet address and system name for the print server must be included in the /etc/hosts file of each print client system.
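When the /etc/hosts file is the only source of host names, a quick check for a missing entry can save time. The following is a minimal sketch under that assumption. The function name has_host_entry is illustrative, and the check looks only at the file, not at NIS or NIS+.

```shell
# has_host_entry NAME [FILE]
# Succeed if NAME appears as a host name or alias (any field after the
# address) on a non-comment line of FILE. FILE defaults to /etc/hosts.
has_host_entry() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # strip comments before examining fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${2:-/etc/hosts}"
}

# Hypothetical usage on a print client, with a print server named neptune:
#   has_host_entry neptune || echo "neptune is missing from /etc/hosts"
```

Run the same check on the print server for each print client name.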

Check the LP Print Service

For printing to work, the LP scheduler must be running on both the print server and print client. If it is not running, you need to start it using the /usr/lib/lp/lpsched command. If you have trouble starting the scheduler, see "How to Restart the Print Scheduler".

In addition to the scheduler running, a printer must be enabled and accepting requests before it will produce any output. If the LP print service is not accepting requests for a printer, the submitted print requests are rejected. Usually, in that instance, the user receives a warning message after submitting a print request. If the LP print service is not enabled for a printer, print requests remain queued on the system until the printer is enabled.
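The two conditions, accepting requests and enabled, are reported by different lpstat options, so it can help to filter each report separately. The following sketch assumes that lpstat -a and lpstat -p produce output in the format shown in the examples in this chapter. The function names are illustrative assumptions.

```shell
# Read `lpstat -a` output on standard input and print the name of each
# printer that is not accepting requests.
printers_not_accepting() {
    awk '$2 == "not" && $3 == "accepting" { print $1 }'
}

# Read `lpstat -p` output on standard input and print the name of each
# printer that is disabled.
printers_disabled() {
    awk '$1 == "printer" && $3 == "disabled" { print $2 }'
}

# Typical use on a system that runs the LP print service:
#   lpstat -a | printers_not_accepting
#   lpstat -p | printers_disabled
```

A printer listed by the first filter rejects new requests. A printer listed by the second filter queues requests without printing them.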

In general, you should analyze a printing problem as follows:

The procedures found in "Troubleshooting Printing Problems" use this strategy to help you troubleshoot various problems with the LP print service.

If basic troubleshooting of the LP print service does not solve the problem, you need to follow the troubleshooting steps for the specific client/server case that applies:

Troubleshooting Incorrect Output

If the printer and the print service software are not configured correctly, the printer may print, but it may provide output that is not what you expect.

Check the Printer Type and File Content Type

If you used the wrong printer type when you set up the printer with the LP print service, inappropriate printer control characters can be sent to the printer. The results are unpredictable: nothing may print, the output may be illegible, or the output may be printed in the wrong character set or font.

If you specified an incorrect file content type on a SunOS 5.x print client or a SunOS 5.x print server, the banner page may print, but that is all. The file content types specified for a printer indicate the types of files the printer can print directly, without filtering. When a user sends a file to the printer, the file is sent directly to the printer without any attempt to filter it. The problem occurs if the printer cannot handle the file content type.

When setting up print clients, you increase the chance for a mistake because the file content types must be correct on both the print server and the print client. If you set up the print client as recommended with any as the file content type, files are sent directly to the print server and the print server determines the need for filtering. Therefore, the file content types have to be specified correctly only on the server.

You can specify a file content type on the print client to off-load filtering from the server to the client, but the content type must be supported on the print server.

Check the stty Settings

Many formatting problems can result when the default stty (standard terminal) settings do not match the settings required by the printer. The following sections describe what happens when some of the settings are incorrect.

Wrong Baud Settings

When the baud setting of the computer does not match the baud setting of the printer, usually you get some output, but it does not look like the file you submitted for printing. Random characters are displayed, with an unusual mixture of special characters and undesirable spacing. The default for the LP print service is 9600 baud.


Note -

If a printer is connected by a parallel port, the baud setting is irrelevant.


Wrong Parity Setting

Some printers use a parity bit to ensure that data received for printing has not been garbled during transmission. The parity bit setting for the computer and the printer must match. If they do not match, some characters either will not be printed at all, or will be replaced by other characters. In this case, the output looks approximately correct; the word spacing is all right and many letters are in their correct place. The LP print service does not set the parity bit by default.

Wrong Tab Settings

If the file contains tabs, but the printer expects no tabs, the printed output may contain the complete contents of the file, but the text may be jammed against the right margin. Also, if the tab settings for the printer are incorrect, the text may not have a left margin, it may run together, it may be concentrated to a portion of the page, or it may be incorrectly double-spaced. The default is for tabs to be set every eight spaces.

Wrong Return Setting

If the output is double-spaced, but it should be single-spaced, either the tab settings for the printer are incorrect or the printer is adding a line feed after each return. The LP print service adds a return before each line feed, so the combination causes two line feeds.

If the print zigzags down the page, the stty option onlcr that sends a return before every line feed is not set. The stty=onlcr option is set by default, but you may have cleared it while trying to solve other printing problems.
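Each of the stty problems above is corrected by changing one setting for the printer. The following sketch only prints the command for review and does not run it. It assumes that lpadmin accepts a setting in the form -o stty=setting, as the stty=onlcr notation above suggests. Confirm the syntax in the lpadmin(1M) man page before use. The function name stty_fix_command is an illustrative assumption.

```shell
# stty_fix_command PRINTER SETTING
# Print (do not run) an lpadmin command that applies one stty setting
# to PRINTER, so that it can be reviewed and then run as superuser or lp.
stty_fix_command() {
    printf 'lpadmin -p %s -o stty=%s\n' "$1" "$2"
}

stty_fix_command luna onlcr    # send a return before every line feed
stty_fix_command luna 9600     # match a printer that is set to 9600 baud
```

Printing the command first is a deliberate choice, because an incorrect stty setting can make the output worse while you are still diagnosing the problem.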

Troubleshooting Hung LP Print Service Commands

If you type any of the LP commands (such as lpsystem, lpadmin, or lpstat) and nothing happens (no error message, status information, or prompt is displayed), chances are something is wrong with the LP scheduler. Such a problem can usually be resolved by stopping and restarting the LP scheduler. See "How to Stop the Print Scheduler" and "How to Restart the Print Scheduler" for instructions.

Troubleshooting Idle (Hung) Printers

You may find a printer that is idle, even though it has print requests queued to it. A printer may seem idle when it should not be for one of the following reasons:

Check the Print Filters

Slow print filters run in the background to avoid tying up the printer. A print request that requires filtering will not print until it has been filtered.

Check Printer Faults

When the LP print service detects a fault, printing resumes automatically, but not immediately. The LP print service waits about five minutes before trying again, and continues trying until a request is printed successfully. You can force a retry immediately by enabling the printer.

Check Network Problems

When printing files over a network, you may encounter the following types of problems:

Print Requests Backed Up in the Local Queue

Print requests submitted to a print server may back up in the client system queue for the following reasons:

While you are tracking the source of the problem, you should stop new requests from being added to the queue. See "How to Accept or Reject Print Requests for a Printer" for more information.

Print Requests Backed Up in the Remote Queue

If print requests back up in the print server queue, the printer has probably been disabled. When a printer is accepting requests, but not processing them, the requests are queued to print. Unless there is a further problem, once the printer is enabled, the print requests in the queue should print.

Troubleshooting Conflicting Status Messages

A user may enter a print request and be notified that the client system has accepted it, then receive mail from the print server that the print request has been rejected. These conflicting messages may occur for the following reasons:

You should check that identical definitions of these job components are registered on both the print clients and print servers so that local users can access printers on the print servers.

Troubleshooting Printing Problems

This section contains step-by-step instructions that explain:

How to Troubleshoot No Printer Output

This task includes the following troubleshooting procedures to try when you submit a print request to a printer and nothing prints:

Try the first three procedures in the order in which they are listed, before going to the specific print client/server case that applies. However, if the banner page prints, but nothing else does, turn to the instructions under "How to Troubleshoot Incorrect Output".

To check the hardware:

  1. Check that the printer is plugged in and turned on.

  2. Check that the cable is connected to the port on the printer and to the port on the system or server.

  3. Make sure that the cable is the correct cable and that it is not defective.

    Refer to the manufacturer's documentation. If the printer is connected to a serial port, verify that the cable supports hardware flow control; a NULL modem adapter supports this. Table 72-1 shows the pin configuration for NULL modem cables.

    Table 72-1 Pin Configuration for NULL Modem Cables

    Host                            Printer
    Mini-Din-8    25-Pin D-sub      25-Pin D-sub
    ----------    --------------    --------------
    -             1(FG)             1(FG)
    3(TD)         2(TD)             3(RD)
    5(RD)         3(RD)             2(TD)
    6(RTS)        4(RTS)            5(CTS)
    2(CTS)        5(CTS)            4(RTS)
    4(SG)         7(SG)             7(SG)
    7(DCD)        6(DSR), 8(DCD)    20(DTR)
    1(DTR)        20(DTR)           6(DSR), 8(DCD)

  4. Check that any hardware switches for the ports are set properly.

    See the printer documentation for the correct settings.

  5. Check that the printer is operational.

    Use the printer's self-test feature, if the printer has one. Check the printer documentation for information about printer self-testing.

  6. Check that the baud settings for the computer and the printer are correct.

    If the baud settings are not the same for both the computer and the printer, sometimes nothing will print, but more often you get incorrect output. For instructions, see "How to Troubleshoot Incorrect Output".

To check the network:

  1. Check that the network link between the print server and the print client is set up correctly.


    print_client# ping print_server 
    print_server is alive
    print_server# ping  print_client 
    print_client not available

    If the message says the system is alive, you know you can reach the system, so the network is all right. The message also tells you that either a name service or the local /etc/hosts file has translated the host (system) name you entered into an IP address; otherwise, you would need to enter the IP address.

    If you get a not available message, try to answer the following questions: How is NIS or NIS+ set up at your site? Do you need to take additional steps so that print servers and print clients can communicate with one another? If your site is not running NIS or NIS+, have you entered the IP address for the print server in each print client's /etc/hosts file, and entered all print client IP addresses in the /etc/hosts file of the print server?

  2. (On a SunOS 5.0-5.1 print server only) Check that the listen port monitor is configured correctly.

  3. (On a SunOS 5.0-5.1 print server only) Check that the network listen services are registered with the port monitor on the print server.

To check the basic functions of the LP print service:

This procedure uses the printer luna as an example of checking basic LP print service functions.

  1. On both the print server and print client, make sure that the LP print service is running.

    1. Check whether the LP scheduler is running.


      # lpstat -r
      scheduler is running
    2. If the scheduler is not running, become superuser or lp, and start the scheduler.


      # /usr/lib/lp/lpsched
      

      If you have trouble starting the scheduler, see "How to Unhang the LP Print Service".

  2. On both the print server and print client, make sure that the printer is accepting requests.

    1. Check that the printer is accepting requests.


      # lpstat -a
      mars accepting requests since Jun 04 16:13 1997
      luna not accepting requests since Jun 04 08:10 1997
      unknown reason

      This command verifies that the LP system is accepting requests for each printer configured for the system.

    2. If the printer is not accepting requests, become superuser or lp, and allow the printer to accept print requests.


      # accept luna
      

      The specified printer now accepts requests.

  3. On both the print server and print client, make sure that the printer is enabled to print submitted print requests.

    1. Check that the printer is enabled.


      # lpstat -p luna
      printer luna disabled since Jun 04 09:40 1997.
      available.
      unknown reason

      This command displays information about printer status. You can omit the printer name to obtain information about all printers set up for the system. The above example shows a printer that is disabled.

    2. If the printer is disabled, become superuser or lp, and enable the printer.


      # enable luna
      printer "luna" now enabled.

      The specified printer is enabled to process print requests.

  4. On the print server, make sure that the printer is connected to the correct serial port.

    1. Check that the printer is connected to the correct serial port.


      # lpstat -t
      scheduler is running
      system default destination: luna
      device for luna: /dev/term/a

      The message device for printer-name shows the port address. Is the cable connected to the port that the LP print service reports for the printer? If the port is correct, skip to Step 5.

    2. Become superuser or lp.

    3. Change the file ownership of the device file that represents the port.


      # chown lp device-filename
      

      This command assigns the special user lp as the owner of the device file. In this command, device-filename is the name of the device file.

    4. Change the permissions on the printer port device file.


      # chmod 600 device-filename
      

      This command allows only superuser or lp to access the printer port device file.

  5. On both the print server and print client, make sure that the printer is configured properly.

    1. Check that the printer is configured properly.


      # lpstat -p luna -l
      printer luna is idle. enabled since May 20 17:39 1997. available.
              Content types: postscript
              Printer types: PS

      The above example shows a PostScript printer that is configured properly, and that is available to process print requests. If the printer type and file content type are correct, skip to Step 6.

    2. If the printer type or file content type is incorrect, try setting the printer type to unknown and the content type to any on the print client.


      # lpadmin -p printer-name -T printer-type -I file-content-type
      
  6. On the print server, make sure that the printer is not faulted.

    1. Check that the printer is not waiting because of a printer fault.


      # lpadmin -p printer-name -F continue 
      

      This command instructs the LP print service to continue if it is waiting because of a fault.

    2. Force an immediate retry by re-enabling the printer.


      # enable printer-name 
      
    3. (Optional) Instruct the LP print service to enable quick notification of printer faults.


      # lpadmin -p printer-name -A 'write root'
      

      This command instructs the LP print service to set a default policy of writing root--sending the printer fault message to the terminal on which root is logged in--if the printer fails. This may help you get quick notification of faults as you try to fix the problem.

  7. Make sure that the printer is not set up incorrectly as a login terminal.


    Note -

    It is easy to mistakenly set up a printer as a login terminal, so be sure to check this possibility even if you think it does not apply.


    1. Look for the printer port entry in the ps -ef command output.


      # ps -ef
          root   169   167  0   Apr 04 ?        0:08 /usr/lib/saf/listen tcp
          root   939     1  0 19:30:47 ?        0:02 /usr/lib/lpsched
          root   859   858  0 19:18:54 term/a   0:01 /bin/sh -c \ /etc/lp
      /interfaces/luna
      luna-294 rocket!smith "passwd\n##
      #

      In the output from this command, look for the printer port entry. In the above example, port /dev/term/a is set up incorrectly as a login terminal. You can tell by the "passwd\n##" information at the end of the line. If the port is set correctly, skip the last steps in this procedure.

    2. Cancel the print request(s).


      # cancel request-id
      

      In this command, request-id is the request ID number for a print request to be canceled.

    3. Set the printer port to be a nonlogin device.


      # lpadmin -p printer-name -h
      
    4. Check the ps -ef command output to verify that the printer port is no longer a login device.

      If you do not find the source of the printing problem in the basic LP print service functions, continue to one of the following procedures for the specific client/server case that applies.

To check printing from a SunOS 5.x client to a SunOS 5.x print server:

  1. Check the basic functions of the LP print service on the print server, if you have not done so already.

    For instructions on checking basic functions, see "To check the basic functions of the LP print service". Make sure that the printer works locally before trying to figure out why nothing prints when a request is made from a print client.

  2. Check the basic functions of the LP print service on the print client, if you have not done so already.

    For instructions on checking basic functions, see "To check the basic functions of the LP print service". On the print client, the LP scheduler has to be running, and the printer has to be enabled and accepting requests before any request from the client will print.


    Note -

    For most of the following steps, you must be logged in as root or lp.


  3. Make sure that the print server is accessible.

    1. On the print client, send an "are you there?" request to the print server.


      print_client# ping print_server
      

      If you receive the message print_server not available, you may have a network problem.

  4. (On a SunOS 5.1 print client only) Make sure that the print server is identified as type s5 by viewing the Modify Printer window in Admintool.

  5. Verify that the print server is operating properly.


    # lpstat -t luna
    scheduler is running
    system default destination: luna
    device for luna: /dev/term/a
    luna accepting requests since May 20 17:41 1997. available.
    printer luna now printing luna-314. enabled since May 20 17:41 1997. 
    available.
    luna-129            root               488   May 20 17:45
    #

    The above example shows a print server up and running.

  6. If the print server is not operating properly, go back to "".

To check printing from a SunOS 5.x client to a SunOS 4.1 print server:

  1. Check the basic functions of the LP print service on the print client, if you have not done so already.

    For instructions, see "To check the basic functions of the LP print service".

  2. Make sure that the print server is accessible.

    1. On the print client, send an "are you there?" request to the print server.


      print_client# ping print_server
      

      If you receive the message print_server not available, you may have a network problem.

  3. Make sure that the lpd daemon on the print server is running.

    1. On the print server, verify the lpd daemon is running.


      $ ps -ax | grep lpd
        126 ?  IW    0:00 /usr/lib/lpd
        200 p1 S     0:00 grep lpd
      $

      If the lpd daemon is running, a line is displayed, as shown in the above example. If it is not running, no process information is shown.

    2. If lpd is not running on the print server, become superuser on the print server, and restart it.


      # /usr/lib/lpd &
       
      
  4. Make sure that the remote lpd daemon is configured properly.

    1. On the print server, become superuser, and invoke the lpc command.


      # /usr/etc/lpc
      lpc>
    2. Get LP status information.


      lpc> status
      luna:
      queuing is enabled
      printing is enabled
      no entries
      no daemon present
      lpc>

      Status information is displayed. In the above example, the daemon is not running and needs to be restarted.

    3. If no daemon is present, restart the daemon.


      lpc> restart luna
      

      The daemon is restarted.

    4. Verify that the lpd daemon has started.


      lpc> status
      
    5. Quit the lpc command.


      lpc> quit
      

      The shell prompt is redisplayed.

  5. Make sure that the print client has access to the print server.

    1. Check if there is an /etc/hosts.lpd file on the 4.1 print server.

      On a 4.1 print server, if this file exists, it is used to determine whether an incoming print request can be accepted. If the file does not exist, all print client systems have access, so skip the next two substeps.

    2. If the file exists, see if the print client is listed in the file.

      Requests from client systems not listed in the file are not transferred to the print server.

    3. If the client is not listed, add the print client to the file.


      Note -

      If you get this far without pinpointing the problem, the SunOS 4.1 system is probably set up and working properly.


  6. Make sure that the connection to the remote lpd print daemon from the print client is made correctly.

    1. On the print client, become superuser, and verify the lpsched daemon is running.


      # ps -ef | grep lp
         root   154     1 80   Jan 07 ?        0:02 /usr/lib/lpsched

      The lpsched daemon should be running, as shown in the above example.

    2. Stop the LP print service.


      # lpshut
      

      The LP print service is stopped.

    3. Restart the LP print service.


      # /usr/lib/lp/lpsched
      

      The LP print service is restarted.

  7. Make sure that the remote print server is identified correctly as a SunOS 4.1 system.
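Step 3 of this procedure relies on reading the output of ps -ax | grep lpd, where the grep process itself also matches the pattern. The following sketch tests only the command column, so the grep line is ignored. It assumes the five-column SunOS 4.1 ps -ax layout shown in the example (PID, terminal, state, time, command). The function name lpd_is_running is an illustrative assumption.

```shell
# lpd_is_running
# Read `ps -ax` output on standard input and succeed only if the command
# of some process is lpd itself (for example, /usr/lib/lpd). A line for
# "grep lpd" does not count, because its command field is "grep".
lpd_is_running() {
    awk '$5 ~ /(^|\/)lpd$/ { found = 1 } END { exit (found ? 0 : 1) }'
}

# Typical use on the SunOS 4.1 print server:
#   ps -ax | lpd_is_running || echo "lpd is not running"
```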

To check printing from a SunOS 4.1 client to a SunOS 5.x print server:

  1. Check the basic functions of the LP print service on the print server, if you have not done so already.

    For instructions, see "To check the basic functions of the LP print service". Make sure that the printer works locally before trying to figure out why nothing prints when a request is made from a print client.


    Note -

    You should be logged in as superuser or lp on the system specified in the following steps.


  2. Make sure that the print client is accessible.

    1. On the SunOS 5.x print server, send an "are you there?" request to the print client.


      print_server# ping print_client
      print_client is alive

      If you receive the message print_client not available, you may have a network problem.

  3. On the print client, verify the printer is set up correctly.


    # lpr -P luna /etc/fstab
    lpr: cannot access luna
    #

    This command shows whether the print client is working. The above example shows that the print client is not working correctly.

  4. Make sure that the lpd daemon is running on the print client.

    1. Verify the lpd daemon is running.


      # ps -ax | grep lpd
        118 ?  IW    0:02 /usr/lib/lpd
      #

      This command shows if the lpd daemon is running on the print client. The above example shows that the daemon is running.

    2. If the lpd daemon is not running, start it on the print client.


      # /usr/lib/lpd &
       
      
  5. On the print client, make sure that there is a printcap entry identifying the printer.

    1. Verify the printer is known.


      # lpr -P mercury /etc/fstab
      lpr: mercury: unknown printer
      #

      The above example shows that the /etc/printcap file does not have an entry for the specified printer.

    2. If there is no entry, edit the /etc/printcap file and add the following information:


      printer-name|print-server:\
              :lp=:rm=print-server:rp=printer-name:br#9600:rw:\
              :lf=/var/spool/lpd/printer-name/log:\
              :sd=/var/spool/lpd/printer-name:

      The following example shows an entry for printer luna connected to print server neptune.


      luna|neptune:\
              :lp=:rm=neptune:rp=luna:br#9600:rw:\
              :lf=/var/spool/lpd/luna/log:\
              :sd=/var/spool/lpd/luna:
    3. Create a spooling directory (/var/spool/lpd/printer-name) for the printer.

  6. Make sure that the print client lpd is not in a wait state by forcing a retry.

    If the print server is up and responding, the print client lpd may be in a wait state before attempting a retry.

    1. As superuser on the print client, invoke the lpc command.

      The lpc> prompt is displayed.

    2. Restart the printer.

    3. Quit the lpc command.

      The shell prompt is redisplayed.


      # lpc
      lpc> restart luna
      luna:
             no daemon to abort
      luna:
            daemon started
      lpc> quit
      #
  7. Check the connection to the print server.

    1. On the print client, become superuser, and check the printer log file.


      # more /var/spool/lpd/luna/log
       
      

      Frequently, no information is displayed.

    2. Also check the printer status log.


      # more /var/spool/lpd/luna/status
      waiting for luna to come up
      #
    3. If the connection is all right, on the print server, verify the print server is set up correctly.


      # lpstat -t
      scheduler is running
      system default destination: luna
      device for luna: /dev/term/a
      luna accepting requests since May 20 17:45 1997
      printer luna now printing luna-314. enabled since 
      May 20 17:45 1997. available.
      luna-129            root               488   May 20 17:47
      #

      The above example shows a print server that is up and running.

      If the print server is not running, go back to Step 1 before continuing.
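The printcap entry in Step 5 of this procedure follows a fixed template, so it can be generated instead of typed by hand. The following sketch prints an entry in the same form as the luna and neptune example. The function name make_printcap_entry is an illustrative assumption, and the output should be reviewed before it is appended to the /etc/printcap file.

```shell
# make_printcap_entry PRINTER SERVER
# Print a remote-printer printcap entry for PRINTER on SERVER, using the
# template from this procedure. Every line except the last ends with a
# backslash and has no trailing white space.
make_printcap_entry() {
    printer=$1
    server=$2
    printf '%s|%s:\\\n' "$printer" "$server"
    printf '        :lp=:rm=%s:rp=%s:br#9600:rw:\\\n' "$server" "$printer"
    printf '        :lf=/var/spool/lpd/%s/log:\\\n' "$printer"
    printf '        :sd=/var/spool/lpd/%s:\n' "$printer"
}

make_printcap_entry luna neptune
```

The spooling directory named in the sd field must still be created separately, as described in Step 5.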

How to Troubleshoot Incorrect Output

  1. Log in as superuser or lp.

  2. Make sure that the printer type is correct.

    An incorrect printer type may cause incorrect output. For example, if you specify printer type PS and the pages print in reverse order, try printer type PSR. (These type names must be in uppercase.) Also, an incorrect printer type may cause missing text, illegible text, or text with the wrong font. To determine the printer type, examine the entries in the terminfo database. For information on the structure of the terminfo database, see "Printer Type".

    1. On the print server, display the printer's characteristics.


      $ lpstat -p luna -l
      printer luna is idle. enabled since Tue Apr 29 11:55:52 MDT 1997. 
      available.
      	Form mounted: 
      	Content types: any
      	Printer types: NeWSprinter20
      	Description: 
      	Connection: direct
      	Interface: /etc/lp/interfaces/alamosa
      	After fault: continue
      	Users allowed:
      		(all)
      	Forms allowed:
      		(none)
      	Banner not required
      	Character sets:
      		
      	Default pitch:
      	Default page size: 80 wide 66 long
      	Default port settings:  
      $
    2. Consult the printer manufacturer's documentation to determine the printer model.

    3. If the printer type is not correct, change it with Admintool's Modify Printer option, or use the following lpadmin command.


      # lpadmin -p printer-name -T printer-type
      

      On the print client, the printer type should be unknown. On the print server, the printer type must match a terminfo entry that is defined to support the model of printer you have. If there is no terminfo entry for the type of printer you have, see "How to Add a terminfo Entry for an Unsupported Printer".

  3. If the banner page prints, but there is no output for the body of the document, check the file content types.

    File content types specified for a printer indicate the types of files the printer can print directly without filtering. An incorrect file content type causes filtering to be bypassed when it may be needed.

    1. Note the information on file content type that was supplied in the previous step by the lpstat command.

      On the print client, the file content type should be any, unless you have good reason to specify one or more explicit content types. If a content type is specified on the client, filtering is done on the print client, rather than the print server. In addition, content types on the client must match the content types specified on the print server, which in turn must reflect the capabilities of the printer.

    2. Consult your printer manufacturer's documentation to determine which types of files the printer can print directly.

      The names you use to refer to these types of files do not have to match the names used by the manufacturer. However, the names you use must agree with the names used by the filters known to the LP print service.

    3. If the file content type is not correct, change it with Admintool's Modify Printer option, or the following lpadmin command.


      # lpadmin -p printer-name -I file-content-type(s)
      

      Run this command on either the print client, or print server, or both, as needed. Try -I any on the print client, and -I "" on the print server. The latter specifies a null file content type list, which means an attempt should be made to filter all files, because the printer can directly print only files that exactly match its printer type.

      This combination is a good first choice when files are not printing. If it works, you may want to try specifying explicit content types on the print server to reduce unnecessary filtering. For a local PostScript printer, you should use postscript, or postscript,simple if the printer supports these types. Be aware that PS and PSR are not file content types; they are printer types.

      If you omit -I, the file content list defaults to simple. If you use the -I option and want to specify file content types in addition to simple, simple must be included in the list.

      When specifying multiple file content types, separate the names with commas. Or you can separate names with spaces and enclose the list in quotation marks. If you specify any as the file content type, no filtering will be done and only file types that can be printed directly by the printer should be sent to it.

  4. Check that the print request does not bypass filtering needed to download fonts.

    If a user submits a print request to a PostScript printer with the command lp -T PS, no filtering is done. Try submitting the request with the command lp -T postscript to force filtering, which may result in the downloading of non-resident fonts needed by the document.

  5. Make sure that the stty settings for the printer port are correct.

    1. Read the printer documentation to determine the correct stty settings for the printer port.


      Note -

      If a printer is connected by a parallel port, the baud setting is irrelevant.


    2. Examine the current settings by using the stty command.


      # stty -a < /dev/term/a
      speed 9600 baud;
      rows = 0; columns = 0; ypixels = 0; xpixels = 0;
      eucw 1:0:0:0, scrw 1:0:0:0
      intr = ^c; quit = ^|; erase = ^?; kill = ^u;
      eof = ^d; eol = <undef>; eol2 = <undef>; swtch = <undef>;
      start = ^q; stop = ^s; susp = ^z; dsusp = ^y;
      rprnt = ^r; flush = ^o; werase = ^w; lnext = ^v;
      parenb -parodd cs7 -cstopb -hupcl cread -clocal -loblk -parext
      -ignbrk brkint -ignpar -parmrk -inpck istrip -inlcr -igncr icrnl -iuclc
      ixon -ixany -ixoff imaxbel
      isig icanon -xcase echo echoe echok -echonl -noflsh
      -tostop echoctl -echoprt echoke -defecho -flusho -pendin iexten
      opost -olcuc onlcr -ocrnl -onocr -onlret -ofill -ofdel tab3
      #

      This command shows the current stty settings for the printer port.

      Table 72-2 shows the default stty options used by the LP print service's standard printer interface program.

      Table 72-2 Default stty Settings Used by the Standard Interface Program

      Option     Meaning

      9600       Set baud rate to 9600
      cs8        Set 8-bit bytes
      -cstopb    Send one stop bit per byte
      -parity    Do not generate parity
      ixon       Enable XON/XOFF (also known as START/STOP or DC1/DC3)
      opost      Do "output post-processing" using all the settings that follow in this table
      -olcuc     Do not map lowercase to uppercase
      onlcr      Change line feed to carriage return/line feed
      -ocrnl     Do not change carriage returns into line feeds
      -onocr     Output carriage returns even at column 0
      nl0        No delay after line feeds
      cr0        No delay after carriage returns
      tab0       No delay after tabs
      bs0        No delay after backspaces
      vt0        No delay after vertical tabs
      ff0        No delay after form feeds

    3. Change the stty settings.


      # lpadmin -p printer-name -o "stty=options"
      

      Use Table 72-3 to choose stty options to correct various problems affecting print output.

      Table 72-3 stty Options to Correct Print Output Problems

      stty Values          Result                            Possible Problem From Incorrect Setting

      110, 300, 600,       Sets baud rate to the specified   Random characters and special characters
      1200, 1800, 2400,    value (enter only one baud rate)  may be printed and spacing may be
      4800, 9600, 19200,                                     inconsistent
      38400

      oddp                 Sets odd parity                   Missing or incorrect characters appear
      evenp                Sets even parity                  randomly
      -parity              Sets no parity

      -tabs                Sets no tabs                      Text is jammed against right margin

      tabs                 Sets tabs every eight spaces      Text has no left margin, is run together,
                                                             or is jammed together

      -onlcr               Sets no carriage return at the    Incorrect double spacing
                           beginning of line(s)

      onlcr                Sets carriage return at the       The print zigzags down the page
                           beginning of line(s)

      You can change more than one option setting by enclosing the list of options in single quotation marks and separating each option with spaces. For example, suppose the printer requires you to enable odd parity and set a 7-bit character size. You would type a command similar to that shown in the following example:


      # lpadmin -p neptune -o "stty='parenb parodd cs7'"
      

      The stty option parenb enables parity checking/generation, parodd sets odd parity generation, and cs7 sets the character size to 7 bits.

  6. Verify that the document prints correctly.


    # lp -d printer-name filename
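
The lpstat and stty checks in this procedure can be partly scripted. The following is a minimal sketch under stated assumptions: lpstat_field and missing_stty_opts are hypothetical helper names, not LP print service commands, and both only read command output from standard input. The first prints one field, such as the printer type or content types, from lpstat -p printer-name -l output. The second lists the stty options you expect, according to the printer documentation, that do not appear in stty -a output.

```shell
# Hypothetical helpers; neither is part of the LP print service.

# Print the value of one field from "lpstat -p printer-name -l" output on stdin.
# $1 is the field label without its colon, for example "Printer types".
lpstat_field() {
    awk -F': *' -v label="$1" '
        { sub(/^[ \t]+/, "") }            # strip the leading indentation
        $1 == label { print $2; exit }    # first matching field only
    '
}

# Read "stty -a" output on stdin and print each wanted option ($@) that is absent.
missing_stty_opts() {
    # One setting per line: split on blanks and semicolons.
    settings=$(tr -s ' \t;' '\n\n\n')
    for opt in "$@"; do
        printf '%s\n' "$settings" | grep -Fqx -e "$opt" || printf '%s\n' "$opt"
    done
}

# Usage on the print server (not run here):
#   lpstat -p luna -l | lpstat_field "Printer types"
#   stty -a < /dev/term/a | missing_stty_opts 9600 cs8 -cstopb ixon
```

An option is reported as missing only when it does not appear as a whole word, so asking for cs8 on a port set to cs7 prints cs8. Any option that is printed is a candidate for the lpadmin -o "stty=..." command described in this procedure.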
    

How to Unhang the LP Print Service

  1. Log in as superuser or lp.

  2. Stop the LP print service.


    # lpshut
    

    If this command hangs, press Control-c and proceed to the next step. If this command succeeds, skip to step 6.

  3. Identify the LP process IDs.


    # ps -e | grep lp
       134 term/a   0:01 lpsched
    #

    Use the process ID numbers (PIDs) from the first column of your own output in place of the example PIDs in the next step.

  4. Stop the LP processes by using the kill -15 command.


    # kill -15 103 134
    

    This should stop the LP print service processes. If the processes do not stop, as a last resort go to step 5.

  5. As a last resort, terminate the processes abruptly.


    # kill -9 103 134
    

    All the lp processes are terminated.

  6. Remove the SCHEDLOCK file so you can restart the LP print service.


    # rm /usr/spool/lp/SCHEDLOCK
    
  7. Restart the LP print service.


    # /usr/lib/lp/lpsched
    

    The LP print service should restart. If you are having trouble restarting the scheduler, see "How to Restart the Print Scheduler".
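
The PID lookup in step 3 can be narrowed so that the grep process itself does not appear in the list. The following is a minimal sketch: lp_pids is a hypothetical helper name, and it assumes the four-column ps -e layout (PID, TTY, TIME, CMD) shown in the example above.

```shell
# Hypothetical helper: print the PID of every process in "ps -e" output
# on stdin whose command name starts with "lp".
lp_pids() {
    # The command name is the last column; the PID is the first.
    awk '$NF ~ /^lp/ { print $1 }'
}

# Usage (not run here); review the list before passing it to kill:
#   ps -e | lp_pids
```

The pattern also matches other commands whose names start with lp, such as a running lpstat, so check the list before using it with kill -15.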

How to Troubleshoot an Idle (Hung) Printer

This task includes a number of procedures to use when a printer appears idle when it should not be. It makes sense to try the procedures in order, but the order is not mandatory.

To check that the printer is ready to print:

  1. Display printer status information.


    # lpstat -p printer-name 
    

    The information displayed shows you whether the printer is idle or active, enabled or disabled, or available or not accepting print requests. If everything looks all right, continue with other procedures in this section. If you cannot run the lpstat command, see "How to Unhang the LP Print Service".

  2. If the printer is not available (not accepting requests), allow the printer to accept requests.


    # accept printer-name 
    

    The printer begins to accept requests into its print queue.

  3. If the printer is disabled, re-enable it.


    # enable printer-name 
    

    This command re-enables the printer so that it will act on the requests in its queue.
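
The enabled check in these steps can be scripted. The following is a minimal sketch: printer_enabled is a hypothetical helper name, and it relies only on the "enabled since" wording that appears in the lpstat examples in this chapter. Any other wording is treated as not enabled.

```shell
# Hypothetical helper: succeeds if "lpstat -p printer-name" output on stdin
# contains the words "enabled since".
printer_enabled() {
    grep -q '[. ]enabled since'
}

# Usage (not run here):
#   lpstat -p luna | printer_enabled || echo "luna is not enabled"
```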

To check for print filtering:

Check for print filtering by using the lpstat -o command.


$ lpstat -o luna
luna-10           fred         1261   Mar 12 17:34 being filtered
luna-11           iggy         1261   Mar 12 17:36 on terra
luna-12           jack         1261   Mar 12 17:39 on terra
$

See if the first waiting request is being filtered. If the output looks like the above example, the file is being filtered. The printer is not hung; it is just taking a while to process the request.
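
To pick out the filtered requests from a long queue listing, the output can be passed through a small filter. The following is a minimal sketch: filtering_requests is a hypothetical helper name, and it assumes the lpstat -o layout shown above, with the request ID in the first column and the words "being filtered" at the end of the line.

```shell
# Hypothetical helper: print the request ID of each line in "lpstat -o"
# output on stdin that ends with "being filtered".
filtering_requests() {
    awk '/being filtered[ \t]*$/ { print $1 }'
}

# Usage (not run here):
#   lpstat -o luna | filtering_requests
```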

To resume printing after a printer fault:

  1. Look for a message about a printer fault and try to correct the fault if there is one.

    Depending on how printer fault alerts have been specified, messages may be sent to root by email or written to a terminal on which root is logged in.

  2. Re-enable the printer.


    # enable printer-name 
    

    If a request was blocked by a printer fault, this command will force a retry. If this command does not work, continue with other procedures in this section.

To send print requests to a remote printer when they back up in the local queue:

  1. On the print client, stop further queuing of print requests to the print server.


    # reject printer-name 
    
  2. On the print client, send an "are you there?" request to the print server.


    print_client# ping print_server
    print_server is alive

    If ping instead reports that the print server did not answer (for example, no answer from print_server), you may have a network problem.

  3. After you fix the above problem, allow new print requests to be queued.


    # accept printer-name 
    
  4. If necessary, re-enable the printer.


    # enable printer-name 
    

To free print requests from a print client that back up in the print server queue:

  1. On the print server, stop further queuing of print requests from any print client to the print server.


    # reject printer-name 
    
  2. Display the lpsched log file.


    # more /var/lp/logs/lpsched
    

    The information displayed may help you pinpoint what is preventing requests sent from the print client from being printed on the print server.

  3. After you fix the problem, allow new print requests to be queued.


    # accept printer-name
    
  4. If necessary, re-enable the printer on the print server.


    # enable printer-name
    

How to Resolve Conflicting Printer Status Messages

  1. On the print server, verify the printer is enabled and is accepting requests.


    # lpstat -p printer-name
    

    Users will see conflicting status messages when the print client is accepting requests, but the print server is rejecting requests.

  2. On the print server, check that the definition of the printer on the print client matches the definition of the printer on the print server.


    # lpstat -p -l printer-name
    

    Look at the definitions of the print job components, like print filters, character sets, print wheels, and forms, to be sure they are the same on both the client and server systems so that local users can access printers on print server systems.

Chapter 73 Troubleshooting File System Problems

See Chapter 31, Checking File System Integrity, for information about the fsck program and how to use it to check file system integrity.

Error Messages

Normally, fsck is run non-interactively to preen the file systems after an abrupt system halt in which the latest file system changes were not written to disk. Preening automatically fixes any basic file system inconsistencies and does not try to repair more serious errors. While preening a file system, fsck fixes the inconsistencies it expects from such an abrupt halt. For more serious conditions, the command reports the error and terminates.

When you run fsck interactively, fsck reports each inconsistency found and fixes innocuous errors. However, for more serious errors, the command reports the inconsistency and prompts you to choose a response. When you run fsck using the -y or -n options, your response is predefined as yes or no to the default response suggested by fsck for each error condition.

Some corrective actions will result in some loss of data. The amount and severity of data loss may be determined from the fsck diagnostic output.

fsck is a multipass file system check program. Each pass invokes a different phase of the fsck program with different sets of messages. After initialization, fsck performs successive passes over each file system, checking blocks and sizes, path names, connectivity, reference counts, and the map of free blocks (possibly rebuilding it). It also performs some cleanup.

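The phases (passes) performed by the UFS version of fsck are:

  Initialization
  Phase 1: Check Blocks and Sizes
  Phase 1B: Rescan for More DUPS (run only when duplicate blocks are found)
  Phase 2: Check Path Names
  Phase 3: Check Connectivity
  Phase 4: Check Reference Counts
  Phase 5: Check Cylinder Groups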

The next sections describe the error conditions that may be detected in each phase, the messages and prompts that result, and possible responses you can make.

Messages that may appear in more than one phase are described in "General fsck Error Messages". Otherwise, messages are organized alphabetically within the phase in which they occur.

Many of the messages include the abbreviations shown in Table 73-1:

Table 73-1 Error Message Abbreviations

Abbreviation 

Meaning 

BLK

Block number 

DUP

Duplicate block number 

DIR

Directory name 

CG

Cylinder group 

MTIME

Time file was last modified 

UNREF

Unreferenced 

Many of the messages also include variable fields, such as inode numbers, which are represented in this book by an italicized term, such as inode-number. For example, this screen message:


INCORRECT BLOCK COUNT I=2529 

is shown as:


INCORRECT BLOCK COUNT I=inode-number
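
When fsck output has been captured to a file, the inode numbers carried in I= fields can be collected for follow-up. The following is a minimal sketch: fsck_inodes is a hypothetical helper name, not an fsck option, and it assumes at most one I= field per output line.

```shell
# Hypothetical helper: print each distinct inode number that appears as
# I=number in fsck output on stdin, in ascending numeric order.
fsck_inodes() {
    sed -n 's/.*I=\([0-9][0-9]*\).*/\1/p' | sort -un
}

# Usage (not run here), where fsck.out is a saved copy of fsck output:
#   fsck_inodes < fsck.out
```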

General fsck Error Messages

The error messages in this section may be displayed in any phase after initialization. Although they offer the option to continue, it is generally best to regard them as fatal. They reflect a serious system failure and should be handled immediately. When confronted with such a message, terminate the program by entering n(o). If you cannot determine what caused the problem, contact your local service provider or another qualified person.


CANNOT SEEK: BLK block-number (CONTINUE)

A request to move to a specified block number block-number in the file system failed. This message indicates a serious problem, probably a hardware failure.

If you want to continue the file system check, do a second run of fsck to recheck the file system. If the block was part of the virtual memory buffer cache, fsck will terminate with a fatal I/O error message.


CANNOT READ: BLK block-number (CONTINUE)

A request to read a specified block number in the file system failed. The message indicates a serious problem, probably a hardware failure. If you want to continue the file system check, fsck will retry the read and display a list of sector numbers that could not be read.

If fsck tries to write back one of the blocks on which the read failed, it displays this message:


WRITING ZERO'ED BLOCK sector-numbers TO DISK

If the disk is experiencing hardware problems, the problem will persist. Run fsck again to recheck the file system. If the block was part of the virtual memory buffer cache, fsck terminates and displays this error message:


Fatal I/O error

CANNOT WRITE: BLK block-number (CONTINUE)

A request to write a specified block number block-number in the file system failed. The disk may be write-protected. Check the write-protect lock on the drive. If that is not the problem, contact your local service provider or another qualified person. If you continue the file system check, the write operation will be retried. Sectors that could not be written are shown with this message:


THE FOLLOWING SECTORS COULD NOT BE WRITTEN: sector-numbers

where sector-numbers indicates the sectors that could not be written. If the disk has hardware problems, the problem will persist. This error condition prevents a complete check of the file system. Run fsck a second time to recheck this file system. If the block was part of the virtual memory buffer cache, fsck terminates and displays this error message:


Fatal I/O error
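
Because these messages should be treated as fatal, it can help to scan saved fsck output for them before reading anything else. The following is a minimal sketch: fsck_fatal_lines is a hypothetical helper name, and it matches only the message texts listed in this section.

```shell
# Hypothetical helper: print the lines of fsck output on stdin that carry
# one of the general error messages described in this section.
fsck_fatal_lines() {
    grep -E '^(CANNOT (SEEK|READ|WRITE): BLK|Fatal I/O error)'
}

# Usage (not run here), where fsck.out is a saved copy of fsck output:
#   fsck_fatal_lines < fsck.out && echo "stop and investigate the disk"
```

The helper exits with a nonzero status when none of the messages are present, so it can be used directly in a shell condition.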

Initialization Phase fsck Messages

In the initialization phase, command-line syntax is checked. Before the file system check can be performed, fsck sets up tables and opens files.

The messages in this section relate to error conditions resulting from command-line options, memory requests, the opening of files, the status of files, file system size checks, and the creation of the scratch file. All such initialization errors terminate fsck when it is preening the file system.


bad inode number inode-number to ginode

Reason Error Occurred 

How to Solve the Problem 

An internal error occurred because of a nonexistent inode inode-number. fsck exits.

Contact your local service provider or another qualified person. 


cannot alloc size-of-block map bytes for blockmap
 
cannot alloc size-of-free map bytes for freemap
 
cannot alloc size-of-state map bytes for statemap
 
cannot alloc size-of-lncntp bytes for lncntp

Reason Error Occurred 

How to Solve the Problem 

fsck's request for memory for its internal tables failed. fsck terminates. This message indicates a serious system failure that should be handled immediately. This condition may occur if other processes are using a very large amount of system resources.

Killing other processes may solve the problem. If not, contact your local service provider or another qualified person. 


Can't open checklist file: filename

Reason Error Occurred 

How to Solve the Problem 

The file system checklist file filename (usually /etc/vfstab) cannot be opened for reading. fsck terminates.

Check if the file exists and if its access modes permit read access. 


Can't open filename

Reason Error Occurred 

How to Solve the Problem 

fsck cannot open file system filename. When running interactively, fsck ignores this file system and continues checking the next file system given.

Check to see if read and write access to the raw device file for the file system is permitted. 


Can't stat root

Reason Error Occurred 

How to Solve the Problem 

fsck request for statistics about the root directory failed. fsck terminates.

This message indicates a serious system failure. Contact your local service provider or another qualified person. 


Can't stat filename
Can't make sense out of name filename

Reason Error Occurred 

How to Solve the Problem 

fsck request for statistics about the file system filename failed. When running interactively, fsck ignores this file system and continues checking the next file system given.

Check if the file system exists and check its access modes. 


filename: (NO WRITE)

Reason Error Occurred 

How to Solve the Problem 

Either the -n option was specified or fsck could not open the file system filename for writing. When fsck is running in no-write mode, all diagnostic messages are displayed, but fsck does not attempt to fix anything.

If -n was not specified, check the type of the file specified. It may be the name of a regular file.


IMPOSSIBLE MINFREE=percent IN SUPERBLOCK (SET TO DEFAULT)

Reason Error Occurred 

How to Solve the Problem 

The superblock minimum space percentage is greater than 99 percent or less than 0 percent. 

To set the minfree parameter to the default 10 percent, type y at the default prompt. To ignore the error condition, type n at the default prompt.


INTERNAL INCONSISTENCY: message

Reason Error Occurred 

How to Solve the Problem 

fsck has had an internal error, whose message is message.

If one of the following messages is displayed, contact your local service provider or another qualified person:

MAGIC NUMBER WRONG
NCG OUT OF RANGE
CPG OUT OF RANGE
NCYL DOES NOT JIBE WITH NCG*CPG
SIZE PREPOSTEROUSLY LARGE
TRASHED VALUES IN SUPER BLOCK

This message may be followed by an error like the one in the following example:


filename: BAD SUPER BLOCK: block-number
USE AN ALTERNATE SUPER-BLOCK TO SUPPLY NEEDED INFORMATION;
e.g., fsck [-F ufs] -o b=# [special ...]
where # is the alternate superblock.  See fsck_ufs(1M)

Reason Error Occurred 

How to Solve the Problem 

The superblock has been corrupted. 

Use an alternative superblock to supply needed information. Specifying block 32 is a good first choice. You can locate an alternative copy of the superblock by running the newfs -N command on the slice. Be sure to specify the -N option; otherwise, newfs overwrites the existing file system.
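
The two commands named above can be assembled for a given slice before you run them. The following is a minimal sketch: alt_superblock_cmds is a hypothetical helper name, the device name in the usage line is only an example, and the helper prints the commands without running them so that they can be reviewed first.

```shell
# Hypothetical helper: print the commands for locating and using an
# alternate superblock. Nothing is executed.
#   $1  raw device for the slice (the usage example below is hypothetical)
#   $2  alternate superblock number; defaults to 32, the first choice named above
alt_superblock_cmds() {
    dev=$1
    sb=${2:-32}
    echo "newfs -N $dev"                # lists superblock locations; -N prevents writing
    echo "fsck -F ufs -o b=$sb $dev"    # checks the file system using that superblock
}

# Usage (not run here):
#   alt_superblock_cmds /dev/rdsk/c0t0d0s7
```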


UNDEFINED OPTIMIZATION IN SUPERBLOCK (SET TO DEFAULT)

Reason Error Occurred 

How to Solve the Problem 

The superblock optimization parameter is neither OPT_TIME nor OPT_SPACE.

To minimize the time to perform operations on the file system, type y at the SET TO DEFAULT prompt. To ignore this error condition, type n.

Phase 1: Check Blocks and Sizes Messages

This phase checks the inode list. It reports error conditions encountered while:

All errors in this phase except INCORRECT BLOCK COUNT, PARTIALLY TRUNCATED INODE, PARTIALLY ALLOCATED INODE, and UNKNOWN FILE TYPE terminate fsck when it is preening a file system.

The other possible error messages displayed in this phase are referenced below.

These messages (in alphabetical order) may occur in phase 1:


block-number BAD I=inode-number

Reason Error Occurred 

How to Solve the Problem 

Inode inode-number contains a block number block-number with a number lower than the number of the first data block in the file system or greater than the number of the last block in the file system. This error condition may generate the EXCESSIVE BAD BLKS error message in phase 1 if inode inode-number has too many block numbers outside the file system range. This error condition generates the BAD/DUP error message in phases 2 and 4.

N/A


BAD MODE: MAKE IT A FILE?

Reason Error Occurred 

How to Solve the Problem 

The status of a given inode is set to all 1s, indicating file system damage. This message does not indicate physical disk damage, unless it is displayed repeatedly after fsck -y has been run.

Type y to reinitialize the inode to a reasonable value.


BAD STATE state-number TO BLKERR

Reason Error Occurred 

How to Solve the Problem 

An internal error has scrambled the fsck state map so that it shows the impossible value state-number. fsck exits immediately.

Contact your local service provider or another qualified person. 


block-number DUP I=inode-number

Reason Error Occurred 

How to Solve the Problem 

Inode inode-number contains a block number block-number, which is already claimed by the same or another inode. This error condition may generate the EXCESSIVE DUP BLKS error message in phase 1 if inode inode-number has too many block numbers claimed by the same or another inode. This error condition invokes phase 1B and generates the BAD/DUP error messages in phases 2 and 4.

N/A 


DUP TABLE OVERFLOW (CONTINUE)

Reason Error Occurred 

How to Solve the Problem 

There is no more room in an internal table in fsck containing duplicate block numbers. If the -o p option is specified, the program terminates.

To continue the program, type y at the CONTINUE prompt. When this error occurs, a complete check of the file system is not possible. If another duplicate block is found, this error condition repeats. Increase the amount of virtual memory available (by killing some processes or increasing swap space) and run fsck again to recheck the file system. To terminate the program, type n.


EXCESSIVE BAD BLKS I=inode-number (CONTINUE)

Reason Error Occurred 

How to Solve the Problem 

Too many (usually more than 10) blocks have a number lower than the number of the first data block in the file system or greater than the number of the last block in the file system associated with inode inode-number. If the -o p (preen) option is specified, the program terminates.

To continue the program, type y at the CONTINUE prompt. When this error occurs, a complete check of the file system is not possible. You should run fsck again to recheck the file system. To terminate the program, type n.


EXCESSIVE DUP BLKS I=inode-number (CONTINUE)

Reason Error Occurred 

How to Solve the Problem 

Too many (usually more than 10) blocks are claimed by the same or another inode or by a free-list. If the -o p option is specified, the program terminates.

To continue the program, type y at the CONTINUE prompt. When this error occurs, a complete check of the file system is not possible. You should run fsck again to recheck the file system. To terminate the program, type n.


INCORRECT BLOCK COUNT I=inode-number (number-of-BAD-DUP-or-missing-blocks should be 
number-of-blocks-in-filesystem) (CORRECT)

Reason Error Occurred 

How to Solve the Problem

The block count for inode inode-number is number-of-BAD-DUP-or-missing-blocks, but should be number-of-blocks-in-filesystem. When preening, fsck corrects the count.

To replace the block count of inode inode-number by number-of-blocks-in-filesystem, type y at the CORRECT prompt. To terminate the program, type n.


LINK COUNT TABLE OVERFLOW (CONTINUE)

Reason Error Occurred 

How to Solve the Problem

There is no more room in an internal table for fsck containing allocated inodes with a link count of zero. If the -o p (preen) option is specified, the program exits and fsck has to be completed manually.

To continue the program, type y at the CONTINUE prompt. If another allocated inode with a zero-link count is found, this error condition repeats. When this error occurs, a complete check of the file system is not possible. Increase the virtual memory available by killing some processes or increasing swap space, then run fsck again to recheck the file system. To terminate the program, type n.


PARTIALLY ALLOCATED INODE I=inode-number (CLEAR)

Reason Error Occurred 

How to Solve the Problem 

Inode inode-number is neither allocated nor unallocated. If the -o p (preen) option is specified, the inode is cleared.

To deallocate the inode inode-number by zeroing out its contents, type y. This may generate the UNALLOCATED error condition in phase 2 for each directory entry pointing to this inode. To ignore the error condition, type n. A no response is appropriate only if you intend to take other measures to fix the problem.


PARTIALLY TRUNCATED INODE I=inode-number (SALVAGE)

Reason Error Occurred 

How to Solve the Problem 

fsck has found inode inode-number whose size is shorter than the number of blocks allocated to it. This condition occurs only if the system crashes while truncating a file. When preening the file system, fsck completes the truncation to the specified size.

To complete the truncation to the size specified in the inode, type y at the SALVAGE prompt. To ignore this error condition, type n.


UNKNOWN FILE TYPE I=inode-number (CLEAR)

Reason Error Occurred 

How to Solve the Problem 

The mode word of the inode inode-number shows that the inode is not a pipe, special character inode, special block inode, regular inode, symbolic link, FIFO file, or directory inode. If the -o p option is specified, the inode is cleared.

To deallocate the inode inode-number by zeroing its contents, which results in the UNALLOCATED error condition in phase 2 for each directory entry pointing to this inode, type y at the CLEAR prompt. To ignore this error condition, type n.

Phase 1B: Rescan for More DUPS Messages

When a duplicate block is found in the file system, this message is displayed:


block-number DUP I=inode-number

Reason Error Occurred 

How to Solve the Problem 

Inode inode-number contains a block number block-number that is already claimed by the same or another inode. This error condition generates the BAD/DUP error message in phase 2. Inodes that have overlapping blocks may be determined by examining this error condition and the DUP error condition in phase 1.

When a duplicate block is found, the file system is rescanned to find the inode that previously claimed that block. 

Phase 2: Check Path Names Messages

This phase removes directory entries pointing to bad inodes found in phases 1 and 1B. It reports error conditions resulting from:

When the file system is being preened (-o p option), all errors in this phase terminate fsck, except those related to directories not being a multiple of the block size, duplicate and bad blocks, inodes out of range, and extraneous hard links.

Other possible error messages displayed in this phase are referenced below.


BAD INODE state-number TO DESCEND

Reason Error Occurred 

How to Solve the Problem 

An fsck internal error has passed an invalid state state-number to the routine that descends the file system directory structure. fsck exits.

If this error message is displayed, contact your local service provider or another qualified person. 


BAD INODE NUMBER FOR '.' I=inode-number OWNER=UID MODE=file-mode 
SIZE=file-size MTIME=modification-time DIR=filename (FIX)

Reason Error Occurred 

How to Solve the Problem 

A directory inode-number has been found whose inode number for "." does not equal inode-number.

To change the inode number for "." to be equal to inode-number, type y at the FIX prompt. To leave the inode number for "." unchanged, type n.


BAD INODE NUMBER FOR '..' I=inode-number OWNER=UID MODE=file-mode 
SIZE=file-size MTIME=modification-time DIR=filename (FIX)

Reason Error Occurred 

How to Solve the Problem 

A directory inode-number has been found whose inode number for ".." does not equal the parent of inode-number.

To change the inode number for ".." to be equal to the parent of inode-number, type y at the FIX prompt. (Note that ".." in the root inode points to itself.) To leave the inode number for ".." unchanged, type n.


BAD RETURN STATE state-number FROM DESCEND

Reason Error Occurred 

How to Solve the Problem 

An fsck internal error has returned an impossible state state-number from the routine that descends the file system directory structure. fsck exits.

If this message is displayed, contact your local service provider or another qualified person. 


BAD STATE state-number FOR ROOT INODE

Reason Error Occurred 

How to Solve the Problem 

An internal error has assigned an impossible state state-number to the root inode. fsck exits.

If this error message is displayed, contact your local service provider or another qualified person. 


BAD STATE state-number FOR INODE=inode-number

Reason Error Occurred 

How to Solve the Problem 

An internal error has assigned an impossible state state-number to inode inode-number. fsck exits.

If this error message is displayed, contact your local service provider or another qualified person. 


DIRECTORY TOO SHORT I=inode-number OWNER=UID MODE=file-mode 
SIZE=file-size MTIME=modification-time DIR=filename (FIX)

Reason Error Occurred 

How to Solve the Problem 

A directory filename has been found whose size file-size is less than the minimum directory size. The owner UID, mode file-mode, size file-size, modify time modification-time, and directory name filename are displayed.

To increase the size of the directory to the minimum directory size, type y at the FIX prompt. To ignore this directory, type n.


DIRECTORY filename: LENGTH file-size NOT MULTIPLE OF block-number (ADJUST)

Reason Error Occurred 

How to Solve the Problem 

A directory filename has been found with size file-size that is not a multiple of the directory block size block-number.

To round up the length to the appropriate block size, type y at the ADJUST prompt. When preening the file system (-o p option), fsck only displays a warning and adjusts the directory. To ignore this condition, type n.


DIRECTORY CORRUPTED I=inode-number OWNER=UID MODE=file-mode 
SIZE=file-size MTIME=modification-time DIR=filename (SALVAGE)

Reason Error Occurred 

How to Solve the Problem 

A directory with an inconsistent internal state has been found. 

To throw away all entries up to the next directory boundary (usually a 512-byte boundary), type y at the SALVAGE prompt. This drastic action can throw away up to 42 entries. Take this action only after other recovery efforts have failed. To skip to the next directory boundary and resume reading, but not modify the directory, type n.
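
The figure of 42 entries is consistent with a 512-byte directory block filled with entries of the smallest possible size. As a rough check, the arithmetic can be sketched in the shell. The 12-byte minimum entry size is an assumption not stated in this guide (a 4-byte inode number, two 2-byte length fields, and a name padded to 4 bytes), and the variable names are illustrative only.

```shell
# Assumed sizes, in bytes; neither variable name comes from fsck itself.
DIR_BLOCK_SIZE=512     # directory boundary mentioned above
MIN_ENTRY_SIZE=12      # assumed smallest directory entry

# Integer division gives the most entries one directory block can hold.
MAX_ENTRIES=$((DIR_BLOCK_SIZE / MIN_ENTRY_SIZE))
echo "$MAX_ENTRIES"
```

Under these assumptions the result is 42, which matches the upper limit quoted above.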


DUP/BAD I=inode-number OWNER=UID MODE=file-mode SIZE=file-size 
MTIME=modification-time TYPE=filename (REMOVE)

Reason Error Occurred 

How to Solve the Problem 

Phase 1 or phase 1B found duplicate blocks or bad blocks associated with directory or file entry filename, inode inode-number. The owner UID, mode file-mode, size file-size, modification time modification-time, and directory or file name filename are displayed. If the -o p (preen) option is specified, the duplicate/bad blocks are removed.

To remove the directory or file entry filename, type y at the REMOVE prompt. To ignore this error condition, type n.


DUPS/BAD IN ROOT INODE (REALLOCATE)

Reason Error Occurred 

How to Solve the Problem 

Phase 1 or phase 1B has found duplicate blocks or bad blocks in the root inode (usually inode number 2) of the file system. 

To clear the existing contents of the root inode and reallocate it, type y at the REALLOCATE prompt. The files and directories usually found in the root will be recovered in phase 3 and put into the lost+found directory. If the attempt to allocate the root fails, fsck exits with the message: CANNOT ALLOCATE ROOT INODE. To get the CONTINUE prompt, type n. At the CONTINUE prompt, type y to ignore the DUPS/BAD error condition in the root inode and continue running the file system check. If the root inode is not correct, this may generate many other error messages. To terminate the program, type n at the CONTINUE prompt.


EXTRA '.' ENTRY I=inode-number OWNER=UID MODE=file-mode 
SIZE=file-size MTIME=modification-time DIR=filename (FIX)

Reason Error Occurred 

How to Solve the Problem 

A directory inode-number has been found that has more than one entry for ".".

To remove the extra entry for ".", type y at the FIX prompt. To leave the directory unchanged, type n.


EXTRA '..' ENTRY I=inode-number OWNER=UID MODE=file-mode 
SIZE=file-size MTIME=modification-time DIR=filename (FIX)

Reason Error Occurred 

How to Solve the Problem 

A directory inode-number has been found that has more than one entry for ".." (the parent directory).

To remove the extra entry for ".." (the parent directory), type y at the FIX prompt. To leave the directory unchanged, type n.


hard-link-number IS AN EXTRANEOUS HARD LINK TO A DIRECTORY filename (REMOVE)

Reason Error Occurred 

How to Solve the Problem 

fsck has found an extraneous hard link hard-link-number to a directory filename. When preening (-o p option), fsck ignores the extraneous hard links.

To delete the extraneous entry hard-link-number, type y at the REMOVE prompt. To ignore the error condition, type n.


inode-number OUT OF RANGE I=inode-number NAME=filename (REMOVE)

Reason Error Occurred 

How to Solve the Problem 

A directory entry filename has an inode number inode-number that is greater than the end of the inode list. If the -o p (preen) option is specified, the inode will be removed automatically.

To delete the directory entry filename, type y at the REMOVE prompt. To ignore the error condition, type n.


MISSING '.' I=inode-number OWNER=UID MODE=file-mode SIZE=file-size 
MTIME=modification-time DIR=filename (FIX)

Reason Error Occurred 

How to Solve the Problem 

A directory inode-number has been found whose first entry (the entry for ".") is unallocated.

To build an entry for "." with inode number equal to inode-number, type y at the FIX prompt. To leave the directory unchanged, type n.


MISSING '.' I=inode-number OWNER=UID MODE=file-mode SIZE=file-size 
MTIME=modification-time DIR=filename CANNOT FIX, FIRST ENTRY IN 
DIRECTORY CONTAINS filename

Reason Error Occurred 

How to Solve the Problem 

A directory inode-number has been found whose first entry is filename. fsck cannot resolve this problem.

Mount the file system and move entry filename elsewhere. Unmount the file system and run fsck again.
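
The mount, move, unmount, and recheck sequence above can be sketched as a short script. The device names, mount point, directory name, and entry name below are hypothetical placeholders, and the run helper is an illustration only: it prints each command rather than executing it unless DRY_RUN=no is set, so the sequence can be reviewed first.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with the values for your system.
BLOCK_DEVICE=/dev/dsk/c0t3d0s7     # block device used by mount
RAW_DEVICE=/dev/rdsk/c0t3d0s7      # raw device used by fsck
MOUNT_POINT=/mnt
DAMAGED_DIR=projects               # directory named in the DIR= field
ENTRY=report.txt                   # entry named after CONTAINS

# Print each command instead of running it, unless DRY_RUN=no is set.
run() {
    if [ "${DRY_RUN:-yes}" = "no" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

move_entry_and_recheck() {
    run mount "$BLOCK_DEVICE" "$MOUNT_POINT"
    # Move the offending entry out of the damaged directory.
    run mv "$MOUNT_POINT/$DAMAGED_DIR/$ENTRY" "$MOUNT_POINT/$ENTRY"
    run umount "$MOUNT_POINT"
    run fsck "$RAW_DEVICE"
}

move_entry_and_recheck
```

The same sequence applies to the corresponding MISSING '..' messages, with the second directory entry moved instead of the first.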


MISSING '.' I=inode-number OWNER=UID MODE=file-mode SIZE=file-size 
MTIME=modification-time DIR=filename CANNOT FIX, INSUFFICIENT 
SPACE TO ADD '.'

Reason Error Occurred 

How to Solve the Problem 

A directory inode-number has been found whose first entry is not ".". fsck cannot resolve the problem.

If this error message is displayed, contact your local service provider or another qualified person. 


MISSING '..' I=inode-number OWNER=UID MODE=file-mode SIZE=file-size 
MTIME=modification-time DIR=filename (FIX)

Reason Error Occurred 

How to Solve the Problem 

A directory inode-number has been found whose second entry is unallocated.

To build an entry for ".." with inode number equal to the parent of inode-number, type y at the FIX prompt. (Note that ".." in the root inode points to itself.) To leave the directory unchanged, type n.


MISSING '..' I=inode-number OWNER=UID MODE=file-mode SIZE=file-size 
MTIME=modification-time DIR=filename CANNOT FIX, SECOND ENTRY IN 
DIRECTORY CONTAINS filename

Reason Error Occurred 

How to Solve the Problem 

A directory inode-number has been found whose second entry is filename. fsck cannot resolve this problem.

Mount the file system and move entry filename elsewhere. Then unmount the file system and run fsck again.


MISSING '..' I=inode-number OWNER=UID MODE=file-mode SIZE=file-size 
MTIME=modification-time DIR=filename CANNOT FIX, INSUFFICIENT SPACE 
TO ADD '..'

Reason Error Occurred 

How to Solve the Problem 

A directory inode-number has been found whose second entry is not ".." (the parent directory). fsck cannot resolve this problem.

Mount the file system and move the second entry in the directory elsewhere. Then unmount the file system and run fsck again.


NAME TOO LONG filename

Reason Error Occurred 

How to Solve the Problem 

An excessively long path name has been found, which usually indicates loops in the file system name space. This error can occur if a privileged user has made circular links to directories.  

Remove the circular links. 


ROOT INODE UNALLOCATED (ALLOCATE)

Reason Error Occurred 

How to Solve the Problem 

The root inode (usually inode number 2) has no allocate-mode bits. 

To allocate inode 2 as the root inode, type y at the ALLOCATE prompt. The files and directories usually found in the root will be recovered in phase 3 and put into the lost+found directory. If the attempt to allocate the root fails, fsck displays this message and exits: CANNOT ALLOCATE ROOT INODE. To terminate the program, type n.


ROOT INODE NOT DIRECTORY (REALLOCATE)

Reason Error Occurred 

How to Solve the Problem 

The root inode (usually inode number 2) of the file system is not a directory inode. 

To clear the existing contents of the root inode and reallocate it, type y at the REALLOCATE prompt. The files and directories usually found in the root will be recovered in phase 3 and put into the lost+found directory. If the attempt to allocate the root fails, fsck displays this message and exits: CANNOT ALLOCATE ROOT INODE. To have fsck prompt with FIX, type n.


UNALLOCATED I=inode-number OWNER=UID MODE=file-mode SIZE=file-size 
MTIME=modification-time TYPE=filename (REMOVE)

Reason Error Occurred 

How to Solve the Problem 

A directory or file entry filename points to an unallocated inode inode-number. The owner UID, mode file-mode, size file-size, modify time modification-time, and file name filename are displayed.

To delete the directory entry filename, type y at the REMOVE prompt. To ignore the error condition, type n.


ZERO LENGTH DIRECTORY I=inode-number OWNER=UID MODE=file-mode 
SIZE=file-size MTIME=modification-time DIR=filename (REMOVE)

Reason Error Occurred 

How to Solve the Problem 

A directory entry filename has a size file-size that is zero. The owner UID, mode file-mode, size file-size, modify time modification-time, and directory name filename are displayed.

To remove the directory entry filename, type y at the REMOVE prompt. This results in the BAD/DUP error message in phase 4. To ignore the error condition, type n.

Phase 3: Check Connectivity Messages

This phase checks the directories examined in phase 2 and reports error conditions resulting from:

Other possible error messages displayed in this phase are referenced below.


BAD INODE state-number TO DESCEND

Reason Error Occurred 

How to Solve the Problem 

An internal error has caused an impossible state state-number to be passed to the routine that descends the file system directory structure. fsck exits.

If this occurs, contact your local service provider or another qualified person. 


DIR I=inode-number1 CONNECTED. PARENT WAS I=inode-number2

Reason Error Occurred 

How to Solve the Problem 

This is an advisory message indicating a directory inode inode-number1 was successfully connected to the lost+found directory. The parent inode inode-number2 of the directory inode inode-number1 is replaced by the inode number of the lost+found directory.

N/A 


DIRECTORY filename LENGTH file-size NOT MULTIPLE OF block-number (ADJUST)

Reason Error Occurred 

How to Solve the Problem 

A directory filename has been found with size file-size that is not a multiple of the directory block size block-number. (This condition can recur in phase 3 if it is not adjusted in phase 2.)

To round up the length to the appropriate block size, type y at the ADJUST prompt. When preening, fsck displays a warning and adjusts the directory. To ignore this error condition, type n.


lost+found IS NOT A DIRECTORY (REALLOCATE)

Reason Error Occurred 

How to Solve the Problem 

The entry for lost+found is not a directory.

To allocate a directory inode and change the lost+found directory to reference it, type y at the REALLOCATE prompt. The previous inode referenced by the lost+found directory is not cleared. It will either be reclaimed as an unreferenced inode or have its link count adjusted later in this phase. If the lost+found directory cannot be created, fsck displays the message: SORRY. CANNOT CREATE lost+found DIRECTORY and aborts the attempt to link up the lost inode, which generates the UNREF error message in phase 4. To abort the attempt to link up the lost inode yourself, which also generates the UNREF error message in phase 4, type n.


NO lost+found DIRECTORY (CREATE)

Reason Error Occurred 

How to Solve the Problem 

There is no lost+found directory in the root directory of the file system. When preening, fsck tries to create a lost+found directory.

To create a lost+found directory in the root of the file system, type y at the CREATE prompt. This may lead to the message NO SPACE LEFT IN / (EXPAND). If the lost+found directory cannot be created, fsck displays the message: SORRY. CANNOT CREATE lost+found DIRECTORY and aborts the attempt to link up the lost inode. This in turn generates the UNREF error message later in phase 4. To abort the attempt to link up the lost inode, type n.


NO SPACE LEFT IN /lost+found (EXPAND)

Reason Error Occurred 

How to Solve the Problem 

Another entry cannot be added to the lost+found directory in the root directory of the file system because no space is available. When preening, fsck expands the lost+found directory.

To expand the lost+found directory to make room for the new entry, type y at the EXPAND prompt. If the attempted expansion fails, fsck displays: SORRY. NO SPACE IN lost+found DIRECTORY and aborts the request to link a file to the lost+found directory. This error generates the UNREF error message later in phase 4. Delete any unnecessary entries in the lost+found directory. This error terminates fsck when preening is in effect. To abort the attempt to link up the lost inode, type n.
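
Before deleting entries from the lost+found directory, it helps to review what is there. The following is a minimal sketch that lists each entry in long format and removes nothing. The function name is hypothetical, and the default path assumes the lost+found directory of the root (/) file system. The sketch uses the POSIX test -e operator, which older Bourne shells may not support.

```shell
# List each entry in a lost+found directory so it can be reviewed
# before anything is deleted. Nothing is removed by this function.
review_lost_found() {
    dir=${1:-/lost+found}
    for entry in "$dir"/*; do
        # Skip the unexpanded pattern when the directory is empty.
        [ -e "$entry" ] || continue
        ls -ld "$entry"
    done
}
```

For a file system mounted elsewhere, pass its lost+found directory as the argument, for example review_lost_found /export/home/lost+found (a hypothetical path). The owner, size, and modification time in the listing are often the best clues to what each entry was.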


UNREF DIR I=inode-number OWNER=UID MODE=file-mode SIZE=file-size 
MTIME=modification-time (RECONNECT)

Reason Error Occurred 

How to Solve the Problem 

The directory inode inode-number was not connected to a directory entry when the file system was traversed. The owner UID, mode file-mode, size file-size, and modification time modification-time of directory inode inode-number are displayed. When preening, fsck reconnects the directory inode if the directory size is non-zero. Otherwise, fsck clears the directory inode.

To reconnect the directory inode inode-number into the lost+found directory, type y at the RECONNECT prompt. If the directory is successfully reconnected, a CONNECTED message is displayed. Otherwise, one of the lost+found error messages is displayed. To ignore this error condition, type n. This error causes the UNREF error condition in phase 4.

Phase 4: Check Reference Counts Messages

This phase checks the link count information obtained in phases 2 and 3. It reports error conditions resulting from:

All errors in this phase (except running out of space in the lost+found directory) are correctable when the file system is being preened.

Other possible error messages displayed in this phase are referenced below.


BAD/DUP type I=inode-number OWNER=UID MODE=file-mode SIZE=file-size 
MTIME=modification-time (CLEAR)

Reason Error Occurred 

How to Solve the Problem 

Phase 1 or phase 1B found duplicate blocks or bad blocks associated with file or directory inode inode-number. The owner UID, mode file-mode, size file-size, and modification time modification-time of inode inode-number are displayed.

To deallocate inode inode-number by zeroing its contents, type y at the CLEAR prompt. To ignore this error condition, type n.


(CLEAR)

Reason Error Occurred 

How to Solve the Problem 

The inode mentioned in the UNREF error message immediately preceding cannot be reconnected. This message does not display if the file system is being preened because lack of space to reconnect files terminates fsck.

To deallocate the inode by zeroing out its contents, type y at the CLEAR prompt. To ignore the preceding error condition, type n.


LINK COUNT type I=inode-number OWNER=UID MODE=file-mode SIZE=file-size 
MTIME=modification-time COUNT link-count SHOULD BE corrected-link-count (ADJUST)

Reason Error Occurred 

How to Solve the Problem 

The link count for directory or file inode inode-number is link-count but should be corrected-link-count. The owner UID, mode file-mode, size file-size, and modification time modification-time of inode inode-number are displayed. If the -o p option is specified, the link count is adjusted unless the number of references is increasing. This condition does not occur unless there is a hardware failure. When the number of references is increasing during preening, fsck displays this message and exits: LINK COUNT INCREASING

To replace the link count of directory or file inode inode-number with corrected-link-count, type y at the ADJUST prompt. To ignore this error condition, type n.


lost+found IS NOT A DIRECTORY (REALLOCATE)

Reason Error Occurred 

How to Solve the Problem 

The entry for lost+found is not a directory.

To allocate a directory inode and change the lost+found directory to reference it, type y at the REALLOCATE prompt. The previous inode referenced by the lost+found directory is not cleared. It will either be reclaimed as an unreferenced inode or have its link count adjusted later in this phase. If the lost+found directory cannot be created, fsck displays this message: SORRY. CANNOT CREATE lost+found DIRECTORY and aborts the attempt to link up the lost inode. This error generates the UNREF error message later in phase 4. To abort the attempt to link up the lost inode, type n.


NO lost+found DIRECTORY (CREATE)

Reason Error Occurred 

How to Solve the Problem 

There is no lost+found directory in the root directory of the file system. When preening, fsck tries to create a lost+found directory.

To create a lost+found directory in the root of the file system, type y at the CREATE prompt. If the lost+found directory cannot be created, fsck displays the message: SORRY. CANNOT CREATE lost+found DIRECTORY and aborts the attempt to link up the lost inode. This error in turn generates the UNREF error message later in phase 4. To abort the attempt to link up the lost inode, type n.


NO SPACE LEFT IN /lost+found (EXPAND)

Reason Error Occurred 

How to Solve the Problem 

There is no space to add another entry to the lost+found directory in the root directory of the file system. When preening, fsck expands the lost+found directory.

To expand the lost+found directory to make room for the new entry, type y at the EXPAND prompt. If the attempted expansion fails, fsck displays the message: SORRY. NO SPACE IN lost+found DIRECTORY and aborts the request to link a file to the lost+found directory. This error generates the UNREF error message later in phase 4. Delete any unnecessary entries in the lost+found directory. This error terminates fsck when preening (-o p option) is in effect. To abort the attempt to link up the lost inode, type n.


UNREF FILE I=inode-number OWNER=UID MODE=file-mode SIZE=file-size 
MTIME=modification-time (RECONNECT)

Reason Error Occurred 

How to Solve the Problem 

File inode inode-number was not connected to a directory entry when the file system was traversed. The owner UID, mode file-mode, size file-size, and modification time modification-time of inode inode-number are displayed. When fsck is preening, the file is cleared if either its size or its link count is zero; otherwise, it is reconnected.

To reconnect inode inode-number to the file system in the lost+found directory, type y. This error may generate the lost+found error message in phase 4 if there are problems connecting inode inode-number to the lost+found directory. To ignore this error condition, type n. This error always invokes the CLEAR error condition in phase 4.


UNREF type I=inode-number OWNER=UID MODE=file-mode SIZE=file-size 
MTIME=modification-time (CLEAR)

Reason Error Occurred 

How to Solve the Problem 

Inode inode-number (whose type is directory or file) was not connected to a directory entry when the file system was traversed. The owner UID, mode file-mode, size file-size, and modification time modification-time of inode inode-number are displayed. When fsck is preening, the file is cleared if either its size or its link count is zero; otherwise, it is reconnected.

To deallocate inode inode-number by zeroing its contents, type y at the CLEAR prompt. To ignore this error condition, type n.


ZERO LENGTH DIRECTORY I=inode-number OWNER=UID MODE=file-mode 
SIZE=file-size MTIME=modification-time (CLEAR)

Reason Error Occurred 

How to Solve the Problem 

A directory entry filename has a size file-size that is zero. The owner UID, mode file-mode, size file-size, modification time modification-time, and directory name filename are displayed.

To deallocate the directory inode inode-number by zeroing out its contents, type y. To ignore the error condition, type n.

Phase 5: Check Cylinder Groups Messages

This phase checks the free-block and used-inode maps. It reports error conditions resulting from:

The possible error messages displayed in this phase are referenced below.


BLK(S) MISSING IN BIT MAPS (SALVAGE)

Reason Error Occurred 

How to Solve the Problem 

A cylinder group block map is missing some free blocks. During preening, fsck reconstructs the maps.

To reconstruct the free-block map, type y at the SALVAGE prompt. To ignore this error condition, type n.


CG cylinder-group-number: BAD MAGIC NUMBER

Reason Error Occurred 

How to Solve the Problem 

The magic number of cylinder group cylinder-group-number is wrong. This error usually indicates that the cylinder group maps have been destroyed. When running interactively, the cylinder group is marked as needing reconstruction. fsck terminates if the file system is being preened.

If this occurs, contact your local service provider or another qualified person. 


FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGE)

Reason Error Occurred 

How to Solve the Problem 

The actual count of free blocks does not match the count of free blocks in the superblock of the file system. If the -o p option was specified, the free-block count in the superblock is fixed automatically.

To reconstruct the superblock free-block information, type y at the SALVAGE prompt. To ignore this error condition, type n.


SUMMARY INFORMATION BAD (SALVAGE)

Reason Error Occurred 

How to Solve the Problem 

The summary information is incorrect. When preening, fsck recomputes the summary information.

To reconstruct the summary information, type y at the SALVAGE prompt. To ignore this error condition, type n.

Cleanup Phase Messages

Once a file system has been checked, a few cleanup functions are performed. The cleanup phase displays the following status messages.


number-of files, number-of used, number-of free (number-of frags, number-of blocks, 
percent% fragmentation)

This message indicates that the file system checked contains number-of files using number-of fragment-sized blocks, and that there are number-of fragment-sized blocks free in the file system. The numbers in parentheses break the free count down into number-of free fragments, number-of free full-sized blocks, and the percent fragmentation.
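
As a worked example of how these numbers relate, the total free count is the number of free fragments plus the free full-sized blocks converted to fragments. The sketch below uses made-up counts and assumes 8 fragments per block (for instance, 8-Kbyte blocks with 1-Kbyte fragments). The actual ratio depends on how the file system was created.

```shell
# Hypothetical values, as they might appear in the summary line.
FREE_FRAGS=310         # "number-of frags"
FREE_BLOCKS=1500       # "number-of blocks"
FRAGS_PER_BLOCK=8      # assumed block-to-fragment ratio

# Total free space, counted in fragment-sized blocks.
TOTAL_FREE=$((FREE_FRAGS + FREE_BLOCKS * FRAGS_PER_BLOCK))
echo "$TOTAL_FREE free"
```

With these values the script prints 12310 free, which corresponds to the free count shown before the parentheses in the summary line.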


***** FILE SYSTEM WAS MODIFIED *****

This message indicates that the file system was modified by fsck. If this file system is mounted or is the current root (/) file system, reboot. If the file system is mounted, you may need to unmount it and run fsck again; otherwise, the work done by fsck may be undone by the in-core copies of tables.


filename FILE SYSTEM STATE SET TO OKAY

This message indicates that file system filename was marked as stable. Use the fsck -m command to determine if the file system needs checking.


filename FILE SYSTEM STATE NOT SET TO OKAY

This message indicates that file system filename was not marked as stable. Use the fsck -m command to determine if the file system needs checking.
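
A script can act on the result of fsck -m by inspecting its exit status. The sketch below maps a status to a short description. The function name is hypothetical, and the status values 0, 32, and 33 are given as recalled from the fsck(1M) manual page, so confirm them against the manual page for your release before relying on them.

```shell
# Translate an fsck -m exit status into a short description.
# The status values are assumptions to verify against fsck(1M).
describe_fsck_m_status() {
    case "$1" in
        0)  echo "file system is okay and does not need checking" ;;
        32) echo "file system is unmounted and needs checking" ;;
        33) echo "file system is already mounted" ;;
        *)  echo "unexpected status $1; see fsck(1M)" ;;
    esac
}

# Example use, with a hypothetical raw device name:
#   fsck -m /dev/rdsk/c0t3d0s7
#   describe_fsck_m_status $?
```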

Chapter 74 Troubleshooting Software Administration Problems

This chapter describes problems you may encounter when installing or removing software packages. There are two sections: Specific Software Administration Errors, which describes package installation and administration errors you might encounter, and General Software Administration Problems, which describes behavioral problems that might not result in a particular error message.

This is a list of information in this chapter.

See Chapter 16, Software Administration (Overview) for information about managing software packages.

Specific Software Administration Errors


WARNING: filename <not present on Read Only file system>
 

Reason Error Occurred 

How to Fix the Problem 

This error message indicates that not all of a package's files could be installed. This usually occurs when you are using pkgadd to install a package on a client. In this case, pkgadd attempts to install a package on a file system that is mounted from a server, but pkgadd doesn't have permission to do so.

If you see this warning message during a package installation, you must also install the package on the server. See "How to Add Packages to a Server" on page 308 for details. 

General Software Administration Problems

Reason Error Occurred 

How to Fix the Problem 

There is a known problem with adding or removing some packages developed prior to the Solaris 2.5 release. Sometimes, when adding or removing these packages, the installation fails during user interaction or you are prompted for user interaction and your responses are ignored.  

Set the following environment variable and try to add the package again. 

NONABI_SCRIPTS=TRUE
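
In the Bourne or Korn shell, the variable must also be exported so that pkgadd and the package scripts it runs inherit it. A minimal sketch follows. The package source directory and package name in the comment are hypothetical.

```shell
# Set the variable and export it to child processes.
NONABI_SCRIPTS=TRUE
export NONABI_SCRIPTS

# Then retry the installation, for example (hypothetical names):
#   pkgadd -d /cdrom/cdrom0/Product SUNWexample
```

In the C shell, the equivalent is setenv NONABI_SCRIPTS TRUE.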