Solaris Common Messages and Troubleshooting Guide

"C"

Cannot access a needed shared library

Cause

Trying to exec(2) an a.out that requires a static shared library and the static shared library doesn't exist or the user doesn't have permission to use it.

Technical Notes

The symbolic name for this error is ELIBACC, errno=83.

Cannot allocate colormap entry for "string"

Cause

This message from libXt (X Intrinsics library) indicates that the system colormap was full even before the color name specified in quotes was requested. Some applications can continue after this message. Other applications, such as Workspace Properties Color, fail to come up when the colormap is full.

Action

Exit the programs that make heavy use of the colormap, then restart the failed application and try again.

Cannot assign requested address

Cause

Results from an attempt to create a transport endpoint with an address not on the current machine.

Technical Notes

The symbolic name for this error is EADDRNOTAVAIL, errno=126.

cannot change passwd, not correct passwd

Cause

While running yppasswd(1) to change a user's password, the command responds with the message cannot change passwd, not correct passwd.

The server console also shows the message yppasswd user string does not exist, even though running ypcat passwd | grep (user) returns the username and yppasswdd(1M) has been verified as running.

Action

Check the passwd(4) file with pwck(1M) and verify that yppasswdd(1M) is running on the right server. Then verify where the passwd(4) file is located and, if the location has been changed, check that yppasswdd(1M) has that location in its process line. For example, if the passwd file is located in /etc/yp, the process line should read /usr/lib/yp/rpc.yppasswdd -D /etc/yp. The -D option, followed by the directory that contains the passwd files, tells yppasswdd(1M) where to update and verify password changes.
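The process-line check described above can be sketched in the Bourne shell. This is only an illustration: the function name yppasswdd_dir is hypothetical, and the sample line is the one quoted in the text rather than real ps output.

```shell
#!/bin/sh
# Hypothetical helper: print the directory that follows -D in an
# rpc.yppasswdd process line; return non-zero if -D is absent.
yppasswdd_dir() {
    ypd_prev=""
    for ypd_word in $1; do          # unquoted on purpose: split into words
        if [ "$ypd_prev" = "-D" ]; then
            echo "$ypd_word"
            return 0
        fi
        ypd_prev=$ypd_word
    done
    return 1
}

# Sample process line from the text above. On a live server the line
# would come from something like: ps -ef | grep yppasswdd
yppasswdd_dir "/usr/lib/yp/rpc.yppasswdd -D /etc/yp"
```

If the function prints nothing, the daemon was started without -D and is presumably using its default passwd file location.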

Cannot exec a shared library directly

Cause

Attempting to exec(2) a shared library directly.

Technical Notes

The symbolic name for this error is ELIBEXEC, errno=87.

Cannot find SERVER hostname in network database

Cause

This message can appear when the user is on a different subnet from the license server and is running permanent licenses. For example:


ultra1(50)% cc -o hello hello.c
License Error : Cannot find the license server (fry)
in the network database for product(Sun WorkShop Compiler C)
Cannot find SERVER hostname in network database (-14,7)
cc: acomp failed for hello.c
ultra1(51)%

Action

First, make sure that the server is up and running. Second, make sure that the server is in the /etc/hosts file of the client system, and check that it responds by typing ping servername. Third, check the license daemon on the server to see if it is running. Fourth, make sure that there is an elementary license file on the client:


cd /etc/opt/licenses
more sunpro.loc
Fifth, check that the directory named in sunpro.loc contains only text license files, such as sunpro.lic,1. Sixth, on the client check:

 % cd /etc
 % more nsswitch.conf | grep hosts
 hosts:      nis [NOTFOUND=return] files
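This entry tells the system to consult NIS first when looking up the IP address; because of [NOTFOUND=return], the lookup stops without trying /etc/hosts when NIS has no entry for the host. If it is set first for nis and the /etc/hosts file has the server listed by name, change the line to

hosts:      files nis
Then see whether the license server can be found. If not, try truss and snoop to see what is going on.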
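The nsswitch.conf check can also be scripted. The following Bourne shell sketch reports which source is listed first on the hosts: line. The function name first_hosts_source is hypothetical, and the default path /etc/nsswitch.conf is assumed.

```shell
#!/bin/sh
# Hypothetical helper: print the first lookup source on the hosts: line
# of an nsswitch.conf-style file. Comment lines are skipped because
# their first field is never exactly "hosts:".
first_hosts_source() {
    conf=${1:-/etc/nsswitch.conf}
    awk '$1 == "hosts:" { print $2; exit }' "$conf"
}

# Only report on the live file if it is present and readable.
if [ -r /etc/nsswitch.conf ]; then
    case "$(first_hosts_source)" in
        files) echo "hosts: /etc/hosts is consulted first" ;;
        *)     echo "hosts: another source is consulted before /etc/hosts" ;;
    esac
fi
```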

Cannot open FCC file

Cause

This message is displayed when trying to send mail via Netscape. Netscape is trying to save a copy of the outbound message to a file that the user has specified but that does not exist.

Action

To correct this problem, open the Options menu, choose Mail and News Preferences, and then go to Compose. A template pops up with a section that specifies where to save outgoing mail and news files. Make sure that these files exist, or remove them from the template if you do not need a log of the messages sent via Netscape.

Cannot send after transport endpoint shutdown

Cause

A request to send data was disallowed because the transport endpoint has already been shut down.

Technical Notes

The symbolic name for this error is ESHUTDOWN, errno=143.

Can't create public message device (Device busy)

Cause

This message comes from the lp(1) print scheduler, indicating that it is either extremely busy or hung.

Action

If print jobs are coming out of the printer in question, wait until they are finished and then resubmit this print job. If you see this message again, the lp(1) system is probably hung.

See the message "lp hang" for a procedure to clear the queue.

Technical Notes

If lp(1) is unable to create a device for printer messages, the message FIFO could be already in use, or locked by another print job.

See Also

For more information on the print scheduler, see the section on administering printers in the System Administration Guide, Volume II.

Can't invoke /etc/init, error int

Cause

This message can appear while a system is booting, indicating that the init(1M) program is missing or corrupted. Note that /etc/init is a symbolic link to /sbin/init.

Action

Boot the miniroot so you can replace init(1M). Halt the machine by typing Stop-A or by pressing the reset button. Reboot single-user from CD-ROM, the net, or diskette. For example, type boot cdrom -s at the ok prompt to boot from CD-ROM. After the system comes up and gives you a # prompt, mount the device corresponding to the original / partition somewhere, with a command similar to the mount(1M) command below. Then copy the init(1M) program from the miniroot to the original / partition, and reboot the system.


# mount /dev/dsk/c0t3d0s0 /mnt
# cp /sbin/init /mnt/sbin/init
# reboot
If this doesn't work, other files might be corrupted, and you might need to reinstall the entire system.

Technical Notes

The error number is 2 if /sbin/init is missing, or 8 if /sbin/init has an incorrect executable format. This is usually followed by a "panic: icode" message. The system tries to reboot itself, but goes into a loop, because rebooting is impossible without init(1M).

See Also

For more information on booting the system, see the section on halting and booting the system in the System Administration Guide, Volume I.

can't open /dev/rdsk/string: (null): UNEXPECTED INCONSISTENCY

Cause

On SunOS 4.1.x, this message indicated that the device containing the /dev filesystem had become disconnected. Solaris behavior has not been confirmed.

can't synchronize with hayes

Cause

This message sometimes appears when using a modem that the system regards as a "Hayes" type modem, which includes most modems manufactured today. The message can be caused by incorrect switch settings, by poor cable connections, or by not turning the modem on.

Action

Check that the modem is on and that the cables between the modem and your system are securely connected. Check the internal and external modem switch settings. Turn the modem off and then on again, if necessary.

cd: Too many arguments

Cause

The C shell's cd(1) command takes only one argument. Either more than one directory was specified, or a directory name containing a space was specified. Directory names with spaces are easy to create with File Manager.

Action

Use only one directory name. To change to a directory whose name contains spaces, enclose the directory name in double (") or single (') quotes, or use File Manager.
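As a small illustration of the quoting rule, the sketch below creates a directory whose name contains a space and changes to it. It is written for the Bourne shell, where the quoting works the same way as in the C shell. The directory name "My Reports" and the scratch location are made up for the example.

```shell
#!/bin/sh
# Scratch directory so that nothing outside it is touched.
scratch=${TMPDIR:-/tmp}/cd_demo.$$
mkdir -p "$scratch/My Reports"

# Unquoted, "My Reports" would reach cd as two arguments.
# Quoted, the whole name is passed as a single argument.
where=$(cd "$scratch/My Reports" && pwd)
echo "$where"

rm -rf "$scratch"
```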

Channel number out of range

Cause

The system has run out of stream devices. This error results when a stream head attempts to open a minor device that does not exist or that is currently in use.

Action

Check that the stream device in question exists and was created with an appropriate number of minor devices. Make sure that the hardware corresponds to this configuration. If the stream device configuration is correct, try again later when more system resources might be available.

Technical Notes

The symbolic name for this error is ECHRNG, errno=37.

chmod: ERROR: invalid mode

Cause

This message from the chmod(1) command indicates a problem in the first non-option argument.

Action

If you are specifying a numeric file mode, you can provide any number of digits (although only the final one to four are considered), but all digits must be between 0 and 7. If you are specifying a symbolic file mode, use the syntax provided in the chmod(1) usage message to avoid the "invalid mode" error message:

Usage: chmod [ugoa][+-=][rwxlstugo] file ...
Some combinations of symbolic keyletters produce no error message but fail to have any effect. The first group, [ugoa], is truly optional. The second group, [+-=], is mandatory for chmod(1) to have an effect. The third group, [rwxlstugo], is also mandatory for effect, and can be used in combination when that combination does not conflict.
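The two kinds of mode argument can be tried safely on a scratch file. The sketch below is illustrative only: the file name is arbitrary, and the wording of the error produced by the rejected mode varies between systems, so it is suppressed.

```shell
#!/bin/sh
# Scratch file; the name is arbitrary.
f=${TMPDIR:-/tmp}/chmod_demo.$$
: > "$f"

chmod 640 "$f"    # numeric: rw- for user, r-- for group, --- for others
chmod u+x "$f"    # symbolic: who (u), operator (+), permission (x)
chmod g+w "$f"    # symbolic: add write permission for the group

# Keep only the ten mode characters from the long listing.
mode=$(ls -l "$f" | cut -c1-10)
echo "$mode"

# A digit outside 0-7 is rejected; the exact message varies by system.
chmod 689 "$f" 2>/dev/null || echo "chmod rejected mode 689"

rm -f "$f"
```

After the three successful calls, the listing shows -rwxrw---- for the scratch file.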

Command not found

Cause

The C shell could not find the program you gave as a command.

Action

Check the form and spelling of the command line. If that looks correct, echo $path to see if the user's search path is correct. When communications are garbled, it is possible to unset a search path to such an extent that only built-in shell commands are available. Here is a command to reset a basic search path:


 % set path = (/usr/bin /usr/ccs/bin /usr/openwin/bin .)
If the search path looks correct, check the directory contents along the search path to see if programs are missing or if directories are not mounted.
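Checking every directory along the search path by hand is tedious, so the check can be sketched as a small script. The example uses the Bourne shell and the colon-separated $PATH rather than the C shell's $path list, and the function name missing_path_dirs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the entries of a colon-separated search path
# that are not existing directories (for example, because a file system
# is not mounted).
missing_path_dirs() {
    mpd_old_ifs=$IFS
    IFS=:
    for mpd_dir in $1; do
        # An empty entry means the current directory; skip it.
        if [ -n "$mpd_dir" ] && [ ! -d "$mpd_dir" ]; then
            echo "$mpd_dir"
        fi
    done
    IFS=$mpd_old_ifs
}

missing_path_dirs "$PATH"
```

No output means that every directory on the path exists. A missing program inside an existing directory still has to be checked with ls(1).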

See Also

For more information about the C shell, see csh(1).

Communication error on send

Cause

This error occurs when the current process is waiting for a message from a remote machine, but the link connecting the machines breaks.

Technical Notes

The symbolic name for this error is ECOMM, errno=70.

Connection closed.

Cause

This message can appear when using rlogin(1) to another system if the remote host cannot create a process for this user, if the user takes too long to type the correct password, if the user interrupts the network connection, or if the remote host goes down. Data loss is possible if files were modified and not saved before the connection closed.

Action

Just try again. If the other system has gone down, wait for it to reboot first.

Connection closed by foreign host.

Cause

When a user telnet(1)s to another system, this message can appear if the user takes too long to type the correct password, if the remote host cannot create a login for this user, or if the remote host goes down or terminates the connection. Data loss is possible if files were modified and not saved before the connection closed.

Action

Just try again. If the other system has gone down, wait for it to reboot first.

[Connection closed. Exiting]

Cause

After using the talk(1) command to communicate with another user, the other person enters an interrupt (usually Control-c), and this message appears on your screen.

Action

Sending an interrupt like this is the usual way of exiting the talk program. The talk(1) session is over and you can return to your work.

Connection refused

Cause

No connection could be made because the target machine actively refused it. This happens either when trying to connect to an inactive service or when a service process is not present at the requested address.

Action

Activate the service on the target machine, or start it up again if it has disappeared. If for security reasons you do not intend to provide this service, inform the user community, possibly suggesting an alternative.

Technical Notes

The symbolic name for this error is ECONNREFUSED, errno=146.

Connection reset by peer

Cause

A connection was forcibly closed by a peer. This normally results from a loss of the connection on the remote host due to a timeout or a reboot.

Technical Notes

The symbolic name for this error is ECONNRESET, errno=131.

Connection timed out

Cause

This occurs either when the destination host is down or when problems in the network cause lost transmission.

Action

First check the operation of the host system, for example by using ping(1M) and ftp(1), then repair or reboot as necessary. If that doesn't solve the problem, check the network cabling and connections.

Technical Notes

No connection was established in a specified time. A connect or send request failed because the destination host did not properly respond after a reasonable interval. (The timeout period is dependent on the communication protocol.)

The symbolic name for this error is ETIMEDOUT, errno=145.

console login: ^J^M^Q^K^K^P

Cause

This usually occurs because OpenWindows exited abnormally, leaving the system's keyboard in the wrong mode. The characters that appear when someone attempts to login are garbage transliterations of what someone types.

Action

On a SPARC system: find another machine and log in remotely to this system, then run this command:


$ /usr/openwin/bin/kbd_mode -a
This puts the console back into ASCII mode. Note that kbd_mode is not a windows program; it just fixes the console mode.

On an x86 system: log in remotely, then start and kill the X server, or reboot the system.

Technical Notes

The usual reason for this problem occurring is an automated script run from cron(1M) that clears out the /tmp directory every so often. Ensure that any such scripts do not remove the /tmp/.X11-pipe or /tmp/.X11-unix directories, or any files in them.
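A cleanup script can avoid this problem by pruning the two directories before deleting anything. The following is a minimal sketch, not the script of any particular site: the function name clean_tmp and the seven-day age in the example call are assumptions.

```shell
#!/bin/sh
# Hypothetical cleanup routine: remove regular files older than a given
# number of days under a directory, but never descend into the
# .X11-pipe or .X11-unix directories.
clean_tmp() {
    # $1 = directory to clean, $2 = age in days
    find "$1" \( -name .X11-pipe -o -name .X11-unix \) -prune \
        -o -type f -mtime +"$2" -exec rm -f {} \;
}

# Example call, commented out so that the sketch is safe to paste:
# clean_tmp /tmp 7
```

The -prune action stops find(1) from entering the matched directories, so nothing inside them is ever tested against the age condition.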

core dumped

Cause

A core(4) file contains an image of memory at the time of software failure, and is used by programmers to find the reason for the failure.

Action

To see which program produced a core(4) file, run either the file(1) command or the adb(1) command. The following examples show the output of the file(1) and adb(1) commands on a core file from the dtmail program.


$ file core
core: ELF 32-bit MSB core file SPARC Version 1, from `dtmail'

$ adb core
core file = core -- program `dtmail'
SIGSEGV  11: segmentation violation
^D      (use Control-d to quit the program)
Ask the vendor or author of this program for a debugged version.

Technical Notes

Some signals, such as SIGQUIT, SIGBUS, and SIGSEGV, produce a core dump. See the signal(5) man page for a complete list.

If you have the source code for the program, you can try compiling it with cc -g, and debugging it yourself using dbx or a similar debugger. The where directive of dbx provides a stack trace.

On mixed networks, it can be difficult to discern which machine architecture produced a particular core dump, since adb(1) on one type of system generally cannot read a core(4) file from another type of system, and will produce an "unrecognized file" message. Run adb(1) on various machine architectures until you find the right one.

The term "core" is archaic: ferrite core memory was supplanted by silicon RAM in the 1970s, although spaceships still employ core memory for its imperviousness to radiation.

See Also

For information on saving and viewing crash information see the System Administration Guide, Volume II. If you are using the AnswerBook, "system crash" is a good search string.

corrupt label - wrong magic number or corrupt label - label checksum failed

Cause

After a power cycle, the machine comes up with one of these error messages: corrupt label - label checksum failed or corrupt label - wrong magic number. The format(1M) command showed:


  0 unassigned    wm       0               0         (0/0/0)          0
  1 unassigned    wm       0               0         (0/0/0)          0
  2     backup    wm       0 - 5460        4.2G      (5460/0/0) 4154160
  3 unassigned    wm       0               0         (0/0/0)          0
  4 unassigned    wm       0               0         (0/0/0)          0
  5 unassigned    wm       0               0         (0/0/0)          0
  6 unassigned    wm       0 - 2730        2.1G      (0/0/0)          0
  7 unassigned    wm       2730 - 5460     2.1G      (0/0/0)          0
The disks were using raw partitions beginning at block 0 (cylinder 0).

The disk label (VTOC) is kept in block 0 of cylinder 0. Database programs using raw partitions will eventually overwrite the label if the raw partition begins at cylinder 0. (Unix filesystems avoid this area of the partition.)

Action

The workaround is to go into format and get the backup label using the backup command. Relabel the disk using this backup label. You should then be able to access the disk.

Back up the data on this disk.

Go back to the disk and relabel it, starting the raw partition at cylinder 1. (This loses one cylinder, but prevents corrupting the VTOC.)

Label again.

Restore the data from your backup.

could not grant slave pty

Cause

The user gets the error message could not grant slave pty when attempting a telnet(1), rlogin(1), or rsh(1) session (anything that requires a shell), or when trying to bring up an xterm.

Action

The ownership of /usr/lib/pt_chmod was set incorrectly. The user had:


# ls -la /usr/lib/pt_chmod
---s--x--x   1 bin     bin         3120 May  3  1996
The permissions should be:

# ls -la /usr/lib/pt_chmod
---s--x--x   1 root     bin         3120 May  3  1996
Note that the owner should be root; the user had bin as the owner. Also note that the setuid bit must be set. Once the user ran chown root pt_chmod, everything worked again.
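Both conditions can be checked mechanically from a line of ls -l output. The sketch below is an illustration in the Bourne shell; the function name owner_and_setuid_ok is hypothetical, and the two sample lines are the listings quoted above.

```shell
#!/bin/sh
# Hypothetical helper: given one line of "ls -l" output, succeed only if
# the owner is root and the setuid bit is set (an "s" or "S" in the
# fourth character of the mode field).
owner_and_setuid_ok() {
    echo "$1" | awk '{
        ok = (($3 == "root") && (substr($1, 4, 1) ~ /[sS]/))
        exit !ok
    }'
}

# The two listings from the text. On a live system the line would come
# from: ls -l /usr/lib/pt_chmod
for line in \
    "---s--x--x   1 bin     bin         3120 May  3  1996" \
    "---s--x--x   1 root     bin         3120 May  3  1996"
do
    if owner_and_setuid_ok "$line"; then
        echo "ok:      $line"
    else
        echo "problem: $line"
    fi
done
```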

Could not initialize tooltalk (tt_open): TT_ERR_NOMP

Cause

Various desktop tools display or print this message when the ttsession(1) process is not available. The ToolTalk service generally tries to restart ttsession(1) if it is not running, so this error indicates that the ToolTalk service is either not installed or is not installed correctly.

Action

Verify that the ttsession(1) command exists in /usr/openwin/bin or /usr/dt/bin. If this command is not present, ToolTalk is not installed correctly. The packages constituting ToolTalk are the runtime SUNWtltk, developer support SUNWtltkd, and the manual pages SUNWtltkm. CDE ToolTalk packages have the same names with ".2" appended.

Technical Notes

The full TT_ERR_NOMP message string reads as follows: "No ttsession(1) is running, probably because tt_open(3) has not been called yet. If this is returned from tt_open(3) it means ttsession(1) could not be started, which generally means ToolTalk is not installed on the system."

Could not open ToolTalk Channel

Cause

This message is displayed when attempting to run workshop remotely.

Action

1. Make sure workshop is no longer running.
2. In the telnet/rlogin session window, type /bin/ps -ef | grep ttsession. If a ttsession process is running that belongs to the user who has telnetted into the system, type kill pid_of_ttsession.
3. In the telnet/rlogin session, type /usr/dt/bin/ttsession -s -d <machine_telnetting_from>:0.0.
4. Start workshop.

Could not start new viewer

Cause

This message appears in the AnswerBook navigator window, along with an XView error message on the console.

Action

See the message "answerbook: XView error: NULL pointer passed to xv_set" for details.

cpio: Bad magic number/header.

Cause

A cpio(1) archive has either become corrupted or was written out with an incompatible version of cpio(1).

Action

Use the -k option to cpio(1) to skip I/O errors and corrupted file headers. This might permit you to extract other files from the cpio(1) archive. To extract files with corrupted headers, try editing the archive with a binary editor such as emacs(1). Each cpio(1) file header contains a filename as a string.

See Also

For more information on magic numbers, see magic(4).

cpio : can't read input : end of file encountered prior to expected end of archive.

Cause

This message appears when reading a multivolume floppy in bar format with the following command:


  # cpio -id -H bar -I /dev/diskette0

Action

Stop the volume management daemon, /usr/sbin/vold, by running /etc/init.d/volmgt stop, then rerun the command using the device name /dev/rfd0.

Cross-device link

Cause

An attempt was made to make a hard link to a file on another device, such as on another filesystem.

Action

Establish a symbolic link using ln -s instead. Symbolic links are permitted across filesystem boundaries.
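A script that has to work whether or not the two names are on the same filesystem can try the hard link first and fall back to a symbolic link. This is a minimal sketch in the Bourne shell; the function name link_or_symlink is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: try a hard link first; if ln fails (for example
# with "Cross-device link"), fall back to a symbolic link.
link_or_symlink() {
    # $1 = existing file (preferably an absolute path), $2 = new link name
    ln "$1" "$2" 2>/dev/null || ln -s "$1" "$2"
}

# Example call, commented out because the paths are placeholders:
# link_or_symlink /export/home/data/report /var/tmp/report
```

Passing an absolute path as the first argument matters for the fallback case, because a symbolic link with a relative target is resolved relative to the directory that holds the link.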

Technical Notes

The symbolic name for this error is EXDEV, errno=18.