Solaris Common Messages and Troubleshooting Guide

"S"

save: SYSTEM error, Arg list too long

Cause

The save fails with this error because the database (index) file for the client is greater than 2 Gbytes. With the Solaris 2.6 release and SBU 5.0.1 this is no longer a problem.

Action

With earlier versions of the Solaris software, open nwadmin -> indexes -> select the appropriate client -> select the appropriate fs -> remove oldest cycle -> reclaim space.

You might have to repeat this procedure a few times to reclaim enough space. The indexes can be re-created later, if necessary, by using the scanner command.

SCSI bus DATA IN phase parity error

Cause

The most common cause of this problem is unapproved hardware. Some SCSI devices for the PC market do not meet the high I/O speed requirements for the UNIX market. Other possible causes of this problem are improper cabling or termination, and power fluctuations. Data corruption is possible, but unlikely to occur, because this parity error prevents data transfer.

Action

Check that all SCSI devices on the bus are Sun-approved hardware. Then verify that all cables measure no longer than six meters total and that all SCSI connections are properly terminated. If power fluctuations are occurring, invest in an uninterruptible power supply.

SCSI transport failed: reason 'reset'

Cause

This message indicates that the system sent data over the SCSI bus, but the data never reached its destination because of a SCSI bus reset. The most common cause of this condition is conflicting SCSI targets. Data corruption is possible, but unlikely to occur, because this failure prevents data transfer.

Action

Verify that all cables measure no longer than six meters total and that all SCSI connections are properly terminated. If power surges are a problem, acquire a surge suppressor or an uninterruptible power supply.

A machine's internal disk drive is usually SCSI target 3. Make sure that external and secondary disk drives are targeted to 1, 2, or 0, and do not conflict with each other. Also, make sure that tape drives are targeted to 4 or 5, and CD drives to 6, avoiding any conflict with each other or with disk drives. If the targeting of the internal disk drive is in question, power off the machine, remove all external drives, turn on the power, and from the PROM monitor run the probe-scsi-all or probe-scsi command.

If SCSI device targeting is acceptable, memory configuration could be the problem. Ensure that high-capacity memory chips (such as 4-Mbyte SIMMs) are in lower banks, while lower-capacity memory chips (such as 1-Mbyte SIMMs) are in the upper banks.

SPARC systems do not always support third-party CD-ROM drives, and can generate a similar unknown vendor error message. Check with the CD-ROM vendor for specific configuration requirements.

Some third-party disk drives have a read-ahead cache that interferes with the Solaris device drivers. Make sure that any existing read-ahead cache facility is turned off.

See Also

For more information on SCSI targets, see the section on device naming conventions in the Solaris Transition Guide. If you are using AnswerBook online documentation, "SCSI targets" is a good search string.

Security exception on host string. USER ACCESS DENIED.

Cause

When trying to create a user with AdminSuite and placing the home directory on a system remote from the NIS+ server, the user gets this error message:


Security exception on host hostname. USER ACCESS DENIED.
The user identity (555)username was received, but that user
is not authorized to execute the requested functionality
on this system. Is this user a member of an appropriate
security group on this system ?
(Function: class directory method create_dir)

The user can use rsh(1) to access the remote machine and create a home directory on the system.

Action

In this case, the user was not listed in the system administration group (sysadmin, GID 14) in the NIS+ tables:


# niscat group.org_dir | grep sysadmin
 sysadmin::14:

Add the user name to the system administration group.

Segmentation Fault

Cause

Segmentation faults usually come from a programming error. This message is usually accompanied by a core dump, except on read-only file systems.

Action

To see which program produced a core(4) file, run either the file(1) command or the adb(1) command. The following examples show the output of the file(1) and adb(1) commands on a core file from the dtmail program.


$ file core
core: ELF 32-bit MSB core file SPARC Version 1, from `dtmail'

$ adb core
core file = core -- program `dtmail'
SIGSEGV  11: segmentation violation
^D      (use Control-d to quit the adb program)

Ask the vendor or author of this program for a debugged version.

Technical Notes

A process has received a signal indicating that it attempted to access an area of memory that is protected or that does not exist. The two most common causes of segmentation faults are dereferencing a null pointer and indexing past the bounds of an array.

sendmail[]: can't lookup data via name server "dns" or sendmail[]: can't lookup data via name server "nis"

Cause

The following entry in the /etc/nsswitch.conf file, sendmailvars: dns nis files, causes the messages to appear in the console window.

Action

The sendmailvars database can be used only with local files, NIS+, or both. If you do not have this database set up, the default sendmailvars entry in the /etc/nsswitch.conf file should look as follows:


sendmailvars: files
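
To check an existing configuration, you can scan the sendmailvars line for sources other than files and nisplus. The following is a minimal sketch, not a supported tool. The function name check_sendmailvars is hypothetical, and the sketch assumes a POSIX shell and an awk that accepts this syntax (on older Solaris releases, nawk might be needed instead).

```sh
# check_sendmailvars (hypothetical helper): report any source on the
# sendmailvars line other than "files" or "nisplus".
# Usage: check_sendmailvars [nsswitch_file]
check_sendmailvars() {
    awk '
        /^sendmailvars:/ {
            found = 1
            bad = 0
            for (i = 2; i <= NF; i++) {
                # Skip search criteria such as [NOTFOUND=return].
                if ($i ~ /^\[/) continue
                if ($i != "files" && $i != "nisplus") {
                    print "unsupported source: " $i
                    bad = 1
                }
            }
            if (!bad) print "ok"
        }
        END { if (!found) print "no sendmailvars entry" }
    ' "${1:-/etc/nsswitch.conf}"
}
```

Run with no argument, the function reads /etc/nsswitch.conf. Pass another path to check a copy of the file before installing it.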

sendmail[init]: NOQUEUE: SYSERR(root): Cannot bind to domain <domain>: no such map in server's domain: Bad file number

Cause

The user is running NIS and receives this error on several NIS machines.

Action

Check the following:

  1. For the system(s) not working, make sure there is a /var/yp/nicknames file. Also, make sure that this file contains this entry: aliases mail.aliases

  2. On one of the systems not working, execute the following:


    ypcat aliases

    You will probably get this message: no such map in server's domain. Run ypwhich to see which NIS server the system is bound to. Next, go to that server and verify that the mail.aliases map is missing from /var/yp/domainname. This map must either be created or copied over from one of the NIS servers that contains the map.
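
The first check can be scripted. This is a sketch under assumptions: has_aliases_nickname is a hypothetical helper name, and the pattern expects the nickname and the map name to be separated by white space, as in the entry shown in step 1.

```sh
# has_aliases_nickname (hypothetical helper): succeed if the nicknames
# file maps the nickname "aliases" to the map "mail.aliases".
# Usage: has_aliases_nickname [nicknames_file]
has_aliases_nickname() {
    grep '^aliases[[:space:]][[:space:]]*mail\.aliases[[:space:]]*$' \
        "${1:-/var/yp/nicknames}" >/dev/null
}

# Example:
#   has_aliases_nickname || echo "aliases nickname is missing"
```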

sendmail[int]: NOQUEUE: SYSERR: net hang reading from string

Cause

This is a sendmail(1M) message that appears on the console and in the log file /var/adm/messages. If this message occurs once for a particular user, a mail message from this user might end with a partial line (having no terminating newline character). If this message appears frequently or at busy times, especially along with other networking errors, it could indicate network problems.

Action

Check the user's mail spool file to see if a message ends without a newline character. If so, talk with the user and determine how to prevent the problem from occurring again. If these messages are the result of network problems, you could try moving the mail spool directory to another machine with a faster network interface.
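
One way to test a spool file for a missing final newline is sketched below. The function name ends_with_newline is hypothetical, and the spool path in the usage comment is an assumption about where the user's mail is kept.

```sh
# ends_with_newline (hypothetical helper): succeed if the file is empty
# or its last byte is a newline; fail if it ends with a partial line.
ends_with_newline() {
    # Command substitution strips a trailing newline, so the result is
    # empty when the last byte is a newline or the file is empty.
    [ -z "$(tail -c 1 "$1")" ]
}

# Example (the path is an assumption):
#   ends_with_newline /var/mail/username || echo "partial final line"
```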

Technical Notes

During the SMTP receipt of DATA phase, a message-terminating period on a line of its own never arrived. sendmail(1M) timed out and produced this error.

Service wouldn't let us acquire selection

Cause

This message indicates that the OpenWindows selection service failed to seize the requested selection from /tmp/winselection.

Consider the following diagnostics: the requested selection could be 0 for unknown, 1 for caret, 2 for primary, 3 for secondary, or 4 for clipboard. The result could be 0 for failure, 2 for nonexistent, 3 for did not have, 4 for wrong rank, 5 for continued, 6 for cancelled, or 7 for unrecognized.

setmnt: Cannot open /etc/mnttab for writing

Cause

The system is having problems writing to /etc/mnttab. The file system containing /etc might be mounted read-only, or not mounted at all.

Action

Check that this file exists and is writable by root. If so, ensure that the /etc file system has been mounted, and is mounted read-write, rather than read-only.

share_nfs: /home: Operation not applicable

Cause

This message usually indicates that the system has a local file system mounted on /home, which is where the automounter usually mounts users' home directories.

Action

When a system is running the automounter, do not mount local file systems on the /home directory. Mount them on another directory, such as /disk2, which on most systems you have to create. You could also change the automounter auto_home entry, but that is a more difficult solution.

Signal 8 error

Cause

In this case, the user gets a Signal 8 error during installation, right after starting OpenWindows, and the installation stops.

Action

Shut down the system "gracefully," and, as it is rebooting, place a ZIP drive cartridge (blank or used) in the ZIP drive. Begin the normal installation of the Solaris IA software. It is not possible to continue the existing installation of the Solaris software by putting a cartridge in the ZIP drive after receiving this error. When the Solaris software checks all of your hardware, it thinks the ZIP drive is just another hard drive and attempts to read from it. If there is no cartridge in the drive, then you receive the signal 8 error. If the Solaris software installation "sees" a cartridge in the ZIP drive, it reads from it, even if there is no data on the cartridge, and then continues.

SIMS license error: licenses invalid

Cause

This is a Sun Internet Mail Server (SIMS) licensing problem. The user is installing a departmental version of SIMS 3.1 on a Pentium II PC that is running the Solaris 2.6 IA release. The user is working through a Java interface and keeps getting the above error. The two license files from the license center are:


SERVER server 
DAEMON lic.SUNW /etc/opt/licenses/lic.SUNW 
INCREMENT SLAPD.1 lic.SUNW 1.000 08-Mar-1998 1  

SERVER nwlab4 727a2b6a 7588 
DAEMON suntechd /etc/opt/licenses/suntechd /etc/opt/licenses/daemon_options 
INCREMENT sun.mail.mbox suntechd 3.100 08-Mar-1998 100

Action

Merge the two license files together and delete the extra SERVER line.

Slice c0t1d0s0 is too small to contain 1 replicas

Cause

When trying to add a state replica using metatool to cylinder 0 of a disk, the following error message appears:


	Your attempt to attach metastate database
	replicas on slice "c?t?d?s?" failed for the
	following reason: Slice c?t?d?s? is too small
	to contain 1 replicas.

This is because metatool masks out the very first cylinder to protect the disk label. In DiskSuite 4.1, metatool does allow adding the databases to cylinder 0 on disks of 2.1 Gbytes or larger.

Action

As a workaround, start at cylinder 1 (not cylinder 0) or use the command line (metadb -a).

snmpdx: bind() failed on udp on 161 [errno: address already in use] 125 snmpdx dmid: unable to connect to snmpdx

Cause

The user is running the Solaris 2.6 release with a Cisco FDDI card and is receiving the above error.

Action

In the Solaris 2.6 software a startup script is included in /etc/rc3.d that starts snmpdx (which uses port 161). You receive the error message because the FDDI SNMP agent is running, and it has already claimed port 161. Two solutions are:

  1. Rename the snmpdx startup script so that snmpdx does not start:


    mv /etc/rc3.d/S76snmpdx    /etc/rc3.d/s76snmpdx

  2. Check whether the FDDI SNMP agent can use a port other than 161.

Socket type not supported

Cause

The support for the socket type has not been configured into the system or no implementation for it exists.

Technical Notes

The symbolic name for this error is ESOCKTNOSUPPORT, errno=121.

Soft error rate (int%) during writing was too high

Cause

This message from the SCSI tape drive appears when Exabyte or DAT tapes generate too many soft (recoverable) errors. It is followed by the advisory message Please, replace tape cartridge. Soft errors are an indication that hard errors could soon occur, causing data corruption.

Action

First, clean the tape head with a cleaning tape, as recommended by the manufacturer. If that remedy does not work, replace the tape cartridge. If the problem persists, you might need to replace the tape drive and use new tape cartridges.

Software caused connection abort

Cause

A connection abort occurred internal to your host machine.

Technical Notes

The symbolic name for this error is ECONNABORTED, errno=130.

Srmount error

Cause

This error is RFS specific. It occurs when an attempt is made to stop RFS while resources are still mounted by remote machines, or when a resource is readvertised with a client list that does not include a remote machine with the resource currently mounted.

Technical Notes

The symbolic name for this error is ESRMNT, errno=69.

Stale NFS file handle

Cause

A file or directory that was opened by an NFS client was either removed or replaced on the server.

Action

If you were editing this file, write it to a local file system instead. Try remounting the file system on top of itself or shutting down any client processes that refer to stale file handles. If neither of these solutions works, reboot the system.

Technical Notes

The original vnode is no longer valid. The only way to remove this error is to force the NFS server and client to renegotiate file handles.

The symbolic name for this error is ESTALE, errno=151.

start up failure no such file or directory

Refer to "late initialization error".

statd: cannot talk to statd at string

Cause

This message comes from the NFS status monitor daemon statd(1M), which provides crash recovery services for the NFS lock daemon lockd(1M). The message indicates that statd(1M) has left old references in the /var/statmon/sm and /var/statmon/sm.bak directories. After a user has removed or modified a host in the hosts database, statd(1M) might not properly purge files in these directories, which results in its trying to communicate with a nonexistent host.

Action

Remove the file named variable (where variable is the host name) from both the /var/statmon/sm and /var/statmon/sm.bak directories. Then kill the statd(1M) daemon and restart it. If that does not get rid of the message, kill and restart lockd(1M) as well. If that remedy does not work, reboot the machine at your convenience.
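
The file removal step can be sketched as a small shell function. The name purge_statmon_host is hypothetical, and the optional second argument exists only so that the function can be tried against a scratch directory. Killing and restarting statd(1M) is left as a manual step.

```sh
# purge_statmon_host (hypothetical helper): remove the stale entry for
# one host from the sm and sm.bak directories.
# Usage: purge_statmon_host hostname [statmon_dir]
purge_statmon_host() {
    host=$1
    base=${2:-/var/statmon}
    # Refuse an empty or path-like name so that nothing else is removed.
    case $host in
        ''|*/*) echo "usage: purge_statmon_host hostname [dir]" >&2
                return 1 ;;
    esac
    for dir in "$base/sm" "$base/sm.bak"; do
        if [ -f "$dir/$host" ]; then
            rm -f "$dir/$host" && echo "removed $dir/$host"
        fi
    done
    return 0
}
```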

stty: TCGETS: Operation not supported on socket

Cause

This message occurs when a user tries to use remote copy with rcp(1) or remote shell with rsh(1) from one machine to another, but has an stty(1) command in the remote .cshrc file. This error causes the rcp(1) or rsh(1) command to fail.

Action

The solution is to move the invocation of the stty(1) command to the user's .login (or equivalent) file. Alternatively, execute the stty(1) command in .cshrc only when the shell is interactive. You could perform the following test:


if ($?prompt) stty ...

Technical Notes

The rcp(1) and rsh(1) commands make a connection using sockets, which do not support stty(1)'s TCGETS ioctl.

su: No shell

Cause

This message indicates that someone changed the default login shell for root to a program that is missing from the system. For example, the final colon-separated field in /etc/passwd could have been changed from /sbin/sh to /usr/bin/bash, which does not exist in that location. Possibly an extra space was appended at the end of the line. The outcome is that you cannot log in as root or switch user to root, and, thus, cannot directly fix this problem.

Action

The only solution is to reboot the system from another source, then edit the password file to correct this problem. Invoke sync(1M) several times, then halt the machine by typing Stop-A or by pressing the reset button. Reboot as single-user from CD-ROM, the net, or diskette, such as by typing boot cdrom -s at the prompt.

After the system starts and gives you a # prompt, mount the device corresponding to the original root partition somewhere, such as with a mount(1M) command similar to the one that follows. Then run an editor on the newly mounted system password file (use ed(1) if terminal support is lacking):


# mount /dev/dsk/c0t3d0s0 /mnt
# ed /mnt/etc/passwd

Use the editor to change the password file's root entry to call an existing shell, such as /usr/bin/csh or /usr/bin/ksh.
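
Before rebooting, you can confirm that the corrected root entry names a shell that actually exists. The following is a sketch, not a supported tool. The function name check_root_shell and its optional prefix argument are hypothetical, and the sketch assumes the standard seven-field password file layout.

```sh
# check_root_shell (hypothetical helper): verify that root's login shell
# exists and is executable.
# Usage: check_root_shell [passwd_file] [root_prefix]
check_root_shell() {
    passwd=${1:-/etc/passwd}
    prefix=${2:-}
    # Field 7 is the login shell. A stray trailing space is preserved,
    # so it shows up inside the quotes in the message below.
    shell=$(awk -F: '$1 == "root" { print $7; exit }' "$passwd")
    if [ -z "$shell" ]; then
        echo "root uses the default shell"
    elif [ -x "$prefix$shell" ]; then
        echo "ok: $shell"
    else
        echo "missing or not executable: '$shell'"
        return 1
    fi
}

# Example, with the original root partition mounted on /mnt:
#   check_root_shell /mnt/etc/passwd /mnt
```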

Technical Notes

To keep the No shell problem from happening, habitually use admintool or /usr/ucb/vipw to edit the password file. These tools make it difficult to change password entries in ways that make the system unusable.

su: 'su root' failed for login on /dev/pts/int

Cause

The user specified by login tried to become superuser, but typed the wrong password.

Action

If the user is supposed to know the root password, wait to see if the correct password is supplied. If the user is not supposed to know the root password, ask why he or she is attempting to become superuser.

su: 'su root' succeeded for login on /dev/pts/int

Cause

The user specified by login just became superuser by typing the root password.

Action

If the user is supposed to know the root password, this message is only informational. If the user is not supposed to know the root password, change this password immediately and ask how the user learned it.

SunPC may NOT run correctly as root

Cause

With SunPC 4.1 and the 102924 jumbo patch installed, a user (who is not root) attempts to run SunPC and receives the following error message:


SunPC may NOT run correctly as root.
Please run in user mode.
SunPC script is exiting

The user's primary group ID is probably root. For example:


$ /usr/bin/id
uid=33650(gruff) gid=0(root)

Action

Change the user's primary group to another group, such as 10, and, because the user still needs to be in the root group, add the root group to the user's secondary group list.
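
To spot this condition, you can check a line of id(1) output for a non-root user whose primary group is root. This is a minimal sketch: the function name nonroot_with_root_gid is hypothetical, and it assumes that the output begins with uid=N(name) gid=N(name), as in the example above.

```sh
# nonroot_with_root_gid (hypothetical helper): read one line of id(1)
# output on standard input and succeed if the user is not root but the
# primary group ID is 0.
nonroot_with_root_gid() {
    sed -n 's/^uid=\([0-9]*\)([^)]*) gid=\([0-9]*\)(.*/\1 \2/p' | {
        read uid gid
        [ "$uid" != 0 ] && [ "$gid" = 0 ]
    }
}

# Example:
#   /usr/bin/id | nonroot_with_root_gid && echo "primary group is root"
```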

syncing file systems...

Cause

This message indicates that the kernel is updating the super-blocks before taking the system down to ensure file system integrity. This message appears after a halt(1M) or reboot(1M) command. It can also appear after a system panic, in which case the system might contain corrupted data.

Action

If you just halted or rebooted the machine, take no action. This message is normal. In case of a system panic, look up the panic messages. Your system vendor might be able to help diagnose the problem. So that you can describe the panic to the vendor, either leave your system in its panicked state or be sure that you can reproduce the problem.

Technical Notes

Numbers that sometimes display after the three dots in the message show the count of dirty pages that are being written out. Numbers in brackets show an estimate of the number of busy buffers in the system.

syslog service starting.

Cause

During system reboot, this message might appear and the boot seemingly hangs. After starting syslogd(1M) service, the system runs /etc/rc2.d/S75cron, which in turn calls ps(1). Sometimes after an abrupt system crash /dev/bd.off becomes a link to nowhere, causing the ps(1) command to hang indefinitely.

Action

Reboot as a single user (for example with boot -s) and run ls -l /dev/bd* to see if this is the problem. If so, remove /dev/bd.off, then run bdconfig off or reboot with the -r (reconfigure) option.

This is the most commonly reported situation that causes ps(1) to hang.
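
Besides inspecting the ls -l output by eye, you can test for a dangling link directly. The sketch below assumes a POSIX shell whose test command supports -h and -e (ksh, for example). The function name is_dangling_link is hypothetical.

```sh
# is_dangling_link (hypothetical helper): succeed if the path is a
# symbolic link whose target does not exist.
is_dangling_link() {
    [ -h "$1" ] && [ ! -e "$1" ]
}

# Example:
#   is_dangling_link /dev/bd.off && echo "/dev/bd.off points nowhere"
```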

System booting after fatal error FATAL

Cause

The system reboots automatically. Afterward, the messages file contains System booting after fatal error FATAL.

The message is issued during a reboot after the system detects a hardware error. The following can cause this response: UPA address parity error, Master queue overflows, DTAG parity errors, E-Cache tag parity errors, and Coherence errors.

Action

Use prtdiag(1M) to help identify failed hardware components. The errors indicate that you have either a bad CPU module or a bad system board.

SYSTEM error, Arg list too long

Cause

When trying to back up a client with NetWorker, the following error occurs:


* heaven.com:/export/heaven2 save: SYSTEM error, Arg list too long 
* heaven.com:/export/heaven2 save: Cannot open save session with heaven.com 
* heaven.com:/export/heaven3 1 retry attempted 
* heaven.com:/export/heaven3 save: SYSTEM error, Arg list too long 
* heaven.com:/export/heaven3 save: Cannot open save session with heaven.com

Action

An error like this is due to an index file (/nsr/index/clientname) that is greater than 2 Gbytes in Solstice Backup revisions earlier than 5.0.1. In 5.0.1 the indexes are segmented, so this error should no longer be a problem. In any revision of Solstice Backup this error can also be due to a corrupt client index. If so, running the following command might resolve the problem:


# nsrck -F clientname

If this remedy does not fix the problem, shut down the NetWorker daemons, remove the client index, and restart the daemons. The backup should then run normally.
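
To see whether a client index is near the 2-Gbyte limit, you can measure it with du(1). This is a rough sketch: the function name index_over_2gb is hypothetical, and it measures everything under the path given, which is only an approximation of the size of the index file itself.

```sh
# index_over_2gb (hypothetical helper): print the space used under a
# path and succeed if it exceeds 2 Gbytes.
# Usage: index_over_2gb /nsr/index/clientname
index_over_2gb() {
    path=$1
    limit_kb=2097152            # 2 Gbytes expressed in Kbytes
    size_kb=$(du -sk "$path" | awk '{ print $1 }')
    echo "$path: $size_kb Kbytes"
    [ "$size_kb" -gt "$limit_kb" ]
}
```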

system hang

Cause

A system running the 4.1.3C release with SBus cards installed suffered a system freeze.

SYSTEM HANGS DURING BOOT

Cause

When the user boots a system, it hangs after the following boot messages: root on, swap on, and dump on. After the system displays these messages, the LEDs flash and the system hangs.

This is due to an earlier fsck that deleted devices under the /dev directory.

Action

Check for the /dev/console device and, if it is missing, create it.

system will not connect to port 80

Refer to "late initialization error".