Solaris Common Messages and Troubleshooting Guide

"P"

Package not installed

Cause

This error occurs when a user attempts to use a system call from a package that has not been installed.

Technical Notes

The symbolic name for this error is ENOPKG, errno=65.

page_create: invalid flag

Cause

This error occurred after a VxVM upgrade. In this case, the installed drivers (vxio and vxspec) were the ones for the Solaris 2.5.1 software, not the ones for the Solaris 2.6 software. This condition was verified by using ls -l /kernel/drv/*vx*.

Action

Run pkgrm(1M) or reinstall VxVM 2.4, and then re-encapsulate the root disk.

Panic

Cause

A system panics and crashes when a program exercises an operating system bug. Although the crash might seem unfriendly to a user, the sudden stop actually safeguards the system and its data from further corruption.

In addition to stopping the operating system, the panic routine copies the memory contents in use to a dump device, recording critical information about the current state of the CPU from which the panic routine was called.

Because the primary swap device is usually the default dump device, the primary swap device should be large enough to hold a complete image of memory. The system tries to reboot after the memory image is saved.

If the system does not reboot successfully, consider these possibilities:

  1. Catastrophic hardware failure, such as faulty memory or a crashed disk

  2. Major kernel configuration faults, such as an unstable device driver

  3. Major kernel-tuning errors, such as a too-large value for MAXUSERS

  4. Data corruption, including corruption of the operating system files

  5. Manual intervention needed, as when fsck(1M) expects answers to its queries

Action

To find out why a system crashed, you can look in the /var/adm/messages* log files or examine the crash dump image that savecore(1M) saves.

Of these methods, using savecore(1M) is the most informative. The savecore(1M) command transfers the system crash dump image generated by the panic routine from the dump device to a file system. The image can then be analyzed with a debugger, such as adb(1).
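As a quick first pass before analyzing a crash dump, the log files can be searched for panic messages. The following is a minimal sketch in Python, not a standard tool: the function names panic_lines and scan_logs are hypothetical, and the only assumption about the log format is that a relevant line contains the word "panic".

```python
import glob


def panic_lines(lines):
    """Return the log lines that mention a panic (case-insensitive match)."""
    return [line.rstrip("\n") for line in lines if "panic" in line.lower()]


def scan_logs(pattern="/var/adm/messages*"):
    """Map each matching log file to the panic lines found in it."""
    hits = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as log:
            found = panic_lines(log)
        if found:
            hits[path] = found
    return hits


if __name__ == "__main__":
    for path, lines in scan_logs().items():
        for line in lines:
            print(f"{path}: {line}")
```

The log files record only the panic string, so this helps identify when and with what message the system went down. It is not a substitute for the crash dump.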

See Also

Correctly setting up savecore(1M) and interpreting its results can be difficult. For more information about debugging system panics, refer to Panic! UNIX System Crash Dump Analysis by Chris Drake and Kimberley Brown (ISBN 0-13-149386-8).

panic -boot: Could not mount filesystem

Cause

The first problem comes from the following jumpstart error:


2ec00 RPC: Can't decode result.
whoami RPC call failed with rpc status: 2
panic - boot: Could not mount filesystem.
program terminated
ok

Normally, this error occurs when the boot process is unable to get to the install image.

Other users see the same error, accompanied by a second message:


'Timeout waiting for ARP/RARP packet...'

Action

To solve the first problem:

  1. Check the share entry in dfstab(4) (/etc/dfs/dfstab on the install image NFS server). It should look like this:


    share -F nfs -o ro,anon=0 /jumpstart-dir

  2. Run the share(1M) command on the install image NFS server to make sure that the directory is shared properly.

  3. Check the /etc/bootparams file on the net install server. Look for entries with an incorrect boot path.

  4. Make sure that /usr/sbin/rpc.bootparamd is running on the boot server. If necessary, kill and restart it.

  5. Check /etc/ethers on the boot server for duplicate or conflicting entries.

  6. At the ok prompt, run test net (or test-net) and watch net (or watch-net) to test the network connectivity.
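The check in step 5 can be partly automated. The following is a minimal sketch in Python, assuming the ethers(4) layout of one Ethernet address followed by a host name per line, with # starting a comment. The function names normalize_mac and find_ethers_conflicts are hypothetical and are not part of any Solaris tool.

```python
from collections import defaultdict


def normalize_mac(mac):
    """Normalize an Ethernet address so 8:0:20:1:2:3 equals 08:00:20:01:02:03."""
    return ":".join(f"{int(part, 16):02x}" for part in mac.split(":"))


def find_ethers_conflicts(lines):
    """Return (addresses mapped to several hosts, hosts mapped to several addresses)."""
    hosts_by_mac = defaultdict(set)
    macs_by_host = defaultdict(set)
    for line in lines:
        entry = line.split("#", 1)[0].split()
        if len(entry) < 2:
            continue  # blank line, comment, or incomplete entry
        try:
            mac = normalize_mac(entry[0])
        except ValueError:
            continue  # not a hexadecimal Ethernet address; skipped in this sketch
        hosts_by_mac[mac].add(entry[1])
        macs_by_host[entry[1]].add(mac)
    dup_macs = {m: sorted(h) for m, h in hosts_by_mac.items() if len(h) > 1}
    dup_hosts = {h: sorted(m) for h, m in macs_by_host.items() if len(m) > 1}
    return dup_macs, dup_hosts


if __name__ == "__main__":
    with open("/etc/ethers") as ethers:
        print(find_ethers_conflicts(ethers))
```

Addresses are normalized before comparison because the same address can be written with or without leading zeros.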

As a workaround for the second problem, check the nsswitch.conf(4) file. If some of the entries point to NIS, such as:


rpc		nis	files
hosts		nis	files
ethers		nis	files
bootparams	files   nis

change all of these entries so that files is listed first:

rpc		files 	nis
hosts		files 	nis
ethers		files	nis
bootparams	files	nis
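The reordering can also be scripted. The following is a minimal sketch in Python, assuming the usual nsswitch.conf(4) layout in which each database name is followed by a colon (for example, "hosts: nis files"). The function name put_files_first is hypothetical. Entries that contain action criteria in square brackets are deliberately left alone, because moving sources around them would change their meaning.

```python
def put_files_first(lines, databases=("rpc", "hosts", "ethers", "bootparams")):
    """Return nsswitch.conf lines with 'files' moved to the front for the given databases."""
    result = []
    for line in lines:
        name, sep, sources = line.partition(":")
        tokens = sources.split()
        if (
            sep
            and name.strip() in databases
            and "files" in tokens
            and tokens[0] != "files"
            and "[" not in sources  # leave entries with action criteria untouched
        ):
            tokens.remove("files")
            line = f"{name.strip()}:\t" + " ".join(["files"] + tokens)
        result.append(line)
    return result


if __name__ == "__main__":
    with open("/etc/nsswitch.conf") as conf:
        for line in put_files_first(conf.read().splitlines()):
            print(line)
```

The sketch prints the proposed file rather than overwriting /etc/nsswitch.conf, so the result can be reviewed before it is installed.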


Note -

You might have to update these files manually if they do not contain information on the client machine you are trying to jumpstart.


Then, remove the client with rm_install_client(1M), remove the contents of tftpboot, and add the client again:


add_install_client -c /jumpstart-dir/profiles  'client name'  'arch'

Panic on cpu 0: valloc'd past tmpptes

Cause

The machine is an SS20 with 256 Mbytes of RAM, an FDDI interface, and a single CPU. It is running Online Disksuite for mirroring and striping. The following recommended kernel patches were installed:


102517-03 
102436-02 
102394-02 
102516-06

After the patches were installed, the machine was rebuilt so that they would take effect. However, the machine panicked with this error message immediately after loading the kernel.

Action

The kernel was rebuilt with a new MAXUSERS value of 96, and this kernel enabled the machine to boot properly.

Technical Notes

Information directly related to this situation was not available. However, there was a description of another type of panic, related to seg_u, in which the MAXUSERS value was set too large and caused the kernel to overrun table space. Furthermore, the value of MAXUSERS varies among the different architectures and the different revisions of the OS, and is inversely related to the amount of physical RAM in the system.

Further investigation revealed that the value of MAXUSERS was set to 128. Based on the related information, it seems that the panic occurred because valloc attempted to define memory space in excess of the value of tmpptes.

PARTIALLY ALLOCATED INODE I=int CLEAR?

Cause

The system probably crashed in the middle of a sync(2) or write(2) operation, and during phase 1, fsck(1M) found that the specified inode was neither allocated nor unallocated.

Action

If any directory entries point to this inode and you answer "yes" to this question, phase 2 might get UNALLOCATED messages. Carefully exit fsck(1M) and run ncheck(1M)--specifying the inode number after the -i option--to determine which file or directory is involved. You might be able to restore this file or directory from another system. fsck(1M) also might copy this file to the lost+found directory in a later phase.

See Also

For more information, see the chapter on checking file system integrity in the System Administration Guide, Volume 1.

passwd: Changing password for string

Cause

The following lines are put into /etc/nsswitch.conf:


passwd:     compat
passwd_compat:     nis

Then, when passwd is run, it fails as follows:

server1% passwd
passwd:  Changing password for khh
server1%


Note -

passwd exits before a password is entered.


Action

In the man page for passwd, you see the following:

If all requirements are met, by default, the passwd(1) command consults /etc/nsswitch.conf to determine which repositories need a password update. It searches the passwd(4) and passwd_compat entries. The sources (repositories) associated with these entries are updated. However, the supported password update configurations are limited to the following five cases. Failure to comply with the configurations prevents users from logging in to the system.


passwd: files
passwd: files nis
passwd: files nisplus
passwd: compat (==> files nis)
passwd: compat (==> files nisplus)
        passwd_compat: nisplus


Note -

The passwd(1) man page does NOT say that you can use the line: passwd_compat: nis. passwd(1) works exactly as described in the man page.
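The five supported cases can be checked mechanically. The following is a minimal sketch in Python, assuming the nsswitch.conf(4) layout shown above. The function names parse_sources and passwd_update_supported are hypothetical. The sketch also assumes that passwd_compat matters only when passwd is set to compat, which is an interpretation of the man page text rather than something the man page states.

```python
def parse_sources(lines, database):
    """Return the tuple of sources for a database entry, or None if it is absent."""
    for line in lines:
        name, sep, sources = line.split("#", 1)[0].partition(":")
        if sep and name.strip() == database:
            return tuple(sources.split())
    return None


def passwd_update_supported(lines):
    """Return True if the passwd entries match one of the five supported cases."""
    passwd = parse_sources(lines, "passwd")
    compat = parse_sources(lines, "passwd_compat")
    if passwd in (("files",), ("files", "nis"), ("files", "nisplus")):
        return True
    if passwd == ("compat",):
        # No passwd_compat entry means compat falls back to NIS;
        # the only explicit value listed as supported is nisplus.
        return compat in (None, ("nisplus",))
    return False


if __name__ == "__main__":
    with open("/etc/nsswitch.conf") as conf:
        print(passwd_update_supported(conf.read().splitlines()))
```

With the configuration from the Cause section (passwd: compat together with passwd_compat: nis), the function returns False, which matches the failure described.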


passwd (SYSTEM): System error: repository out of range

Cause

This error appears in the Solaris 2.6 release when you try to lock a user account by using nispasswd with the -l option.

Action

Use passwd -r nisplus -l username instead.

passwd.org_dir: NIS+ servers unreachable

Cause

This is the first of three messages that an NIS+ client prints when it cannot locate an NIS+ server on the network.

Action

For details, refer to "hosts.org_dir: NIS+ servers unreachable".

Password does not decrypt secret key for unix.uid@string

Cause

This message appears at login when a user's password is not identical to the user's keylogin(1) network password. When a system is running NIS+, the login program first performs UNIX authentication, and then attempts a keylogin(1) for secure RPC authentication.

Action

To gain credentials for secure RPC, users can run keylogin(1) after login and type their secret key. To stop this message from appearing at login, users can run the chkey -p command and set their network password to be the same as their NIS+ password. If a user does not remember the network password, the system administrator should delete and re-create the user's credentials table entry so the user can establish a new network password with chkey(1).

password file busy - try again later.

Cause

On a SunOS system running NIS (YP), the user runs yppasswdd(1M) and the system reports this error. On the NIS master server, the messages file contains this error from rpc.yppasswdd: password file busy - try again.

The superficial cause is the existence of a lock file, /var/yp/passwd.ptmp. Removing this file allows yppasswdd to run to completion, but subsequent invocations still fail with the same error message.

The root cause is that yppasswdd is running with the -m option, which tells it to run make to push the maps out to the slave servers. In this situation, the push to a slave server hung, so the push never completed and the lock file was never removed. This was tested by doing the following:


# cd /var/yp
# make passwd
passwd is up to date
# touch passwd
# make passwd

From here, the make remakes the map, but then hangs on the push to the slave.

Action

To fix the root cause, find out why the map does not push. In this situation, it was a routing issue; however, the remedy could lie elsewhere.
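While investigating, it can help to know whether the lock file is left over from a hung push or belongs to an update in progress. The following is a minimal sketch in Python that reports the age of the lock file. The function name lock_file_age is hypothetical, and any threshold for calling a lock "stale" is a local judgment, not something the source defines.

```python
import os
import time


def lock_file_age(path="/var/yp/passwd.ptmp", now=None):
    """Return the lock file's age in seconds, or None if the file does not exist."""
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return None
    return (time.time() if now is None else now) - mtime


if __name__ == "__main__":
    age = lock_file_age()
    if age is None:
        print("no lock file present")
    else:
        print(f"lock file is {age:.0f} seconds old")
```

A lock file that is many minutes old suggests a push that never finished. Removing the file treats only the symptom, as described above.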

pdbadmin start node fails cluster_establish join not allowed

Cause

The user created a disk group, but forgot to make it shared. After it was made a shared disk group, the user attempted to start the second node (which had not been rebooted). Running pdbadmin start node on the second PDB node failed with this repeated message until it finally timed out:


return from cluster_establish is join not allowed now  
retrying cluster_establish

Action

Either reboot the second node or run vxdctl enable.

After that, pdbadmin start node works.

Permission denied

Cause

An attempt was made to access a file in a way forbidden by the protection system.

Action

Check the ownership and protection mode of the file (with a long listing from the ls -l command) to see who is allowed access to the file. Then change the file or directory permissions, as needed.
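The same information that ls -l shows can be read programmatically. The following is a minimal sketch in Python that uses the standard os.stat and stat.filemode calls. The function name describe_access is hypothetical.

```python
import os
import stat
import sys


def describe_access(path):
    """Return (mode string, owner UID, group GID), like the first columns of ls -l."""
    info = os.stat(path)
    return stat.filemode(info.st_mode), info.st_uid, info.st_gid


if __name__ == "__main__":
    for path in sys.argv[1:]:
        mode, uid, gid = describe_access(path)
        print(f"{mode} uid={uid} gid={gid} {path}")
```

Comparing the reported owner, group, and mode bits with the identity of the failing process usually shows which permission is missing.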

Technical Notes

The symbolic name for this error is EACCES, errno=13.

Please specify a recipient.

Cause

With mailtool(1), this message comes up in a dialog box whenever a user tries to deliver a message with no address in the To: field.

Action

For details, refer to "Recipient names must be specified".

Protocol error

Cause

A protocol error occurred. This error is device specific, but is generally not related to a hardware failure.

Technical Notes

The symbolic name for this error is EPROTO, errno=71.

protocol error, string closed connection

Cause

rlogin(1) fails on a machine with the SunOS system installed.

Action

  1. Check the permissions on in.rlogind on the machine you are trying to connect to. The permissions should look like this:


    -rwxr-xr-x  1 root     staff       16384 Jan 20  1994 /usr/sbin/in.rlogind

  2. Check the login line in the /etc/inetd.conf file. It should look like the following:


    login	stream	tcp	nowait	root	/usr/sbin/in.rlogind	in.rlogind

  3. Check /etc/passwd to see if an invalid login shell has been substituted in the entry for the login ID.
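The check in step 2 can be scripted. The following is a minimal sketch in Python that compares the login entry field by field with the line shown above. The function name check_login_entry is hypothetical. A line that has been commented out with # does not count as a login entry.

```python
EXPECTED_LOGIN_ENTRY = [
    "login", "stream", "tcp", "nowait", "root",
    "/usr/sbin/in.rlogind", "in.rlogind",
]


def check_login_entry(lines):
    """Return True if the first active login entry matches the expected fields."""
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == "login":
            return fields == EXPECTED_LOGIN_ENTRY
    return False  # no active login entry found


if __name__ == "__main__":
    with open("/etc/inetd.conf") as conf:
        print(check_login_entry(conf))
```

Splitting on whitespace means that the comparison does not depend on whether the fields are separated by tabs or spaces.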

Protocol family not supported

Cause

The protocol family has not been configured into the system, or no implementation for it exists. This error is used for the Internet protocols.

Technical Notes

The symbolic name for this error is EPFNOSUPPORT, errno=123.

Protocol not supported

Cause

The requested networking protocol has not been configured into the system, or no implementation for it exists. (A protocol is a formal description of the messages to be exchanged and the rules to be followed when systems exchange information.)

Action

Verify that the protocol is in the /etc/inet/protocols file and in the NIS protocols map, if applicable. If the protocol is not listed, and you want to permit its use, configure the protocol as documented or as required.

Technical Notes

The symbolic name for this error is EPROTONOSUPPORT, errno=120.

Protocol wrong type for socket

Cause

This message indicates either an application programming error, or badly configured protocols.

Action

Make sure that the /etc/protocols file corresponds number-for-number with the NIS protocols(4) map. If it does, ask the vendor or author of the application for an update.
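The number-for-number comparison can be sketched as follows in Python, assuming both inputs use the protocols(4) layout of a name, a number, and optional aliases, with # starting a comment. The function names parse_protocols and protocol_mismatches are hypothetical. The NIS side is assumed to be text captured beforehand, for example from ypcat protocols.

```python
def parse_protocols(lines):
    """Map protocol name to number from protocols(4)-style lines."""
    table = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[1].isdigit():
            table[fields[0]] = int(fields[1])
    return table


def protocol_mismatches(local_lines, nis_lines):
    """Return {name: (local number, NIS number)} for names whose numbers differ."""
    local = parse_protocols(local_lines)
    nis = parse_protocols(nis_lines)
    return {
        name: (local[name], nis[name])
        for name in sorted(local.keys() & nis.keys())
        if local[name] != nis[name]
    }
```

An empty result means that every protocol listed in both sources has the same number. Names that appear in only one source are not reported by this sketch.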

Technical Notes

A protocol was specified that does not support the semantics of the socket type requested. This amounts to a request for an unsupported type of socket. Look at the source code that made this socket request and check that it requested one of the types specified in /usr/include/sys/socket.h.

The symbolic name for this error is EPROTOTYPE, errno=98.