NIS Problems Affecting Many Clients (Solaris Naming Administration Guide)

NIS Problems Affecting Many Clients

If only one or two clients are experiencing symptoms that indicate NIS binding difficulty, the problems probably are on those clients (see "NIS Problems Affecting One Client"). If many NIS clients are failing to bind properly, the problem probably exists on one or more of the NIS servers.

Network or Servers Are Overloaded

NIS can hang if the network or NIS servers are so overloaded that ypserv cannot get a response back to the client ypbind process within the time-out period.

Under these circumstances, every client on the network experiences the same or similar problems. In most cases, the condition is temporary. Any error messages usually stop when the NIS server reboots and restarts ypserv, or when the load on the NIS servers or the network itself decreases.

Server Malfunction

Make sure the servers are up and running. If you are not physically near the servers, use the ping command to confirm that each one responds.
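When several servers need checking, the ping test can be wrapped in a small loop. The following is a minimal sketch, assuming a POSIX-compatible shell (such as ksh) and the Solaris form of ping, which accepts an optional time-out in seconds after the host name. The function names probe_host and check_servers, and the host names in the usage line, are hypothetical.

```shell
# probe_host: succeed if the host answers a ping within 5 seconds.
# Assumes Solaris ping syntax: ping host [timeout].
probe_host() {
    ping "$1" 5 >/dev/null 2>&1
}

# check_servers: report the state of every host named on the command line.
check_servers() {
    for host in "$@"; do
        if probe_host "$host"; then
            echo "$host: alive"
        else
            echo "$host: no answer"
        fi
    done
}

# Usage (hypothetical host names):
# check_servers ypmaster ypslave1 ypslave2
```

Keeping the probe in its own function makes it easy to substitute another reachability test if ping is blocked on your network.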

NIS Daemons Not Running

If the servers are up and running, try to find a client machine that is behaving normally, and run the ypwhich command on it. If ypwhich does not respond, kill it. Then log in as root on the NIS server and check whether the NIS daemons are running by entering:

# ps -e | grep yp

Note -

Do not use the -f option with ps. This option attempts to translate user IDs to names, which causes more name service lookups that might not succeed.

If either the ypbind or the ypserv daemon is not running, stop any NIS daemons that are still running and then restart them by entering:

# /usr/lib/netsvc/yp/ypstop
# /usr/lib/netsvc/yp/ypstart
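The check and restart above can be sketched as a small filter over ps -e output. This is an illustration only, assuming a POSIX-compatible shell and that the command name is the last field of each ps -e line; the function name missing_nis_daemons is hypothetical.

```shell
# missing_nis_daemons: read `ps -e` output on stdin and print the name of
# each expected NIS daemon (ypserv, ypbind) that does not appear in it.
missing_nis_daemons() {
    ps_output=$(cat)
    for daemon in ypserv ypbind; do
        # Compare against the last field only, so "ypserv" does not
        # accidentally match some other command containing that string.
        printf '%s\n' "$ps_output" | awk '{ print $NF }' |
            grep -x "$daemon" >/dev/null || echo "$daemon"
    done
}

# Usage: restart NIS only when a daemon is missing.
# if [ -n "$(ps -e | missing_nis_daemons)" ]; then
#     /usr/lib/netsvc/yp/ypstop
#     /usr/lib/netsvc/yp/ypstart
# fi
```

The sketch uses plain ps -e, without -f, for the reason given in the note above.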

If both the ypserv and ypbind processes are running on the NIS server, type:

# ypwhich

If ypwhich does not respond, ypserv has probably hung and should be restarted. While logged in as root on the server, kill ypserv and restart it by typing:

# /usr/lib/netsvc/yp/ypstop
# /usr/lib/netsvc/yp/ypstart
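Because a hung ypwhich has to be killed by hand, it can help to run it under a watchdog. The following is a rough sketch, assuming a POSIX-compatible shell; run_with_timeout is a hypothetical helper, not a Solaris command, and the 10-second limit in the usage line is an arbitrary choice.

```shell
# run_with_timeout SECONDS COMMAND [ARGS...]
# Runs COMMAND in the background and sends it SIGTERM if it has not
# finished after SECONDS. Returns COMMAND's exit status, which is
# non-zero when the watchdog had to kill it.
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    cmd_pid=$!
    # Watchdog: its output is discarded so it never holds a pipe open.
    ( sleep "$limit"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog_pid=$!
    status=0
    wait "$cmd_pid" || status=$?
    # The command is done; the watchdog is no longer needed.
    kill "$watchdog_pid" 2>/dev/null || :
    return "$status"
}

# Usage:
# run_with_timeout 10 ypwhich || echo "ypwhich did not answer; ypserv may be hung"
```

A non-zero status here only says that ypwhich failed or timed out. Deciding to restart ypserv is still a judgment for the administrator.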

Servers Have Different Versions of an NIS Map

Because NIS propagates maps among servers, occasionally you may find different versions of the same map on various NIS servers on the network. This version discrepancy is normal and acceptable if the differences do not last for more than a short time.

The most common cause of map discrepancy is that something is preventing normal map propagation. For example, an NIS server or router between NIS servers is down. When all NIS servers and the routers between them are running, ypxfr should succeed.
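One way to confirm a discrepancy is to compare the order number that each server reports for a map, for example with yppoll. The sketch below assumes that yppoll prints a line of the form "Map NAME has order number N."; check the output on your own system before relying on it. The function name map_order, the host names, and the map name in the usage lines are hypothetical.

```shell
# map_order: read yppoll output on stdin and print the map's order number.
# Assumes a line such as: "Map hosts.byname has order number 1017702413."
map_order() {
    awk '/order number/ { n = $NF; sub(/\.$/, "", n); print n }'
}

# Usage (hypothetical hosts and map): print one order number per server.
# for host in ypmaster ypslave; do
#     printf '%s ' "$host"
#     yppoll -h "$host" hosts.byname | map_order
# done
```

If the numbers differ for longer than a normal propagation interval, the slave is probably not receiving updates.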

If the servers and routers are functioning properly, check the following:

Log ypxfr output (see "Logging ypxfr Output").
Check the control files (see "Check the crontab File and ypxfr Shell Script").
Check the ypservers map on the master (see "Check the ypservers Map").

Logging `ypxfr` Output

If a particular slave server has problems updating maps, log in to that server and run ypxfr interactively. If ypxfr fails, it tells you why it failed, and you can fix the problem. If ypxfr succeeds, but you suspect it has occasionally failed, create a log file to enable logging of messages. To create a log file, enter:

ypslave# cd /var/yp
ypslave# touch ypxfr.log

This creates a ypxfr.log file that saves all output from ypxfr.

The output resembles the output ypxfr displays when run interactively, but each line in the log file is time-stamped. (You might see unusual ordering in the time stamps. That is normal: the time stamp tells you when ypxfr started to run. If copies of ypxfr ran simultaneously but their work took differing amounts of time, they might write their summary status lines to the log file in an order different from that in which they were invoked.) Any pattern of intermittent failure shows up in the log.

Note -

When you have fixed the problem, turn off logging by removing the log file. If you forget to remove it, it continues to grow without limit.
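Since logging is controlled purely by the presence of the ypxfr.log file, turning it on and off can be wrapped in one helper so that the removal step is harder to forget. This is a minimal sketch, assuming a POSIX-compatible shell; the function name ypxfr_logging and its optional directory argument are hypothetical conveniences.

```shell
# ypxfr_logging on|off [directory]
# Creates or removes ypxfr.log in the given directory (default /var/yp).
ypxfr_logging() {
    state=$1
    yp_dir=${2:-/var/yp}
    case $state in
        on)  touch "$yp_dir/ypxfr.log" ;;
        off) rm -f "$yp_dir/ypxfr.log" ;;
        *)   echo "usage: ypxfr_logging on|off [directory]" >&2
             return 2 ;;
    esac
}

# Usage on the slave server, as root:
# ypxfr_logging on
# ... wait for the scheduled transfers to run, then inspect /var/yp/ypxfr.log ...
# ypxfr_logging off
```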

Check the `crontab` File and `ypxfr` Shell Script

Inspect the root crontab file, and check the ypxfr shell script it invokes. Typographical errors in these files can cause propagation problems. A missing reference to a shell script in the /var/spool/cron/crontabs/root file, or a missing reference to a map in any shell script, can also cause errors.
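Part of this inspection can be automated: the sketch below lists every path in a crontab that contains "/ypxfr" and is not an executable file, which catches misspelled or missing script names. It assumes a POSIX-compatible shell and that script paths contain no spaces; the function name missing_ypxfr_scripts is hypothetical. It does not check the maps named inside the scripts.

```shell
# missing_ypxfr_scripts: read crontab text on stdin and print each
# referenced ypxfr script path that is not an executable file.
missing_ypxfr_scripts() {
    grep -v '^#' |              # ignore comment lines
        tr ' \t' '\n\n' |       # one word per line
        grep '/ypxfr' |         # keep only ypxfr script paths
        sort -u |
        while read -r script; do
            [ -x "$script" ] || echo "$script"
        done
}

# Usage on the slave server, as root:
# missing_ypxfr_scripts < /var/spool/cron/crontabs/root
```

No output means that every ypxfr script named in the crontab exists and is executable.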

Check the `ypservers` Map

Make sure that the NIS slave server is listed in the ypservers map on the master server for the domain. If it is not, the slave server still operates perfectly as a server, but yppush does not propagate map changes to the slave server.
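The ypservers check can be sketched as a filter over the output of ypcat -k ypservers, which prints one server name per line as the key. This is an illustration under that assumption, for a POSIX-compatible shell; the function name in_ypservers and the host name ypslave are hypothetical.

```shell
# in_ypservers HOST
# Reads `ypcat -k ypservers` output on stdin; succeeds if HOST is the
# first field of any line, fails otherwise.
in_ypservers() {
    awk -v host="$1" '$1 == host { found = 1 } END { exit !found }'
}

# Usage on the master server:
# ypcat -k ypservers | in_ypservers ypslave ||
#     echo "ypslave is not listed in the ypservers map"
```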

Work Around

If the cause of the NIS slave server problem is not obvious, you can work around it while you debug by using rcp or ftp to copy a recent version of the inconsistent map from any healthy NIS server. For instance, here is how you might transfer the problem map:

ypslave# rcp ypmaster:/var/yp/mydomain/map.\* /var/yp/mydomain

Here the * character has been escaped in the command line, so that it will be expanded on ypmaster, instead of locally on ypslave.

`ypserv` Crashes

When the ypserv process crashes almost immediately, and does not stay up even with repeated activations, the debug process is virtually identical to that described in "ypbind Crashes". Check for the existence of the rpcbind daemon as follows:

ypserver% ps -e | grep rpcbind

Reboot the server if you do not find the daemon. Otherwise, if the daemon is running, type the following and look for similar output:

% rpcinfo -p ypserver
program    vers    proto    port     service
100000     4       tcp      111      portmapper
100000     3       tcp      111      portmapper
100068     2       udp      32813    cmsd
...
100007     1       tcp      34900    ypbind
100004     2       udp      731      ypserv
100004     1       udp      731      ypserv
100004     1       tcp      732      ypserv
100004     2       tcp      32772    ypserv

Your machine may have different port numbers. The four entries representing the ypserv process are:

100004     2       udp      731      ypserv
100004     1       udp      731      ypserv
100004     1       tcp      732      ypserv
100004     2       tcp      32772    ypserv

If they are not present, and ypserv is unable to register its services with rpcbind, reboot the machine. If they are present, deregister the service from rpcbind before restarting ypserv. To deregister the service from rpcbind, on the server type:

# rpcinfo -d number 1
# rpcinfo -d number 2

Where number is the ID number reported by rpcinfo (100004, in the example above).
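To avoid copying the program and version numbers by hand, they can be extracted from the rpcinfo -p listing. The sketch below assumes a POSIX-compatible shell and the five-column output shown above (program, version, protocol, port, service); the function name ypserv_registrations is hypothetical.

```shell
# ypserv_registrations: read `rpcinfo -p` output on stdin and print each
# distinct "program version" pair that is registered for ypserv.
ypserv_registrations() {
    awk '$5 == "ypserv" { print $1, $2 }' | sort -u
}

# Usage on the server, as root. The echo only prints the commands;
# remove it once the list looks right.
# rpcinfo -p ypserver | ypserv_registrations |
#     while read -r prog vers; do
#         echo rpcinfo -d "$prog" "$vers"
#     done
```

With the example listing above, the filter yields the pairs 100004 1 and 100004 2, which correspond to the two rpcinfo -d commands shown.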