Solaris Naming Administration Guide

Appendix A Problems and Solutions

This appendix describes some of the problems you may encounter while administering Solaris operating environment namespaces and how to correct them.

Troubleshooting NIS+

In this appendix, problems are grouped according to type. For each problem there is a list of common symptoms, a description of the problem, and one or more suggested solutions.

In addition to this appendix, there is an appendix containing an alphabetic listing of the more common NIS+ error messages. If you are responding to a specific error message, check Appendix B, Error Messages, first. If the problem is simple, or specific to a single error message, its solution is usually described in Appendix B, Error Messages.

NIS+ Debugging Options

The NIS_OPTIONS environment variable can be set to control various NIS+ debugging options.

Options are specified as the value of the NIS_OPTIONS variable, separated by spaces, with the whole option set enclosed in double quotes. Each option has the format name=value. Values can be integers, character strings, or filenames, depending on the particular option. If a value is not specified for an integer-valued option, the value defaults to 1.

NIS_OPTIONS recognizes the following options:

Table A-1 NIS_OPTIONS Options and Values


Option	Values	Actions
`debug_file`	`filename`	Directs debug output to specified file. If this option is not specified, debug output goes to `stdout`.
`debug_bind`	`Number`	Displays information about the server selection process.
`debug_rpc`	`1` or `2`	If the value is 1, displays RPC calls made to the NIS+ server and the RPC result code. If the value is 2, displays both the RPC calls and the contents of the RPC arguments and results.
`debug_calls`	`Number`	Displays calls to the NIS+ API and the results that are returned to the application.
`pref_srvr`	`String`	Specifies preferred servers in the same format as that generated by the `nisprefadm` command (see Table 15-1). This overrides the preferred server list specified in `nis_cachemgr`.
`server`	`String`	Bind to a particular server.
`pref_type`	`String`	Not currently implemented.

For example (assuming that you are using the C shell):

To display many debugging messages you would enter:

setenv NIS_OPTIONS "debug_calls=2 debug_bind debug_rpc"

To obtain a simple list of API calls and store them in the file /tmp/CALLS you would enter:

setenv NIS_OPTIONS "debug_calls debug_file=/tmp/CALLS"

To obtain a simple list of API calls sent to a particular server you would enter:

setenv NIS_OPTIONS "debug_calls server=sirius"
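
The option syntax described above can be illustrated with a small parser. This is a minimal Python sketch, not part of NIS+: the function name parse_nis_options is hypothetical, and it assumes that any option given without a value falls back to 1 (the text above states this default only for integer-valued options).

```python
def parse_nis_options(value):
    """Split an NIS_OPTIONS string into a {name: value} dictionary."""
    options = {}
    for item in value.split():
        name, sep, val = item.partition("=")
        # Assumption: a bare option name means the integer default 1.
        options[name] = val if sep else "1"
    return options


if __name__ == "__main__":
    print(parse_nis_options("debug_calls=2 debug_bind debug_rpc"))
```

Values are kept as strings because, as noted above, an option's value may be an integer, a character string, or a filename.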

NIS+ Administration Problems

This section describes problems that may be encountered in the course of routine NIS+ namespace administration work. Common symptoms include:

"Illegal object type for operation" message
Other "object problem" error messages
Initialization failure
Checkpoint failures
Difficulty adding a user to a group
Logs too large/lack of disk space/difficulty truncating logs
Cannot delete groups_dir or org_dir

Illegal Object Problems

Symptoms

"Illegal object type for operation" message
Other "object problem" error messages

There are a number of possible causes for this error message:

You have attempted to create a table without any searchable columns.
A database operation has returned the status of DB_BADOBJECT (see the nis_db man page for information on the db error codes).
You are trying to add or modify a database object with a length of zero.
You attempted to add an object without an owner.
The operation expected a directory object, and the object you named was not a directory object.
You attempted to link a directory to a LINK object.
You attempted to link a table entry.
An object that was not a group object was passed to the nisgrpadm command.
An operation on a group object was expected, but the type of object specified was not a group object.
An operation on a table object was expected, but the object specified was not a table object.

`nisinit` Fails

Make sure that:

You can ping the NIS+ server to check that it is up and running as a machine.
The NIS+ server that you specified with the -H option is a valid server and that it is running the NIS+ software.
rpc.nisd is running on the server.
The nobody class has read permission for this domain.
The netmask is properly set up on this machine.

Checkpoint Keeps Failing

If checkpoint operations with a nisping -C command consistently fail, make sure you have sufficient swap and disk space. Check for error messages in syslog. Check for core files filling up space.

Cannot Add User to a Group

A user must first be an NIS+ principal client with a LOCAL credential in the domain's cred table before the user can be added as a member of a group in that domain. A DES credential alone is not sufficient.

Logs Grow too Large

Failure to regularly checkpoint your system with nisping -C causes your log files to grow too large. Logs are not cleared on a master until all replicas for that master are updated. If a replica is down or otherwise out of service or unreachable, the master's logs for that replica cannot be cleared. Thus, if a replica is going to be down or out of service for a period of time, you should remove it as a replica from the master as described in "Removing a Directory". Keep in mind that you must first remove the directory's org_dir and groups_dir subdirectories before you remove the directory itself.

Lack of Disk Space

Lack of sufficient disk space will cause a variety of different error messages. (See "Insufficient Disk Space" for additional information.)

Cannot Truncate Transaction Log File

First, check to make sure that the file in question exists and is readable and that you have permission to write to it.

You can use ls /var/nis/trans.log to confirm that the transaction log file exists.
You can use nisls -l and niscat to check for existence, permissions, and readability.
You can use syslog to check for relevant messages.

The most likely cause of inability to truncate an existing log file for which you have the proper permissions is lack of disk space. (The checkpoint process first creates a duplicate temporary file of the log before truncating the log and then removing the temporary file. If there is not enough disk space for the temporary file, the checkpoint process cannot proceed.) Check your available disk space and free up additional space if necessary.
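
The disk space requirement described above can be checked before running a checkpoint. The following Python sketch is an illustration only: the function names are hypothetical, and it assumes that the temporary duplicate is written to the same file system as the log itself, which the text above does not state.

```python
import os
import shutil


def has_room_for_copy(log_size, free_bytes):
    """True if a duplicate of the log would fit in the free space."""
    return free_bytes >= log_size


def can_checkpoint(log_path="/var/nis/trans.log"):
    """Compare the log size with the free space on its file system."""
    log_size = os.path.getsize(log_path)
    # Assumption: the temporary copy is written to the log's own directory.
    log_dir = os.path.dirname(os.path.abspath(log_path))
    free = shutil.disk_usage(log_dir).free
    return has_room_for_copy(log_size, free)
```

A result of False suggests that you should free up space before running the checkpoint again.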

Domain Name Confusion

Domain names play a key role in many NIS+ commands and operations. To avoid confusion, you must remember that, except for root servers, all NIS+ masters and replicas are clients of the domain above the domain that they serve. If you make the mistake of treating a server or replica as if it were a client of the domain that it serves, you may get Generic system error or Possible loop detected in namespace directoryname:domainname error messages.

For example, the machine altair might be a client of the subdoc.doc.com. domain. If the master server of the subdoc.doc.com. subdomain is the machine sirius, then sirius is a client of the doc.com. domain. When using, specifying, or changing domains, remember these rules to avoid confusion:

Client machines belong to a given domain or subdomain.
Servers and replicas that serve a given subdomain are clients of the domain above the domain they are serving.
The only exception to the second rule is that the root master server and root replica servers are clients of the same domain that they serve. In other words, the root master and root replicas are all clients of the root domain.

Thus, in the example above, the fully qualified name of the altair machine is altair.subdoc.doc.com. The fully qualified name of the sirius machine is sirius.doc.com. The name sirius.subdoc.doc.com. is wrong and will cause an error because sirius is a client of doc.com., not subdoc.doc.com.
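
The rules above can be expressed as a short function. This is a Python sketch for illustration; the function names and the is_server and is_root parameters are hypothetical and are not part of any NIS+ interface.

```python
def parent_domain(domain):
    """Return the domain above `domain`: 'subdoc.doc.com.' gives 'doc.com.'."""
    return domain.split(".", 1)[1]


def fully_qualified_name(host, domain, is_server=False, is_root=False):
    """Name of `host`, which is either a client of or a server for `domain`."""
    if is_server and not is_root:
        # Non-root servers are clients of the domain above the one they serve.
        domain = parent_domain(domain)
    return host + "." + domain


if __name__ == "__main__":
    print(fully_qualified_name("altair", "subdoc.doc.com."))
    print(fully_qualified_name("sirius", "subdoc.doc.com.", is_server=True))
```

Root masters and root replicas are handled by passing is_root=True, in which case the served domain is used unchanged.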

Cannot Delete `org_dir` or `groups_dir`

Always delete org_dir and groups_dir before deleting their parent directory. If you use nisrmdir to delete the domain before deleting the domain's groups_dir and org_dir, you will not be able to delete either of those two subdirectories.

Removal or Disassociation of NIS+ Directory from Replica Fails

When removing or disassociating a directory from a replica server you must first remove the directory's org_dir and groups_dir subdirectories before removing the directory itself. After each subdirectory is removed, you must run nisping on the parent directory of the directory you intend to remove. (See "Removing a Directory".)

If you fail to perform the nisping operation, the directory will not be completely removed or disassociated.

If this occurs, you need to perform the following steps to correct the problem:

Remove /var/nis/rep/org_dir on the replica.
Make sure that org_dir.domain does not appear in /var/nis/rep/serving_list on the replica.
Perform a nisping on domain.
From the master server, run nisrmdir -f replica_directory.

If the replica server you are trying to dissociate is down or out of communication, the nisrmdir -s command will return a Cannot remove replica name: attempt to remove a non-empty table error message.

In such cases, you can run nisrmdir -f -s replicaname on the master to force the dissociation. Note, however, that if you use nisrmdir -f -s to dissociate an out-of-communication replica, you must run nisrmdir -f -s again as soon as the replica is back on line in order to clean up the replica's /var/nis file system. If you fail to rerun nisrmdir -f -s replicaname when the replica is back in service, the old out-of-date information left on the replica could cause problems.

NIS+ Database Problems

This section covers problems related to the namespace database and tables. Common symptoms include error messages with operative clauses such as:

Abort_transaction: Internal database error
Abort_transaction: Internal Error, log entry corrupt
Callback: select failed
CALLBACK_SVC: bad argument

Another common symptom is the failure of rpc.nisd.

Multiple `rpc.nisd` Parent Processes

Symptoms:

Various database and transaction log corruption error messages containing the terms:

Log corrupted
Log entry corrupt
Corrupt database
Database corrupted

Possible Causes:

You have multiple independent rpc.nisd daemons running. In normal operation, rpc.nisd can spawn other child rpc.nisd daemons. This causes no problem. However, if two parent rpc.nisd daemons are running at the same time on the same machine, they will overwrite each other's data and corrupt logs and databases. (Normally, this could only occur if someone started running rpc.nisd by hand.)

Diagnosis:

Run ps -ef | grep rpc.nisd. Make sure that you have no more than one parent rpc.nisd process.
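
On a busy server it can be easier to let a script pick out the parent processes. The Python sketch below is a heuristic, not a Sun-supplied tool: it assumes that the first three fields of each ps -ef line are UID, PID, and PPID, and that a child rpc.nisd process reports its parent rpc.nisd as its PPID. The function name and the sample listing are made up for illustration.

```python
def parent_nisd_pids(ps_output):
    """Return PIDs of rpc.nisd processes whose parent is not rpc.nisd."""
    procs = {}  # pid -> ppid for every rpc.nisd process found
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or "grep" in fields:
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # header line or unexpected layout
        # Match the daemon itself, not rpc.nisd_resolv.
        if any(f == "rpc.nisd" or f.endswith("/rpc.nisd") for f in fields[3:]):
            procs[int(fields[1])] = int(fields[2])
    return sorted(pid for pid, ppid in procs.items() if ppid not in procs)


# Hypothetical listing, used only to exercise the function.
SAMPLE = """\
     UID   PID  PPID  C    STIME TTY      TIME CMD
    root   201     1  0 09:15:02 ?        0:03 /usr/sbin/rpc.nisd
    root   340   201  0 09:20:11 ?        0:00 /usr/sbin/rpc.nisd
    root   512     1  0 10:02:45 ?        0:01 /usr/sbin/rpc.nisd -B
    root   513     1  0 10:02:45 ?        0:00 /usr/sbin/rpc.nisd_resolv
    root   600   580  0 10:05:00 pts/1    0:00 grep rpc.nisd
"""

if __name__ == "__main__":
    print(parent_nisd_pids(SAMPLE))
```

If the function returns more than one PID, more than one parent rpc.nisd is running.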

Solution:

If you have more than one "parent" rpc.nisd entry, you must kill all but one of them. Use kill -9 process-id for each extra process, then run the ps command again to make sure that it has died.

Note -

If you started rpc.nisd with the -B option, you must also kill the rpc.nisd_resolv daemon.

If an NIS+ database is corrupt, you will also have to restore it from your most recent backup that contains an uncorrupted version of the database. You can then use the logs to update changes made to your namespace since the backup was recorded. However, if your logs are also corrupted, you will have to recreate by hand any namespace modifications made since the backup was taken.

`rpc.nisd` Fails

If an NIS+ table is too large, rpc.nisd may fail.

Diagnosis:

Use nisls to check your NIS+ table sizes. Tables larger than 7k may cause rpc.nisd to fail.

Solution:

Reduce the size of large NIS+ tables. Keep in mind that as a naming service NIS+ is designed to store references to objects, not the objects themselves.

NIS+ and NIS Compatibility Problems

This section describes problems having to do with NIS compatibility with NIS+ and earlier systems and the switch configuration file. Common symptoms include:

The nsswitch.conf file fails to perform correctly.
Error messages with operative clauses.

Error messages with operative clauses include:

Unknown user
Permission denied
Invalid principal name

User Cannot Log In After Password Change

Symptoms:

New users, or users who recently changed their password, are unable to log in at all, or are able to log in on one or more machines but not on others. The user may see error messages with operative clauses such as:

Unknown user username
Permission denied
Invalid principal name

First Possible Cause:

Password was changed on NIS machine.

If a user or system administrator uses the passwd command to change a password on a Solaris operating environment machine running NIS in a domain served by NIS+ namespace servers, the user's password is changed only in that machine's /etc/passwd file. If the user then goes to some other machine on the network, the user's new password will not be recognized by that machine. The user will have to use the old password stored in the NIS+ passwd table.

Diagnosis:

Check to see if the user's old password is still valid on another NIS+ machine.

Solution:

Use passwd on a machine running NIS+ to change the user's password.

Second Possible Cause:

Password changes take time to propagate through the domain.

Diagnosis:

Namespace changes take a measurable amount of time to propagate through a domain and an entire system. This time might be as short as a few seconds or as long as many minutes, depending on the size of your domain and the number of replica servers.

Solution:

You can simply wait the normal amount of time for a change to propagate through your domain(s). Or you can use the nisping org_dir command to resynchronize your system.

`nsswitch.conf` File Fails to Perform Correctly

A modified (or newly installed) nsswitch.conf file fails to work properly.

Symptoms:

You install a new nsswitch.conf file or make changes to the existing file, but your system does not implement the changes.

Possible Cause:

Each time an nsswitch.conf file is installed or changed, you must reboot the machine for your changes to take effect. This is because nscd caches the nsswitch.conf file.

Solution:

Check your nsswitch.conf file against the information contained in the nsswitch.conf man page. Correct the file if necessary, and then reboot the machine.
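
When comparing the file against the man page, it can help to list which sources each database actually uses. The following Python sketch reads the text of a switch file into a dictionary. The function name is hypothetical, and the parser assumes the usual layout of one "database: source ..." entry per line with # starting a comment.

```python
def parse_nsswitch(text):
    """Map each database name to its list of sources and search criteria."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line or ":" not in line:
            continue
        database, sources = line.split(":", 1)
        entries[database.strip()] = sources.split()
    return entries


if __name__ == "__main__":
    sample = "passwd: files nisplus\nhosts: nisplus [NOTFOUND=return] files\n"
    for database, sources in parse_nsswitch(sample).items():
        print(database, sources)
```

The sketch only reports what the file says; it does not check that the listed sources are valid.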

NIS+ Object Not Found Problems

This section describes problems in which NIS+ was unable to find some object or principal. Common symptoms include:

Error messages with operative clauses such as:

Not found
Not exist
Can't find suitable transport for name
Cannot find
Unable to find
Unable to stat

Syntax or Spelling Error

The most likely cause of some NIS+ object not being found is that you mistyped or misspelled its name. Check the syntax and make sure that you are using the correct name.

Incorrect Path

A likely cause of an "object not found" problem is specifying an incorrect path. Make sure that the path you specified is correct. Also make sure that the NIS_PATH environment variable is set correctly.

Domain Levels Not Correctly Specified

Remember that all servers are clients of the domain above them, not the domain they serve. There is one exception to this rule:

The root masters and root replicas are clients of the root domain.

Also remember that NIS+ domain names end with a period. When using a fully qualified name you must end the domain name with a period. If you do not end the domain name with a period, NIS+ assumes it is a partially qualified name. However, the domain name of a machine should not end with a dot in the /etc/defaultdomain file. If you add a dot to a machine's domain name in the /etc/defaultdomain file, you will get Could not bind to server serving domain name error messages and encounter difficulty in connecting to the net on boot up.
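
The two trailing-period rules are easy to mix up, so a small check may help. This Python sketch uses hypothetical function names: one tests whether a name is fully qualified in the NIS+ sense, and the other tests whether the contents of a defaultdomain file follow the no-dot rule.

```python
def is_fully_qualified(name):
    """NIS+ treats a name as fully qualified only if it ends with a period."""
    return name.endswith(".")


def defaultdomain_ok(contents):
    """The domain name in /etc/defaultdomain must not end with a dot."""
    domain = contents.strip()
    return bool(domain) and not domain.endswith(".")


if __name__ == "__main__":
    print(is_fully_qualified("org_dir.doc.com."))
    print(defaultdomain_ok("doc.com\n"))
```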

Object Does Not Exist

The NIS+ object may not have been found because it does not exist, either because it has been erased or not yet created. Use nisls -l in the appropriate domain to check that the object exists.

Lagging or Out-of-Sync Replica

When you create or modify an NIS+ object, there is a time lag between the completion of your action and the arrival of the new updated information at a given replica. In ordinary operation, namespace information may be queried from a master or any of its replicas. A client automatically distributes queries among the various servers (master and replicas) to balance system load. This means that at any given moment you do not know which machine is supplying you with namespace information. If a command relating to a newly created or modified object is sent to a replica that has not yet received the updated information from the master, you will get an "object not found" type of error or the old out-of-date information. Similarly, a general command such as nisls may not list a newly created object if the system sends the nisls query to a replica that has not yet been updated.

You can use nisping to resync a lagging or out of sync replica server.

Alternatively, you can use the -M option with most NIS+ commands to specify that the command must obtain namespace information from the domain's master server. In this way you can be sure that you are obtaining and using the most up-to-date information. (However, you should use the -M option only when necessary because a main point of having and using replicas to serve the namespace is to distribute the load and thus increase network efficiency.)

Files Missing or Corrupt

One or more of the files in /var/nis/data directory has become corrupted or erased. Restore these files from your most recent backup.

Old `/var/nis` Filenames

In Solaris Release 2.4 and earlier, the /var/nis directory contained two files named hostname.dict and hostname.log. It also contained a subdirectory named /var/nis/hostname. Starting with Solaris Release 2.5, the two files were renamed data.dict and trans.log, respectively, and the subdirectory is named /var/nis/data.

Do not rename the /var/nis or /var/nis/data directories or any of the files in these directories that were created by nisinit or any of the other NIS+ setup procedures.

In Solaris Release 2.5, the content of the files was also changed, and they are not backward compatible with Solaris Release 2.4 or earlier. Thus, if you rename either the directories or the files to match the Solaris Release 2.4 patterns, the files will not work with either the Solaris Release 2.4 or the Solaris Release 2.5 or later versions of rpc.nisd. Therefore, you should not rename either the directories or the files.

Blanks in Name

Symptoms:

Sometimes an object is there, sometimes it is not. Some NIS+ or UNIX commands report that an NIS+ object does not exist or cannot be found, while other NIS+ or UNIX commands do find that same object.

Diagnosis:

Use nisls to display the object's name. Look carefully at the object's name to see if the name actually begins with a blank space. (If you accidentally enter two spaces after the flag when creating NIS+ objects from the command line with NIS+ commands, some NIS+ commands will interpret the second space as the beginning of the object's name.)

Solution:

If an NIS+ object name begins with a blank space, you must either rename it without the space or remove it and then recreate it from scratch.
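
A leading blank is hard to see in a terminal listing. As a rough aid, the Python sketch below takes a list of object names (for example, lines captured from a listing with only the trailing newline removed) and reports those that begin with white space. The function name is hypothetical.

```python
def names_with_leading_blank(names):
    """Return the names whose first character is a blank or other white space."""
    return [name for name in names if name[:1].isspace()]


if __name__ == "__main__":
    for name in names_with_leading_blank(["hosts", " mytable", "passwd"]):
        # repr() puts the name in quotes so the leading blank is visible.
        print(repr(name))
```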

Cannot Use Automounter

Symptoms:

You cannot change to a directory on another host.

Possible Cause:

Under NIS+, automounter names must be renamed to meet NIS+ requirements. NIS+ cannot access /etc/auto* tables that contain a period in the name. For example, NIS+ cannot access a file named auto.direct.

Diagnosis:

Use nisls and niscat to determine if the automounter tables are properly constructed.

Solution:

Change the periods to underscores. For example, change auto.direct to auto_direct. (Be sure to change other maps that might reference these.)
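
The renaming rule can be applied mechanically. The following Python sketch shows one way to do it, with hypothetical function names. It uses plain substring replacement, so review the result if one map name is a prefix of another.

```python
def nisplus_map_name(name):
    """Replace periods with underscores: 'auto.direct' becomes 'auto_direct'."""
    return name.replace(".", "_")


def rewrite_references(text, old_names):
    """Update references to the old map names inside another map's text."""
    for old in old_names:
        text = text.replace(old, nisplus_map_name(old))
    return text


if __name__ == "__main__":
    print(nisplus_map_name("auto.direct"))
```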

Links To or From Table Entries Do Not Work

You cannot use the nisln command (or any other command) to create links between entries in tables. NIS+ commands do not follow links at the entry level.

NIS+ Ownership and Permission Problems

This section describes problems related to user ownership and permissions. Common symptoms include:

Error messages with operative clauses such as:

Unable to stat name
Unable to stat NIS+ directory name
Security exception on LOCAL system
Unable to make request
Insufficient permission to . . .
You name do not have secure RPC credentials

Another Symptom:

User or root unable to perform any namespace task.

No Permission

The most common permission problem is the simplest: you have not been granted permission to perform some task that you try to do. Use niscat -o on the object in question to determine what permissions you have. If you need additional permission, you, the owner of the object, or the system administrator can either change the permission requirements of the object (as described in Chapter 10, Administering NIS+ Access Rights) or add you to a group that does have the required permissions (as described in Chapter 12, Administering NIS+ Groups).

No Credentials

Without proper credentials for you and your machine, many operations will fail. Use nismatch on your home domain's cred table to make sure you have the right credentials. See "Corrupted Credentials" for more on credentials-related problems.

Server Running at Security Level 0

A server running at security level 0 does not create or maintain credentials for NIS+ principals.

If you try to use passwd on a server that is running at security level 0, you will get the error message: You name do not have secure RPC credentials in NIS+ domain domainname.

Security level 0 is only to be used by administrators for initial namespace setup and testing purposes. Level 0 should not be used in any environment where ordinary users are active.

User Login Same as Machine Name

A user cannot have the same login ID as a machine name. When a machine is given the same name as a user (or vice versa), the first principal can no longer perform operations requiring secure permissions because the second principal's key has overwritten the first principal's key in the cred table. In addition, the second principal now has whatever permissions were granted to the first principal.

For example, suppose a user with the login name of saladin is granted namespace read-only permissions. Then a machine named saladin is added to the domain. The user saladin will no longer be able to perform any namespace operations requiring any sort of permission, and the root user of the machine saladin will only have read-only permission in the namespace.

Symptoms:

The user or machine gets "permission denied" type error messages.
Either the user or root for that machine cannot successfully run keylogin.
Security exception on LOCAL system. UNABLE TO MAKE REQUEST. error message.
If the first principal did not have read access, the second principal might not be able to view otherwise visible objects.

Note -

When running nisclient or nisaddcred, if the message Changing Key is displayed rather than Adding Key, there is a duplicate user or host name already in existence in that domain.

Diagnosis:

Run nismatch to find the host and user in the hosts and passwd tables to see if there are identical host names and user names in the respective tables:

nismatch username passwd.org_dir

Then run nismatch on the domain's cred table to see what type of credentials are provided for the duplicate host or user name. If there are both LOCAL and DES credentials, the cred table entry is for the user; if there is only a DES credential, the entry is for the machine.
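
The two diagnostic checks above can be summarized in code. This Python sketch assumes that you have already collected the user names, host names, and credential types yourself (for example, from nismatch output); it does not run any NIS+ command, and the function names are hypothetical.

```python
def duplicate_names(user_names, host_names):
    """Names that appear both as a login ID and as a machine name."""
    return sorted(set(user_names) & set(host_names))


def cred_entry_owner(auth_types):
    """Classify a cred table entry by the credential types stored for it."""
    types = {t.upper() for t in auth_types}
    if {"LOCAL", "DES"} <= types:
        return "user"      # both LOCAL and DES: the entry is for the user
    if types == {"DES"}:
        return "machine"   # only DES: the entry is for the machine
    return "unknown"


if __name__ == "__main__":
    print(duplicate_names(["saladin", "amina"], ["sirius", "saladin"]))
    print(cred_entry_owner(["DES"]))
```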

Solution:

Change the machine name. (It is better to change the machine name than to change the user name.) Then delete the machine's entry from the cred table and use nisclient to reinitialize the machine as an NIS+ client. (If you wish, you can use nistbladm to create an alias for the machine's old name in the hosts tables.) If necessary, replace the user's credentials in the cred table.

Bad Credentials

See "Corrupted Credentials".

NIS+ Security Problems

This section describes common password, credential, encryption, and other security-related problems.

Security Problem Symptoms

Error messages with operative clauses such as:

Authentication error
Authentication denied
Cannot get public key
Chkey failed
Insufficient permission to
Login incorrect
Keyserv fails to encrypt
No public key
Permission denied
Password [problems]

User or root unable to perform any namespace operations or tasks. (See also "NIS+ Ownership and Permission Problems".)

`Login Incorrect` Message

The most common cause of a "login incorrect" message is the user mistyping the password. Have the user try it again. Make sure the user knows the correct password and understands that passwords are case-sensitive and also that the letter "o" is not interchangeable with the numeral "0," nor is the letter "l" the same as the numeral "1."

Other possible causes of the "login incorrect" message are:

The password has been locked by an administrator. See "Locking a Password" and "Unlocking a Password".
The password has been locked because the user has exceeded an inactivity maximum. See "Specifying Maximum Number of Inactive Days".
The password has expired. See "Password Privilege Expiration".

See Chapter 11, Administering Passwords for general information on passwords.

Password Locked, Expired, or Terminated

A common cause of a "Permission denied, password expired" type of message is that the user's password has passed its age limit or the user's password privileges have expired. See Chapter 11, Administering Passwords for general information on passwords.

See "Setting a Password Age Limit".
See "Password Privilege Expiration".

Stale and Outdated Credential Information

Occasionally, you may find that even though you have created the proper credentials and assigned the proper access rights, some client requests still get denied. This may be due to out-of-date information residing somewhere in the namespace.

Storing and Updating Credential Information

Credential-related information, such as public keys, is stored in many locations throughout the namespace. NIS+ updates this information periodically, depending on the time-to-live values of the objects that store it, but sometimes, between updates, it gets out of sync. As a result, you may find that operations that should work, don't work. Table A-2 lists all the objects, tables, and files that store credential-related information and how to reset it.

Table A-2 Where Credential-Related Information is Stored


Item	Stores	To Reset or Change
cred table	NIS+ principal's secret key and public key. These are the master copies of these keys.	Use `nisaddcred` to create new credentials; it updates existing credentials. An alternative is `chkey`.
Directory object	A copy of the public key of each server that supports it.	Run the `/usr/lib/nis/nisupdkeys` command on the directory object.
Keyserver	The secret key of the NIS+ principal that is currently logged in.	Run `keylogin` for a principal user or `keylogin -r` for a principal workstation.
NIS+ daemon	Copies of directory objects, which in turn contain copies of their servers' public keys.	Kill the daemon and the cache manager. Then restart both.
Directory cache	A copy of directory objects, which in turn contain copies of their servers' public keys.	Kill the NIS+ cache manager and restart it with the `nis_cachemgr -i` command. The `-i` option resets the directory cache from the cold-start file and restarts the cache manager.
Cold-start file	A copy of a directory object, which in turn contains copies of its servers' public keys.	On the root master, kill the NIS+ daemon and restart it. The daemon reloads new information into the existing `NIS_COLD_START` file. For a client, first remove the cold-start and shared directory files from `/var/nis`, and kill the cache manager. Then re-initialize the principal with `nisinit -c`. The principal's trusted server reloads new information into the principal's existing cold-start file.
passwd table	A user's password or a workstation's superuser password.	Use the `passwd -r nisplus` command. It changes the password in the NIS+ passwd table and updates it in the cred table.
`passwd` file	A user's password or a workstation's superuser password.	Use the `passwd -r nisplus` command, whether logged in as superuser or as yourself, whichever is appropriate.
`passwd` map (NIS)	A user's password or a workstation's superuser password.	Use `passwd -r nisplus`.

Updating Stale Cached Keys

The most commonly encountered out-of-date information is the existence of stale objects with old versions of a server's public key. You can usually correct this problem by running nisupdkeys on the domain you are trying to access. (See Chapter 7, Administering NIS+ Credentials for information on using the nisupdkeys command.)

Because some keys are stored in files or caches, nisupdkeys cannot always correct the problem. At times you might need to update the keys manually. To do that, you must understand how a server's public key, once created, is propagated through namespace objects. The process usually has five stages of propagation. Each stage is described below.

Stage 1: Server's Public Key Is Generated

An NIS+ server is first an NIS+ client. So, its public key is generated in the same way as any other NIS+ client's public key: with the nisaddcred command. The public key is then stored in the cred table of the server's home domain, not of the domain the server will eventually support.

Stage 2: Public Key Is Propagated to Directory Objects

Once you have set up an NIS+ domain and an NIS+ server, you can associate the server with a domain. This association is performed by the nismkdir command. When the nismkdir command associates the server with the directory, it also copies the server's public key from the cred table to the domain's directory object. For example, assume the server is a client of the doc.com. root domain, and is made the master server of the sales.doc.com. domain.

Figure A-1 Public Key is Propagated to Directory Objects

Its public key is copied from the cred.org_dir.doc.com. domain and placed in the sales.doc.com. directory object. This can be verified with the niscat -o sales.doc.com. command.

Stage 3: Directory Objects Are Propagated Into Client Files

All NIS+ clients are initialized with the nisinit utility or with the nisclient script.

Among other things, nisinit (or nisclient) creates a cold-start file, /var/nis/NIS_COLD_START. The cold-start file is used to initialize the client's directory cache, /var/nis/NIS_SHARED_DIRCACHE. The cold-start file contains a copy of the directory object of the client's domain. Since the directory object already contains a copy of the server's public key, the key is now propagated into the cold-start file of the client.

In addition, when a client makes a request to a server outside its home domain, a copy of the remote domain's directory object is stored in the client's NIS_SHARED_DIRCACHE file. You can examine the contents of the client's cache by using the nisshowcache command.

This is the extent of the propagation until a replica is added to the domain or the server's key changes.

Stage 4: When a Replica is Added to the Domain

When a replica server is added to a domain, the nisping command (described in "The nisping Command") is used to download the NIS+ tables, including the cred table, to the new replica. Therefore, the original server's public key is now also stored in the replica server's cred table.

Stage 5: When the Server's Public Key Is Changed

If you decide to change DES credentials for the server (that is, for the root identity on the server), its public key will change. As a result, the public key stored for that server in the cred table will be different from those stored in:

The cred table of replica servers (for a few minutes only)
The main directory object of the domain supported by the server (until its time-to-live expires)
The NIS_COLDSTART and NIS_SHARED_DIRCACHE files of every client of the domain supported by the server (until their time-to-live expires, usually 12 hours)
The NIS_SHARED_DIRCACHE file of clients who have made requests to the domain supported by the server (until their time-to-live expires)

Most of these locations will be updated automatically within a time ranging from a few minutes to 12 hours. To update the server's keys in these locations immediately, use the commands:

Table A-3 Updating a Server's Keys


Location	Command	See
Cred table of replica servers (instead of using `nisping`, you can wait a few minutes until the table is updated automatically)	`nisping`	"The `nisping` Command"
Directory object of domain supported by server	`nisupdkeys`	"The `nisupdkeys` Command"
`NIS_COLDSTART` file of clients	`nisinit` `-c`	"The `nisinit` Command"
`NIS_SHARED_DIRCACHE` file of clients	`nis_cachemgr`	"The `nis_cachemgr` Command"

Note -

You must first kill the existing nis_cachemgr before restarting nis_cachemgr.

Corrupted Credentials

When a principal (user or machine) has a corrupt credential, that principal is unable to perform any namespace operations or tasks, not even nisls. This is because a corrupt credential provides no permissions at all, not even the permissions granted to the nobody class.

Symptoms:

User or root cannot perform any namespace tasks or operations. All namespace operations fail with a "permission denied" type of error message. The user or root cannot even perform a nisls operation.

Possible Cause:

Corrupted keys or a corrupt, out-of-date, or otherwise incorrect /etc/.rootkey file.

Diagnosis:

Use snoop to identify the bad credential.

Or, if the principal is listed, log in as the principal and try to run an NIS+ command on an object for which you are sure that the principal has proper authorization. For example, in most cases an object grants read authorization to the nobody class. Thus, the nisls object command should work for any principal listed in the cred table. If the command fails with a "permission denied" error, then the principal's credential is likely corrupted.

Solution:

Ordinary user. Perform a keylogout and then a keylogin for that principal.
Root or superuser. Run keylogout -f followed by keylogin -r.

`Keyserv` Failure

The keyserv is unable to encrypt a session. There are several possible causes for this type of problem:

Possible Causes and Solutions:

The client has not keylogged in. Make sure that the client is keylogged in. To determine if a client is properly keylogged in, have the client run nisdefaults -v (or run it yourself as the client). If (not authenticated) is returned on the Principal Name line, the client is not properly keylogged in.
The client (host) does not have appropriate LOCAL or DES credentials. Run niscat on the client's cred table to verify that the client has appropriate credentials. If necessary, add credentials as explained in "Creating Credential Information for NIS+ Principals".
The keyserv daemon is not running. Use the ps command to see if keyserv is running. If it is not running, restart it and then do a keylogin.
Although keyserv is running, other long-running processes that make Secure RPC or NIS+ calls (for example, automountd, rpc.nisd, and sendmail) are not. Verify that these processes are running correctly. If they are not, restart them.
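The first check in the list above can be scripted. The following is a minimal sketch, not part of NIS+: is_keylogged_in is a hypothetical helper that reads the output of nisdefaults -v on standard input and looks for the string (not authenticated) on the Principal Name line. The exact layout of that line is an assumption based on the description above.

```shell
# Hypothetical helper (not an NIS+ command). Reads text on stdin, assumed to
# be the output of `nisdefaults -v`. Succeeds only when a "Principal Name"
# line is present and does not contain "(not authenticated)".
is_keylogged_in() {
    awk '
        tolower($0) ~ /principal name/ {
            seen = 1
            if (index($0, "(not authenticated)")) bad = 1
        }
        END { exit ((seen && !bad) ? 0 : 1) }
    '
}
```

For example, nisdefaults -v | is_keylogged_in || echo "keylogin needed" prints the reminder when the client is not properly keylogged in. Input with no Principal Name line at all is treated as a failure rather than as success.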

Machine Previously Was an NIS+ Client

If this machine has been initialized before as an NIS+ client of the same domain, try keylogin -r and enter the root login password at the Secure RPC password prompt.

No Entry in the cred Table

To make sure that an NIS+ password for the principal (user or host) exists in the cred table, run the following command in the principal's home domain:

nisgrep -A cname=principal cred.org_dir.domainname

If you are running nisgrep from another domain, the domainname must be fully qualified.

Changed Domain Name

Do not change a domain name.

If you change the name of an existing domain you will create authentication problems because the fully qualified original domain name is embedded in objects throughout your network.

If you have already changed a domain name and are experiencing authentication problems, or error messages containing terms like "malformed" or "illegal" in relation to a domain name, change the domain name back to its original name. The recommended procedure for renaming your domains is to create a new domain with the new name, set up your machines as servers and clients of the new domain, make sure they are performing correctly, and then remove the old domain.

When Changing a Machine to a Different Domain

If this machine is an NIS+ client and you are trying to change it to a client of a different domain, remove the /etc/.rootkey file, and then rerun the nisclient script using the network password supplied by your network administrator or taken from the nispopulate script.

NIS+ and Login Passwords in `/etc/passwd` File

Your NIS+ password is stored in the NIS+ passwd table. Your user login password may be stored in the NIS+ passwd table or in your /etc/passwd file. (Your user password and NIS+ password can be the same or different.) To change a password in an /etc/passwd file, you must run the passwd command with the nsswitch.conf file set to files or with the -r files flag.

The nsswitch.conf file specifies which password is used for which purpose. If the nsswitch.conf file is directing system queries to the wrong location, you will get password and permission errors.

Secure RPC Password and Login Passwords Are Different

When a principal's login password is different from his or her Secure RPC password, keylogin cannot decrypt the principal's private key at login time, because keylogin defaults to using the login password and the private key was encrypted using the Secure RPC password.

When this occurs, the principal can log in to the system, but for NIS+ purposes is placed in the authorization class of nobody because the keyserver does not have a decrypted private key for that user. Since most NIS+ environments are set up to deny the nobody class create, destroy, and modify rights to most NIS+ objects, this results in "permission denied" type errors when the user tries to access NIS+ objects.

Note -

In this context network password is sometimes used as a synonym for Secure RPC password. When prompted for your "network password," enter your Secure RPC password.

To be placed in one of the other authorization classes, a user in this situation must explicitly run the keylogin program and give the principal's Secure RPC password when keylogin prompts for password. (See "Keylogin".)

But an explicit keylogin provides only a temporary solution that is good only for the current login session. The keyserver now has a decrypted private key for the user, but the user's private key in the cred table is still encrypted using the user's Secure RPC password, which is different from the user's login password. The next time the user logs in, the same problem recurs. To permanently solve the problem, the user needs to change the private key in the cred table to one based on the user's login password rather than the user's Secure RPC password. To do this, the user needs to run the chkey program as described in "Changing Keys for an NIS+ Principal".

Thus, to permanently solve the problem of a Secure RPC password that differs from the login password, the user (or an administrator acting for the user) must perform the following steps:

Log in using the login password.
Run the keylogin program to temporarily get a decrypted private key stored in the keyserver and thus gain temporary NIS+ access privileges.
Run chkey -p to permanently change the encrypted private key in the cred table to one based on the user's login password.

Preexisting `/etc/.rootkey` File

Symptoms:

Various "insufficient permission to" and "permission denied" error messages.

Possible Cause:

An /etc/.rootkey file already existed when you set up or initialized a server or client. This could occur if NIS+ had been previously installed on the machine and the .rootkey file was not erased when NIS+ was removed or the machine returned to using NIS or /etc files.

Diagnosis:

Run ls -l on the /etc directory and nisls -l org_dir and compare the date of the /etc/.rootkey to the date of the cred table. If the /etc/.rootkey date is clearly earlier than that of the cred table, it may be a preexisting file.

Solution:

Run keylogin -r as root on the problem machine and then set up the machine as a client again.

Root Password Change Causes Problem

Symptoms:

You change the root password on a machine, and the change either fails to take effect or you are unable to log in as superuser.

Possible Cause:

Note -

For security reasons, you should not have User ID 0 listed in the passwd table.

You changed the root password, but root's key was not properly updated, either because you forgot to run chkey -p for root or because some other problem occurred.

Solution:

Log in as a user with administration privileges (that is, a user who is a member of a group with administration privileges) and use passwd to restore the old password. Make sure that the old password works. Now use passwd to change root's password to the new one, and then run chkey -p.

Caution -

Once your NIS+ namespace is set up and running, you can change the root password on the root master machine. But do not change the root master keys, as these are embedded in all directory objects on all clients, replicas, and servers of subdomains. To avoid changing the root master keys, always use the -p option when running chkey as root.

NIS+ Performance and System Hang Problems

This section describes common slow performance and system hang problems.

Performance Problem Symptoms

Error messages with operative clauses such as:

Busy try again later
Not responding

Other common symptoms:

You issue a command and nothing seems to happen for far too long.
Your system, or shell, no longer responds to keyboard or mouse commands.
NIS+ operations seem to run slower than they should or slower than they did before.

Checkpointing

Someone has issued an nisping or nisping -C command. Or the rpc.nisd daemon is performing a checkpoint operation.

Caution -

Do not reboot! Do not issue any more nisping commands.

When performing a nisping or checkpoint, the server will be sluggish or may not immediately respond to other commands. Depending on the size of your namespace, these commands may take a noticeable amount of time to complete. Delays caused by checkpoint or ping commands are multiplied if you, or someone else, enter several such commands at one time. Do not reboot. This kind of problem will solve itself. Just wait until the server finishes performing the nisping or checkpoint command.

During a full master-replica resync, the involved replica server will be taken out of service until the resync is complete. Do not reboot--just wait.

Variable `NIS_PATH`

Make sure that your NIS_PATH variable is set to something clean and simple. For example, the default: org_dir.$:$. A complex NIS_PATH, particularly one that itself contains a variable, will slow your system and may cause some operations to fail. (See "NIS_PATH Environment Variable" for more information.)

Do not use nistbladm to set nondefault table paths. Nondefault table paths will slow performance.

Table Paths

Do not use table paths because they will slow performance.

Too Many Replicas

Too many replicas for a domain degrade system performance during replication. There should be no more than 10 replicas in a given domain or subdomain. If you have more than five replicas in a domain, try removing some of them to see if that improves performance.

Recursive Groups

A recursive group is a group that contains the name of some other group. While including other groups in a group reduces your work as system administrator, doing so slows down the system. You should not use recursive groups.

Large NIS+ Database Logs at Start-up

When rpc.nisd starts up it goes through each log. If the logs are long, this process could take a long time. If your logs are long, you may want to checkpoint them using nisping -C before starting rpc.nisd.

The Master `rpc.nisd` Daemon Died

Symptoms:

If you used the -M option to specify that your request be sent to the master server, and the rpc.nisd daemon has died on that machine, you will get a "server not responding" type error message and no updates will be permitted. (If you did not use the -M option, your request will be automatically routed to a functioning replica server.)

Possible Cause:

Using uppercase letters in the name of a home directory or host can sometimes cause rpc.nisd to die.

Diagnosis:

First make sure that the server itself is up and running. If it is, run ps -ef | grep rpc.nisd to see if the daemon is still running.

Solution:

If the daemon has died, restart it. If rpc.nisd frequently dies, contact your service provider.

No `nis_cachemgr`

Symptoms:

It takes too long for a machine to locate namespace objects in other domains.

Possible Cause:

You do not have nis_cachemgr running.

Diagnosis:

Run ps -ef | grep nis_cachemgr to see if it is still running.

Solution:

Start nis_cachemgr on that machine.
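The ps checks described above for rpc.nisd and nis_cachemgr can be wrapped in a small function. This is only a sketch: daemon_listed is a hypothetical helper, and it assumes ps -e style output in which the last field of each line is the bare command name. Comparing that field exactly also avoids matching the grep process itself.

```shell
# Hypothetical helper (not a system command). Reads `ps -e` style output on
# stdin and succeeds when the last field of some line equals the given name.
daemon_listed() {
    awk -v name="$1" '$NF == name { found = 1 } END { exit (found ? 0 : 1) }'
}
```

For example, ps -e | daemon_listed nis_cachemgr || echo "nis_cachemgr is not running". The helper is not suitable for ps -ef output, where the command column includes arguments.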

Server Very Slow at Start-up After NIS+ Installation

Symptoms:

A server performs slowly and sluggishly after using the NIS+ scripts to install NIS+ on it.

Possible Cause:

You forgot to run nisping -C -a after running the nispopulate script.

Solution:

Run nisping -C -a to checkpoint the system as soon as you are able to do so.

`niscat` Returns: `Server busy. Try Again`

Symptoms:

You run niscat and get an error message indicating that the server is busy.

Possible Cause:

The server is busy with a heavy load, such as when doing a resync.
The server is out of swap space.

Diagnosis:

Run swap -s to check your server's swap space.

Solution:

You must have adequate swap and disk space to run NIS+. If necessary, increase your space.
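If you check swap space regularly, the available figure can be pulled out of the swap -s output. The sketch below is an illustration only: swap_available_kb is a hypothetical helper, and it assumes the summary line ends with a value such as 183856k followed by the word available. Verify the format on your own system before relying on it.

```shell
# Hypothetical helper (not a system command). Reads `swap -s` output on stdin
# and prints the number of kilobytes reported as available.
# Assumed line format: "... = 38464k used, 183856k available"
swap_available_kb() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i == "available") {
                value = $(i - 1)
                sub(/k$/, "", value)   # strip the trailing unit letter
                print value
                exit
            }
    }'
}
```

For example, swap -s | swap_available_kb prints a bare number that a script can compare against a threshold of your choosing.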

NIS+ Queries Hang After Changing Host Name

Symptoms:

Setting the host name for an NIS+ server to be fully qualified is not recommended. If you do so, and NIS+ queries then just hang with no error messages, check the following possibilities:

Possible Cause:

Fully qualified host names must meet the following criteria:

The domain part of the host name must be the same as the name returned by the domainname command.
After setting the host name to be fully qualified, you must also update all the necessary /etc and /etc/inet files with the new host name information.
The host name must end in a period.

Solution:

Kill the NIS+ processes that are hanging and then kill rpc.nisd on that host or server. Rename the host to meet the requirements listed above. Then reinitialize the server with nisinit. (If queries still hang after you are sure that the host is correctly named, check other problem possibilities in this section.)

NIS+ System Resource Problems

This section describes problems having to do with lack of system resources such as memory, disk space, and so forth.

Resource Problem Symptoms

Error messages with operative clauses such as:

No memory
Out of disk space
"Cannot [do something] with log" type messages
Unable to fork

Insufficient Memory

Lack of sufficient memory or swap space on the system you are working with will cause a wide variety of NIS+ problems and error messages. As a short-term, temporary solution, try to free additional memory by killing unneeded windows and processes. If necessary, exit your windowing system and work from the terminal command line. If you still get messages indicating inadequate memory, you will have to install additional swap space or memory, or switch to a different system that has enough swap space or memory.

Under some circumstances, applications and processes may develop memory leaks and grow too large. You can check the current size of an application or process by running:

ps -el

The sz (size) column shows the current memory size of each process. If necessary, compare the sizes with comparable processes and applications on a machine that is not having memory problems to see if any have grown too large.
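To make the comparison easier, the largest processes can be listed first. The following is a sketch under stated assumptions: largest_by_sz is a hypothetical helper, the first line of input is assumed to be a header containing a column named SZ, every column to the left of SZ is assumed to be non-empty on each row, and the command name is assumed to be the last field.

```shell
# Hypothetical helper (not a system command). Reads `ps -el` output on stdin
# and prints "SZ COMMAND" for the N largest processes (default 5).
largest_by_sz() {
    awk '
        NR == 1 {                      # find the SZ column from the header
            for (i = 1; i <= NF; i++) if ($i == "SZ") col = i
            next
        }
        col { print $col, $NF }        # size, then command name
    ' | sort -k1,1nr | head -n "${1:-5}"
}
```

For example, ps -el | largest_by_sz 10 shows the ten largest entries, which you can compare with the same listing from a machine that is not having memory problems.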

Insufficient Disk Space

Lack of disk space will cause a variety of error messages. A common cause of insufficient disk space is failure to regularly remove tmp files and truncate log files. Log and tmp files grow steadily larger unless truncated. The speed at which these files grow varies from system to system and with the system state. Log files on a system that is working inefficiently or having namespace problems will grow very fast.

Note -

If you are doing a lot of troubleshooting, check your log and tmp files frequently. Truncate log files and remove tmp files before lack of disk space creates additional problems. Also check the root directory and home directories for core files and delete them.

The way to truncate log files is to regularly checkpoint your system. (Keep in mind that a checkpoint process may take some time and will slow down your system while it is being performed. Checkpointing also requires enough disk space to create a complete copy of the files before they are truncated.)

To checkpoint a system, run nisping -C.

Insufficient Processes

On a heavily loaded machine it is possible that you could reach the maximum number of simultaneous processes that the machine is configured to handle. This causes messages with clauses like "unable to fork". The recommended method of handling this problem is to kill any unnecessary processes. If the problem persists, you can reconfigure the machine to handle more processes as described in your system administration documentation.

NIS+ User Problems

This section describes NIS+ problems that a typical user might encounter.

User Problem Symptoms

User cannot log in.
User cannot rlogin to another domain.

User Cannot Log In

There are many possible reasons for a user being unable to log in:

User forgot password. To set up a new password for a user who has forgotten the previous one, run passwd for that user on another machine (naturally, you have to be the NIS+ administrator to do this).
Mistyping password. Make sure the user knows the correct password and understands that passwords are case-sensitive and that the letter "o" is not interchangeable with the numeral "0," nor is the letter "l" the same as the numeral "1."
"Login incorrect" type message. For causes other than simply mistyping the password, see "Login Incorrect Message".
The user's password privileges have expired (see "Password Privilege Expiration").
An inactivity maximum has been set for this user, and the user has passed it (see "Specifying Maximum Number of Inactive Days").
The user's nsswitch.conf file is incorrect. The passwd entry in that file must be one of the following five permitted configurations:
- passwd: files
- passwd: files nis
- passwd: files nisplus
- passwd: compat
- passwd: compat (with a separate passwd_compat: nisplus line)
Any other configuration will prevent a user from logging in.

(See "nsswitch.conf File Requirements" for further details.)
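The list of permitted passwd configurations above lends itself to an automated check. The sketch below is illustrative only: switch_value and passwd_entry_ok are hypothetical helpers, and the handling of compat reflects one reading of the list (a passwd_compat line is accepted only if it is absent or set to nisplus).

```shell
# Hypothetical helpers (not system commands).

# switch_value KEY FILE: print the sources listed for KEY, with comments
# removed and whitespace squeezed to single spaces.
switch_value() {
    sed 's/#.*//' "$2" | awk -v key="$1:" '
        $1 == key {
            for (i = 2; i <= NF; i++) { out = out sep $i; sep = " " }
            print out
            exit
        }'
}

# passwd_entry_ok [FILE]: succeed when the passwd entry is one of the
# permitted configurations listed above.
passwd_entry_ok() {
    conf=${1:-/etc/nsswitch.conf}
    case $(switch_value passwd "$conf") in
        files | "files nis" | "files nisplus")
            return 0 ;;
        compat)
            # Assumption: with compat, passwd_compat is absent or nisplus.
            case $(switch_value passwd_compat "$conf") in
                "" | nisplus) return 0 ;;
            esac ;;
    esac
    return 1
}
```

For example, passwd_entry_ok || echo "check the passwd entry in /etc/nsswitch.conf".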

User Cannot Log In Using New Password

Symptoms:

Users who recently changed their password are unable to log in at all, or are able to log in on some machines but not on others.

Possible Causes:

It may take some time for the new password to propagate through the network. Have users try to log in with the old password.
The password was changed on a machine that was not running NIS+ (see "User Cannot Log In Using New Password").

User Cannot Remote Log In to Remote Domain

Symptoms:

User tries to rlogin to a machine in some other domain and is refused with a "Permission denied" type error message.

Possible Cause:

To rlogin to a machine in another domain, a user must have LOCAL credentials in that domain.

Diagnosis:

Run nismatch username.domainname. cred.org_dir in the other domain to see if the user has a LOCAL credential in that domain.

Solution:

Go to the remote domain and use nisaddcred to create a LOCAL credential for the user in that domain.

User Cannot Change Password

The most common cause of a user being unable to change passwords is that the user is mistyping (or has forgotten) the old password.

Other possible causes:

The password Min value has been set to be greater than the password Max value. See "Setting Minimum Password Life".
The password is locked or expired. See "Login Incorrect Message" and "Password Locked, Expired, or Terminated".

Other NIS+ Problems

This section describes problems that do not fit any of the previous categories.

How to Tell if NIS+ Is Running

You may need to know whether a given host is running NIS+. A script may also need to determine whether NIS+ is running.

You can assume that NIS+ is running if:

nis_cachemgr is running.
The host has a /var/nis/NIS_COLD_START file.
nisls succeeds.
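For a script, the three indicators above can be combined into one test. This is a sketch, not a definitive method: nisplus_seems_running is a hypothetical function, it requires all three indicators to hold (the text above does not say whether one alone is sufficient), and it assumes that ps -e prints the bare command name as the last field.

```shell
# Hypothetical function (not an NIS+ command). Succeeds only when the
# cold-start file exists, nis_cachemgr appears in `ps -e`, and nisls succeeds.
nisplus_seems_running() {
    coldstart=${1:-/var/nis/NIS_COLD_START}   # path as given in the list above
    [ -f "$coldstart" ] || return 1
    ps -e 2>/dev/null |
        awk '$NF == "nis_cachemgr" { found = 1 } END { exit (found ? 0 : 1) }' || return 1
    nisls >/dev/null 2>&1
}
```

For example, if nisplus_seems_running; then echo "NIS+ appears to be running"; fi. The checks run from cheapest to most expensive, so nisls is only attempted when the first two indicators are present.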

Replica Update Failure

Symptoms:

Error messages indicating that the update was not successfully completed. (Note that the message: replica_update: number updates number errors indicates a successful update.)

Possible Causes:

Any of the following error messages indicate that the server was busy and that the update should be rescheduled:

Master server busy, full dump rescheduled
replica_update error result was Master server busy full dump rescheduled, full dump rescheduled
replica_update: master server busy, rescheduling the resync
replica_update: master server busy, will try later
replica_update: nis dump result Master server busy, full dump rescheduled
nis_dump_svc: one replica is already resyncing

(These messages are generated by, or in conjunction with, the NIS+ error code constant: NIS_DUMPLATER one replica is already resyncing.)
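When scanning a system log, the busy messages listed above can be recognized by two phrases they share. The following sketch shows one way to do that. is_busy_message is a hypothetical helper, and the two patterns are derived only from the messages quoted above, so other wordings may exist.

```shell
# Hypothetical helper (not an NIS+ command). Succeeds when the message text
# looks like one of the "server busy, update rescheduled" messages above.
is_busy_message() {
    case $1 in
        *"server busy"* | *"already resyncing"*) return 0 ;;
    esac
    return 1
}
```

For example, is_busy_message "replica_update: master server busy, will try later" succeeds, so a script could treat that line as a condition to wait out rather than as a failure.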

These messages indicate that there was some other problem:

replica_update: error result was ...
replica_update: nis dump result nis_perror error string
rootreplica_update: update failednis dump result nis_perror string-variable: could not fetch object from master

(If rpc.nisd is being run with the -C (open diagnostic channel) option, additional information may be entered in either the master server's or the replica server's system log.)

These messages indicate possible problems such as:

The server is out of child processes that can be allocated.
A read-only child process was requested to dump.
Another replica is currently resynching.

Diagnosis:

Check both the replica's and the master server's system logs for additional information. How much, if any, additional information is recorded in the system logs depends on your system's error reporting level, and whether or not you are running rpc.nisd with the -C option (diagnostics).

Solution:

In most cases, these messages indicate minor software problems which the system is capable of correcting. If the message was the result of a command, simply wait for a while and then try the command again. If these messages appear often, you can change the threshold level in your /etc/syslog.conf file. See the syslog.conf man page for details.

NIS Problems and Solutions

This section explains how to resolve problems encountered on networks running NIS. It covers problems seen on an NIS client and those seen on an NIS server.

Before trying to debug an NIS server or client, review Chapter 18, Network Information Service (NIS), which explains the NIS environment. Then look for the subheading in this section that best describes your problem.

Symptoms:

Common symptoms of NIS binding problems include:

Messages saying that ypbind can't find or communicate with a server.
Messages saying server not responding.
Messages saying NIS is unavailable.
Commands on a client limp along in background mode or function much slower than normal.
Commands on a client hang. Sometimes commands hang even though the system as a whole seems fine and you can run new commands.
Commands on a client crash with obscure messages, or no message at all.

NIS Problems Affecting One Client

If only one or two clients are experiencing symptoms that indicate NIS binding difficulty, the problems probably are on those clients. If many NIS clients are failing to bind properly, the problem probably exists on one or more of the NIS servers (see "NIS Problems Affecting Many Clients").

`ypbind` Not Running on Client

One client has problems, but other clients on the same subnet are operating normally. On the problem client, run ls -l on a directory, such as /usr, that contains files owned by many users, including some not in the client /etc/passwd file. If the resulting display lists file owners who are not in the local /etc/passwd as numbers, rather than names, this indicates that NIS service is not working on the client.

These symptoms usually mean that the client ypbind process is not running. Run ps -e and check for ypbind. If you do not find it, log in as superuser and start ypbind by typing:

client# /usr/lib/netsvc/yp/ypstart

Missing or Incorrect Domain Name

One client has problems, the other clients are operating normally, but ypbind is running on the problem client. The client may have an incorrectly set domain.

On the client, run the domainname command to see which domain name is set.

Client#7 domainname
neverland.com

Compare the output with the actual domain name in /var/yp on the NIS master server. The actual NIS domain is shown as a subdirectory in the /var/yp directory.

Client#7 ls -l /var/yp
...
-rwxr-xr-x 1 root Makefile
drwxr-xr-x 2 root binding
drwx------ 2 root doc.com
...

If the domain name returned by running domainname on a machine is not the same as the server domain name listed as a directory in /var/yp, the domain name specified in the machine's /etc/defaultdomain file is incorrect. Log in as superuser and correct the client's domain name in the machine's /etc/defaultdomain file. This assures that the domain name is correct every time the machine boots. Now reboot the machine.

Note -

The domain name is case-sensitive.
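The comparison described above can be scripted on the NIS master server. This is a minimal sketch: domain_dir_exists is a hypothetical helper that checks whether a given domain name exists as a subdirectory of /var/yp. The second argument, which overrides the directory, exists only to make the function easy to try out.

```shell
# Hypothetical helper (not an NIS command). Succeeds when DOMAIN is non-empty
# and exists as a subdirectory of YPDIR (default /var/yp).
domain_dir_exists() {
    domain=$1
    ypdir=${2:-/var/yp}
    [ -n "$domain" ] && [ -d "$ypdir/$domain" ]
}
```

For example, running domain_dir_exists neverland.com on the master server fails in the situation shown above, where /var/yp contains doc.com instead. Note that /var/yp also contains directories such as binding that are not domains, so a successful result confirms only that a directory of that name exists.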

Client Not Bound to Server

If your domain name is set correctly, ypbind is running, and commands still hang, then make sure that the client is bound to a server by running the ypwhich command. If you have just started ypbind, then run ypwhich several times (typically, the first one reports that the domain is not bound and the second succeeds normally).
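Because the first ypwhich after starting ypbind typically fails, a short retry loop saves repeating the command by hand. The sketch below is generic: retry is a hypothetical helper, and the attempt count and delay in the usage example are arbitrary values, not recommendations from this guide.

```shell
# Hypothetical helper (not an NIS command).
# retry ATTEMPTS DELAY COMMAND [ARGS...]: run COMMAND until it succeeds or
# ATTEMPTS runs have failed, sleeping DELAY seconds between attempts.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        "$@" && return 0
        [ "$n" -lt "$attempts" ] && sleep "$delay"
        n=$((n + 1))
    done
    return 1
}
```

For example, retry 3 2 ypwhich || echo "client is still not bound" tries ypwhich up to three times, two seconds apart.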

No Server Available

If your domain name is set correctly, ypbind is running, and you get messages indicating that the client cannot communicate with a server, this may indicate a number of different problems:

Does the client have a /var/yp/binding/domainname/ypservers file containing a list of servers to bind to? If not, run ypinit -c and specify in order of preference the servers that this client should bind to.
If the client does have a /var/yp/binding/domainname/ypservers file, does it list enough servers to cover the case where one or two become unavailable? If not, add additional servers to the list by running ypinit -c.
If none of the servers listed in the client's ypservers file are available, the client searches for an operating server using broadcast mode. If there is a functioning server on the client's subnet, the client will find it (though performance may be slowed during the search). If there are no functioning servers on the client's subnet, you can solve the problem in several ways:
- If the client has no server on the subnet and no route to one, you can install a new slave server on that subnet.
- You can make sure your routers are configured to pass broadcast packets so that the client can use broadcast to find a server on another subnet. You can use the netstat -r command to verify the route.
- If there should be a route, but it is not working, make sure that the route daemon in.routed/in.rdisc is running. If it is not running, start it.

Note -

For reasons of security and administrative control it is preferable to specify the servers a client is to bind to in the client's ypservers file rather than have the client search for servers through broadcasting. Broadcasting ties up the network, slows the client, and prevents you from balancing server load by listing different servers for different clients.

Do the servers listed in a client's ypservers file have entries in the /etc/hosts file? If not, add the servers to the NIS maps hosts input file and rebuild your maps by running ypinit -c or ypinit -s as described in "Working With NIS Maps".
Is the /etc/nsswitch.conf file set up to consult the machine's local hosts file in addition to NIS? See Chapter 2, The Name Service Switch for more information on the switch.
Is the /etc/nsswitch.conf file set up to consult files first for services and rpc?

`ypwhich` Displays Are Inconsistent

When you use ypwhich several times on the same client, the resulting display varies because the NIS server changes. This is normal. The binding of the NIS client to the NIS server changes over time when the network or the NIS servers are busy. Whenever possible, the network stabilizes at a point where all clients get acceptable response time from the NIS servers. As long as your client machine gets NIS service, it does not matter where the service comes from. For example, an NIS server machine may get its own NIS services from another NIS server on the network.

When Server Binding is Not Possible

In extreme cases where local server binding is not possible, use of the ypset command may temporarily allow binding to another server, if available, on another network or subnet. However, to use the ypset command, ypbind must be started with either the -ypset or the -ypsetme option.

Note -

For security reasons, the use of the -ypset and -ypsetme options should be limited to debugging purposes under controlled circumstances. Use of the -ypset and -ypsetme options can result in serious security breaches because while they are operative anyone can then alter server bindings causing trouble for others and permitting unauthorized access to sensitive data. If you must start ypbind with these options, once you have fixed the problem you should kill ypbind and restart it again without those options.

`ypbind` Crashes

If ypbind crashes almost immediately each time it is started, look for a problem in some other part of the system. Check for the presence of the rpcbind daemon by typing:

% ps -ef | grep rpcbind

If rpcbind is not present or does not stay up or behaves strangely, consult your RPC documentation.

You may be able to communicate with rpcbind on the problematic client from a machine operating normally. From the functioning machine, type:

% rpcinfo client

If rpcbind on the problematic machine is fine, rpcinfo produces the following output:

program	version	netid	address	service	owner
...
100007	2	udp	0.0.0.0.2.219	ypbind	superuser
100007	1	udp	0.0.0.0.2.219	ypbind	superuser
100007	1	tcp	0.0.0.0.2.220	ypbind	superuser
100007	2	tcp	0.0.0.0.128.4	ypbind	superuser
100007	2	ticotsord	\000\000\020H	ypbind	superuser
100007	2	ticots	\000\000\020K	ypbind	superuser
...

Your machine will have different addresses. If they are not displayed, ypbind has been unable to register its services. Reboot the machine and run rpcinfo again. If the ypbind processes are there and they change each time you try to restart /usr/lib/netsvc/yp/ypbind, reboot the system, even if the rpcbind daemon is running.
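The check for ypbind registrations can also be done mechanically. The following is a sketch that assumes the six-column layout shown above (program, version, netid, address, service, owner). ypbind_registered is a hypothetical helper, and 100007 is the program number shown for ypbind in the sample output.

```shell
# Hypothetical helper (not an RPC command). Reads `rpcinfo host` output on
# stdin and succeeds when at least one line registers program 100007 with
# the service name ypbind.
ypbind_registered() {
    awk '$1 == "100007" && $5 == "ypbind" { found = 1 } END { exit (found ? 0 : 1) }'
}
```

For example, from a machine that is operating normally, rpcinfo client | ypbind_registered || echo "ypbind has not registered its services".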

NIS Problems Affecting Many Clients

If only one or two clients are experiencing symptoms that indicate NIS binding difficulty, the problems probably are on those clients (see "NIS Problems Affecting One Client"). If many NIS clients are failing to bind properly, the problem probably exists on one or more of the NIS servers.

Network or Servers are Overloaded

NIS can hang if the network or NIS servers are so overloaded that ypserv cannot get a response back to the client ypbind process within the time-out period.

Under these circumstances, every client on the network experiences the same or similar problems. In most cases, the condition is temporary. The messages usually go away when the NIS server reboots and restarts ypserv, or when the load on the NIS servers or network itself decreases.

Server Malfunction

Make sure the servers are up and running. If you are not physically near the servers, use the ping command.

NIS Daemons Not Running

If the servers are up and running, try to find a client machine behaving normally, and run the ypwhich command. If ypwhich does not respond, kill it. Then log in as root on the NIS server and check if the NIS ypbind process is running by entering:

# ps -e | grep yp

Note -

Do not use the -f option with ps, because this option attempts to translate user IDs to names, which causes additional name service lookups that may not succeed.

If either the ypbind or ypserv daemon is not running, stop any remaining NIS processes and then restart them by entering:

# /usr/lib/netsvc/yp/ypstop
# /usr/lib/netsvc/yp/ypstart

If both the ypserv and ypbind processes are running on the NIS server, type:

# ypwhich

If ypwhich does not respond, ypserv has probably hung and should be restarted. While logged in as root on the server, kill ypserv and restart it by typing:

# /usr/lib/netsvc/yp/ypstop
# /usr/lib/netsvc/yp/ypstart

Servers Have Different Versions of an NIS Map

Because NIS propagates maps among servers, occasionally you may find different versions of the same map on various NIS servers on the network. This version discrepancy is normal and acceptable if the differences do not last for more than a short time.

The most common cause of map discrepancy is that something is preventing normal map propagation. For example, an NIS server or router between NIS servers is down. When all NIS servers and the routers between them are running, ypxfr should succeed.

If the servers and routers are functioning properly, check the following:

Log ypxfr output (see "Logging ypxfr Output").
Check the control files (see "Check the crontab File and ypxfr Shell Script").
Check the ypservers map on the master (see "Check the ypservers Map").

Logging `ypxfr` Output

If a particular slave server has problems updating maps, log in to that server and run ypxfr interactively. If ypxfr fails, it tells you why it failed, and you can fix the problem. If ypxfr succeeds, but you suspect it has occasionally failed, create a log file to enable logging of messages. To create a log file, enter:

ypslave# cd /var/yp
ypslave# touch ypxfr.log

This creates a ypxfr.log file that saves all output from ypxfr.

The output resembles the output ypxfr displays when run interactively, but each line in the log file is timestamped. (You may see unusual ordering in the timestamps. That is normal: the timestamp tells you when ypxfr started to run. If copies of ypxfr ran simultaneously but their work took differing amounts of time, they may write their summary status lines to the log file in an order different from that in which they were invoked.) Any pattern of intermittent failure shows up in the log.

Note -

When you have fixed the problem, turn off logging by removing the log file. If you forget to remove it, it continues to grow without limit.
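Because logging is switched on and off purely by the presence of the log file, a quick check of whether the file exists can serve as a reminder. The following is a minimal sketch, assuming a POSIX-compatible shell; the function name ypxfr_log_status is hypothetical.

```shell
# Sketch: report whether ypxfr logging is currently enabled, based only
# on whether the log file exists. The default path is the one used in
# this section; ypxfr_log_status is a hypothetical helper name.
ypxfr_log_status() {
    log=${1:-/var/yp/ypxfr.log}
    if [ -f "$log" ]; then
        # wc may pad its output with spaces on some systems; strip them.
        size=$(wc -c < "$log" | tr -d ' ')
        echo "logging is ON: $log ($size bytes)"
    else
        echo "logging is OFF: $log does not exist"
    fi
}

# Possible usage on the slave server:
#   ypxfr_log_status
#   rm /var/yp/ypxfr.log     # turn logging off once the problem is fixed
```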

Check the `crontab` File and `ypxfr` Shell Script

Inspect the root crontab file, and check the ypxfr shell script it invokes. Typographical errors in these files can cause propagation problems. Failures to refer to a shell script within the /var/spool/cron/crontabs/root file, or failures to refer to a map within any shell script can also cause errors.

Check the `ypservers` Map

Also, make sure that the NIS slave server is listed in the ypservers map on the master server for the domain. If it is not, the slave server still operates perfectly as a server, but yppush does not propagate map changes to the slave server.
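One way to perform this check is to list the ypservers map on the master with ypcat -k and look for the slave's name. The sketch below wraps that lookup in a helper; the function name in_ypservers and the host name ypslave are illustrative assumptions, and the awk -v option requires nawk or /usr/xpg4/bin/awk on Solaris.

```shell
# Sketch: succeed if the given host name appears as the first field of
# any line on standard input (for example, output of "ypcat -k ypservers").
# in_ypservers is a hypothetical helper name.
in_ypservers() {
    awk -v host="$1" '$1 == host { found = 1 } END { exit !found }'
}

# Possible usage on the master server:
#   if ypcat -k ypservers | in_ypservers ypslave; then
#       echo "ypslave is listed"
#   else
#       echo "ypslave is missing from the ypservers map"
#   fi
```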

Work Around

If the NIS slave server problem is not obvious, you can work around it while you debug by using rcp or ftp to copy a recent version of the inconsistent map from any healthy NIS server. For instance, here is how you might transfer the problem map:

ypslave# rcp ypmaster:/var/yp/mydomain/map.\* /var/yp/mydomain

Here the * character has been escaped in the command line, so that it will be expanded on ypmaster, instead of locally on ypslave.

`ypserv` Crashes

When the ypserv process crashes almost immediately, and does not stay up even with repeated activations, the debug process is virtually identical to that described in "ypbind Crashes". Check for the existence of the rpcbind daemon as follows:

ypserver% ps -e | grep rpcbind

Reboot the server if you do not find the daemon. Otherwise, if the daemon is running, type the following and look for similar output:

% rpcinfo -p ypserver
program 	vers 	proto 	port 	service
100000	4	tcp	111	portmapper
100000	3	tcp	111	portmapper
100068	2	udp	32813	cmsd
...
100007	1	tcp	34900	ypbind
100004	2	udp	731	ypserv
100004	1	udp	731	ypserv
100004	1	tcp	732	ypserv
100004	2	tcp	32772	ypserv

Your machine may have different port numbers. The four entries representing the ypserv process are:

100004 	2 	udp 	731 	ypserv
100004 	1 	udp 	731 	ypserv
100004 	1 	tcp 	732 	ypserv
100004 	2 	tcp 	32772 	ypserv

If they are not present, and ypserv is unable to register its services with rpcbind, reboot the machine. If they are present, deregister the service from rpcbind before restarting ypserv. To deregister the service from rpcbind, on the server type:

# rpcinfo -d number 1
# rpcinfo -d number 2

Where number is the ID number reported by rpcinfo (100004, in the example above).
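The deregistration step can be sketched as a helper that reads rpcinfo -p output and prints one rpcinfo -d command for each distinct version of a program. It only prints the commands so that you can review them before running them as root. The function name print_deregister_commands is hypothetical, and the awk -v option requires nawk or /usr/xpg4/bin/awk on Solaris.

```shell
# Sketch: read "rpcinfo -p" output on standard input and print an
# "rpcinfo -d <program> <version>" command for each distinct version of
# the given program number. Nothing is executed.
# print_deregister_commands is a hypothetical helper name.
print_deregister_commands() {
    awk -v prog="$1" '$1 == prog && !seen[$2]++ { print "rpcinfo -d", prog, $2 }'
}

# Possible usage on the server:
#   rpcinfo -p ypserver | print_deregister_commands 100004
```

With the four ypserv entries shown above, this prints a command for version 2 and a command for version 1, in the order the versions first appear.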

DNS Problems and Solutions

This section describes some common DNS problems and how to solve them.

Clients Can Find Machine by Name but Server Cannot

Symptoms:

DNS clients can find machines by either IP address or by host name, but the server can only find machines by their IP addresses.

Probable cause and solution:

This is most likely caused by omitting dns from the hosts line of the server's nsswitch.conf file. For example, a bad hosts line might look like this:

hosts: files

When using DNS you must include dns in the hosts record of every machine's nsswitch.conf file. For example:

hosts: dns nisplus [NOTFOUND=return] files

or:

hosts: nisplus dns [NOTFOUND=return] files
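A quick way to test a machine for this problem is to scan its nsswitch.conf file for a hosts line that names dns. The following is a minimal sketch, assuming a POSIX-compatible shell and awk; the function name hosts_line_has_dns is hypothetical.

```shell
# Sketch: succeed if the hosts line of an nsswitch.conf file lists dns.
# Commented-out lines are skipped because their first field is not
# exactly "hosts:"; a trailing "#" comment on the line is ignored.
# hosts_line_has_dns is a hypothetical helper name.
hosts_line_has_dns() {
    file=${1:-/etc/nsswitch.conf}
    awk '$1 == "hosts:" {
             for (i = 2; i <= NF; i++) {
                 if ($i ~ /^#/) break
                 if ($i == "dns") ok = 1
             }
         }
         END { exit !ok }' "$file"
}

# Possible usage:
#   hosts_line_has_dns || echo "dns is missing from the hosts line"
```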

Changes Do Not Take Effect or Are Erratic

Symptom:

You add or delete machines or servers but your changes are not recognized or do not take effect. Or in some instances the changes are recognized and at other times they are not in effect.

Probable cause:

The most likely cause is that you forgot to increment the SOA serial number on the primary master server after you made your change. Since there is no new SOA number, your secondary servers do not update their data to match that of the primary so they are working with the old, unchanged data files.

Another possible cause is that the SOA serial number in one or more of the primary data files was set to a value lower than the corresponding serial number on your secondary servers. This could happen, for example, if you deleted a file on the primary and then recreated it from scratch using an input file of some sort.

A third possible cause is that you forgot to send a HUP signal to the primary server after making changes to the primary's data files.

Diagnosis and solution:

First, check the SOA serial numbers in the data file that you changed and the corresponding file on the secondary server.

If the SOA serial number in the primary file is equal to, or less than, the serial number in the secondary file, increase the serial number in the primary's file so that it is greater than the number in the secondary file. For example, if the SOA number in both files is 37, change the number in the primary's file to 38. The next time the secondary checks with the primary, it will load the new data. (There are utilities that can force a primary to immediately transfer data to the secondaries. If you have one of these utilities, you can update the secondary without waiting for it to check the primary.)
Review the syslog output for the most recent named nnnn restarted or named nnn reloading nameserver entry. If the timestamp for that entry is before the time you finished making changes to the file, either reboot the server or force it to read the new data as explained in "Forcing in.named to Reload DNS Data".
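The serial number rule above amounts to a simple comparison, sketched here as a shell helper. It assumes you have already read the two serial numbers from the data files by hand, treats them as plain integers (it ignores serial number wraparound), and uses a hypothetical function name, primary_serial_ok.

```shell
# Sketch: compare the SOA serial number in the primary's data file with
# the one in the secondary's copy. The secondary only loads new data
# when the primary's number is greater.
# primary_serial_ok is a hypothetical helper name.
primary_serial_ok() {
    primary=$1
    secondary=$2
    if [ "$primary" -gt "$secondary" ]; then
        echo "ok: secondary will load the new data on its next check"
    else
        echo "raise the primary serial to at least $((secondary + 1))"
        return 1
    fi
}

# Possible usage, with the numbers from the example above:
#   primary_serial_ok 37 37
```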

DNS Client Cannot Lookup "Short" Names

Symptoms:

A client can look up fully qualified names but not short names.

Possible cause and solution:

Check the client's /etc/resolv.conf file for spaces at the end of the domain name. No spaces or tabs are allowed at the end of the domain name.
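Trailing spaces and tabs are hard to see in an editor, so it can help to search for them. The following is a minimal sketch, assuming a POSIX-compatible shell and a grep that supports the [[:space:]] character class (on Solaris, /usr/xpg4/bin/grep); the function name find_trailing_space is hypothetical.

```shell
# Sketch: print, with line numbers, any domain line in a resolv.conf
# file that ends in a space or tab. Succeeds only if such a line exists.
# find_trailing_space is a hypothetical helper name.
find_trailing_space() {
    file=${1:-/etc/resolv.conf}
    grep -n '^domain.*[[:space:]]$' "$file"
}

# Possible usage on the client:
#   find_trailing_space && echo "remove the trailing white space shown above"
```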

Reverse Domain Data Not Correctly Transferred to Secondary

Zone data for the domain is properly transferred from the zone primary master server to a zone secondary server, but the reverse domain data is not. In other words, the hosts.rev file on the secondary is not being properly updated from the primary.

Possible causes:

Syntax error in the secondary server's boot file.

Diagnosis and Solution:

Check the secondary server's boot file. Make sure that the primary server's IP address is listed for the reverse zone entries just as it is for the hosts data.

For example, the following boot file is incorrect because the primary server's IP address (129.146.168.119) is missing from the secondary in-addr.arpa record:

;
; /etc/named.boot file for dnssecondary
directory /var/named
secondary   doc.com   129.146.168.119        dnshosts.bakup
secondary   168.146.129.in-addr.arpa  doc.rev.bakup

This is what the correct file should look like:

;
; /etc/named.boot file for dnssecondary
directory /var/named
secondary   doc.com   129.146.168.119        dnshosts.bakup
secondary   168.146.129.in-addr.arpa   129.146.168.119  doc.rev.bakup
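The comparison between the two boot files above can be automated with a short check that flags any secondary line whose third field is not a dotted IP address. This is a sketch only, assuming a POSIX-compatible shell and awk; the function name check_secondary_lines is hypothetical.

```shell
# Sketch: report every "secondary" line in a named boot file whose third
# field is not a dotted-decimal IP address, which is what happens when
# the primary server's address has been left out.
# check_secondary_lines is a hypothetical helper name.
check_secondary_lines() {
    awk '$1 == "secondary" && $3 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
             print "line " NR ": no primary IP address for zone " $2
             bad = 1
         }
         END { exit bad + 0 }' "${1:-/etc/named.boot}"
}

# Possible usage on the secondary server:
#   check_secondary_lines /etc/named.boot
```

The function exits with a nonzero status when it finds a problem line, so it can also be used in a script condition.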

Server Failed and Zone Expired Problems

When a secondary server cannot obtain updates from its master, it logs a master unreachable message. If the problem is not corrected, the secondary expires the zone and stops answering requests from clients. When that happens, users start seeing server failed messages.

Symptoms:

Masters for secondary zone domain unreachable messages in syslog.
Secondary zone domain expired messages in syslog.
*** domain Can't find name: server failed messages to users.

Note that if the problem lies with a secondary server, some users could still be successfully obtaining DNS information from the master and thus operating without experiencing any difficulty.

Possible causes:

The two most likely causes for these problems are network failure and a wrong IP address for the master in the secondary's boot file.

Diagnosis and solution:

Check that the secondary's boot file contains the correct IP address for the master. Check the line:
secondary domain IPaddress hostsfile

Make sure that the IP address of the master matches the master's actual IP address and the address for the master specified in the hosts file. If the IP address is wrong, correct it, and then reboot the secondary.

If the master's IP address is correct, make sure the master is up and running correctly by pinging the master's IP address. For example, to ping the master at IP address 129.146.168.119, you would enter:

% ping 129.146.168.119 -n 10

If the master does not respond to the ping, make sure it is up and running properly.
If the master is running okay, use ps to make sure it is running named. If it is not running named, reboot it.
If the master is correctly running named, you most likely have a network problem.

`rlogin`, `rsh`, and `ftp` Problems

Symptoms:

Users are asked for password when they try to rlogin to a machine in another domain over the Internet.
Users are denied access when they try to ftp to a machine in another domain over the Internet.
Users are denied access when they try to use rlogin or rsh to a machine on their own network.

Possible causes:

The user is working at a machine that does not have a PTR record in the primary master server's hosts.rev file.
A missing or incorrect delegation of a sub-domain in the hosts.rev file.

Diagnosis and solution:

Check the appropriate hosts.rev file and make sure there is a PTR record for the user's machine. For example, if the user is working at the machine altair.doc.com with an IP address of 129.146.168.46, the doc.com primary master server's doc.rev file should have an entry like:

46 	IN	 PTR 	altair.doc.com.

If the record is missing, add it to the hosts.rev file and then reboot the server or reload its data as explained in "Forcing in.named to Reload DNS Data".

Check and correct the NS entries in the hosts.rev files and then reboot the server or reload its data as explained in "Forcing in.named to Reload DNS Data".
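When checking for a PTR record, it helps to know which reverse zone and which record name to look for. The sketch below derives both from an IP address, assuming the reverse zone covers the first three octets, as in the 168.146.129.in-addr.arpa example used in this appendix. The function name ptr_parts is hypothetical.

```shell
# Sketch: given a dotted IP address, print the in-addr.arpa zone for its
# first three octets followed by the host part used as the PTR record
# name. Assumes a three-octet reverse zone; other layouts differ.
# ptr_parts is a hypothetical helper name.
ptr_parts() {
    echo "$1" | awk -F. '{ print $3 "." $2 "." $1 ".in-addr.arpa", $4 }'
}

# Possible usage, with the address from the example above:
#   ptr_parts 129.146.168.46
```

For 129.146.168.46 this prints the zone 168.146.129.in-addr.arpa and the record name 46, which matches the PTR entry shown above.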

Other DNS Syntax Errors

Symptoms:

Error messages in console or syslog with operative phrases like the following are most often caused by syntax errors in DNS data and boot files:

No such...
Unknown field...
Non-authoritative answer:
Database format error...
illegal or (illegal)
error receiving zone transfer

Check the relevant files for spelling and syntax errors.

A common syntax error is misuse of the trailing dot in domain names (either using the dot when you should not, or not using it when you should). See "Trailing Dots in Domain Names".

FNS Problems and Solutions

This section presents problem scenarios with a description of probable causes, diagnoses, and solutions.

See "FNS Error Messages" for general information about FNS error messages, and Appendix B, Error Messages.

Cannot Obtain Initial Context

Symptom:

You get the message Cannot obtain initial context.

Possible Cause:

This is caused by an installation problem.

Diagnosis:

Check that FNS has been installed properly by looking for the file, /usr/lib/fn/fn_ctx_initial.so.

Solution:

Install the fn_ctx_initial.so library.

Nothing in Initial Context

Symptom:

When you run fnlist to see what is in the initial context, you see nothing.

Possible Cause:

This is caused by an NIS+ configuration problem. The organization associated with the user and machine running the fn* commands does not have an associated ctx_dir directory.

Diagnosis:

Use the nisls command to see whether there is a ctx_dir directory.

Solution:

If there is no ctx_dir directory, run fncreate -t org/nis+_domain_name/ to create the ctx_dir directory.

"No Permission" Messages (FNS)

Symptom:

You get "no permission" messages.

Possible Cause:

"No permission" messages mean that you do not have access to perform the command.

Diagnosis:

Check permission using the appropriate NIS+ commands, described in "Advanced FNS and NIS+ Issues". Use the nisdefaults command to determine your NIS+ principal name.

Another area to check is whether you are using the right name. For example, org// names the context of the root organization. Make sure you have permission to manipulate the root organization. Or maybe you meant to specify myorgunit/, instead.

Solution:

If you do have permission, then the appropriate credentials probably have not been acquired.

This could be caused by the following:

A keylogin has not been performed (defaults to NIS+ principal "nobody")
A keylogin was made to a source other than NIS+

Check that the /etc/nsswitch.conf file has a publickey: nisplus entry. This might manifest itself as an authentication error.

`fnlist` Does not List Suborganizations

Symptom:

You run fnlist with an organization name, expecting to see suborganizations, but instead see nothing.

Possible Cause:

This is caused by an NIS+ configuration problem. Suborganizations must be NIS+ domains. By definition, an NIS+ domain must have a subdirectory named org_dir.

Diagnosis:

Use the nisls command to see what subdirectories exist. Run nisls on each subdirectory to verify which subdirectories have an org_dir. The subdirectories with an org_dir are suborganizations.

Solution:

Not applicable.

Cannot Create Host- or User-related Contexts

Symptom:

When you run fncreate -t for the user, username, host, or hostname contexts, nothing happens.

Possible Cause:

You have not set the NIS_GROUP environment variable. When you create a user or host context it is owned by the host or user, and not by the administrator who set up the namespace. Therefore, fncreate requires that the NIS_GROUP variable be set to enable the administrators who are part of that group to subsequently manipulate the contexts.

Diagnosis:

Check the NIS_GROUP environment variable.

Solution:

The NIS_GROUP environment variable should be set to the group name of the administrators who will administer the contexts.
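A script that creates user or host contexts can guard against this problem by checking the variable first. The following is a minimal sketch, assuming a POSIX-compatible shell; the function name require_nis_group and the group name admin.doc.com. are illustrative assumptions.

```shell
# Sketch: refuse to continue unless NIS_GROUP is set to a non-empty
# value. require_nis_group is a hypothetical helper name.
require_nis_group() {
    if [ -z "${NIS_GROUP:-}" ]; then
        echo "NIS_GROUP is not set" >&2
        return 1
    fi
    echo "NIS_GROUP=$NIS_GROUP"
}

# Possible usage (the group name is an example only):
#   NIS_GROUP=admin.doc.com.; export NIS_GROUP
#   require_nis_group && echo "safe to run fncreate"
```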

Cannot Remove a Context You Created

Symptom:

When you run fndestroy on the host or user context the context is not removed.

Possible Cause:

You do not own the host or user context. When you create a user or host context it is owned by the host or user, and not by the administrator who set up the namespace.

Diagnosis:

Check the NIS_GROUP environment variable.

Solution:

The NIS_GROUP environment variable needs to be set to the group name of the administrator who will administer the contexts.

`Name in Use` with `fnunbind`

Symptom:

You get "name in use" when trying to remove bindings. It works for certain names but not for others.

Possible Cause:

You cannot unbind the name of a context. This restriction is in place to avoid leaving behind contexts that have no name ("orphaned contexts").

Diagnosis:

Run the fnlist command on the name to verify that it is a context.

Solution:

If the name is a context, use the fndestroy command to destroy the context.

`Name in Use` with `fnbind`/`fncreate -s`

Symptom:

You use the -s option with fnbind or fncreate, but for certain names you get "name in use."

Possible Cause:

fnbind -s and fncreate -s overwrite the existing binding if it already exists. If the old binding is one that must be kept to avoid orphaned contexts, the operation fails with a "name in use" error because the binding could not be removed.

Diagnosis:

Run the fnlist command on the name to verify that it is a context.

Solution:

Run the fndestroy command to remove the context before running fnbind or fncreate on the same name.

`fndestroy`/`fnunbind` Does Not Return `Operation Failed`

Symptom:

When you do an fndestroy or fnunbind on certain names that you know do not exist, you receive no indication that the operation failed.

Possible Cause:

The operation did not fail. The semantics of fndestroy and fnunbind are that if the terminal name is not bound, the operation returns success.

Diagnosis:

Run the fnlookup command on the name. You should receive the message, "name not found."

Solution:

Not applicable.
