Client Recovery in NFS Version 4

NFS Version 4 is a stateful protocol. Both the NFS client and the NFS server maintain current information about open files and file locks.

In a typical NFS server implementation, server state can be lost when the NFS service restarts or the server reboots. After such an event, the NFS client goes through a period of 90 seconds during which it recovers its state. This period is known as the grace period. During the grace period, the NFS client sends information about its previous state to the NFS server so that the server can rebuild the state that it lost. This process is known as Client Recovery, because the NFS client recovers the lost state at the NFS server. During this time, any new requests to open files or set file locks must wait for the grace period to complete so that all NFS clients can recover their state successfully. The recovery is transparent to the applications that use NFS on the client.

Note:

NFS Version 4.1 clients, when working with NFS Version 4.1 servers, can eliminate the need for a grace period. For more information, see reclaim_complete Feature.

When the client recovery process starts, the following message is displayed in the system error log, /var/adm/messages:

NOTICE: Starting recovery server server-name

When the client recovery process is complete, the following message is displayed in the system error log, /var/adm/messages:

NOTICE: Recovery done for server server-name

At this point, the NFS client has successfully completed sending its state information to the NFS server.
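The start and completion messages can be used to check whether recovery has finished for a given server. The following Python sketch scans log lines for the two messages. The function name recovery_status and the regular expressions are illustrative assumptions, and the prefix that the logging system places before NOTICE may differ on your system.

```python
import re

# Patterns are based on the two messages quoted in this section.
# The server name is assumed to be a single whitespace-free token.
START_RE = re.compile(r"NOTICE: Starting recovery server (\S+)")
DONE_RE = re.compile(r"NOTICE: Recovery done for server (\S+)")


def recovery_status(lines):
    """Return {server_name: "recovering" | "done"} from an iterable of log lines.

    Later lines override earlier ones, so the result reflects the most
    recent message seen for each server.
    """
    status = {}
    for line in lines:
        match = START_RE.search(line)
        if match:
            status[match.group(1)] = "recovering"
            continue
        match = DONE_RE.search(line)
        if match:
            status[match.group(1)] = "done"
    return status
```

To use the sketch, pass an open file object for /var/adm/messages, or any list of log lines, to recovery_status. A server that is still reported as "recovering" has logged a start message with no later completion message.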

During the grace period, if the client attempts to open any new files or establish any new locks, the server denies the request with the NFS4ERR_GRACE error code. Upon receiving this error, the client must wait for the grace period to end and then resend the request to the server. During the grace period, the following message is displayed:

NFS server recovering

During the grace period, commands that do not open files or set file locks can proceed. For example, the ls and cd commands do not open a file or set a file lock, so these commands are not suspended. However, a command such as cat, which opens a file, is suspended until the grace period ends.

When the grace period has ended, the following message is displayed:

NFS server recovery ok.

The client can now send new open and lock requests to the server.
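The wait-and-resend behavior for NFS4ERR_GRACE can be sketched as a simple retry loop. The real NFS client performs this step inside the operating system, so applications never see it. The names GraceError and send_with_grace_retry, the retry interval, and the upper bound on waiting are all assumptions made for illustration and do not describe the actual client implementation.

```python
import time


class GraceError(Exception):
    """Hypothetical stand-in for a server reply carrying NFS4ERR_GRACE."""


def send_with_grace_retry(send_request, retry_delay=5.0, max_wait=120.0,
                          sleep=time.sleep):
    """Call send_request() until it stops failing with GraceError.

    retry_delay and max_wait are illustrative values in seconds, not values
    taken from any NFS implementation. The sleep parameter exists only so
    that the loop can be exercised without real delays.
    """
    waited = 0.0
    while True:
        try:
            return send_request()
        except GraceError:
            if waited >= max_wait:
                # Give up and let the caller see the error.
                raise
            sleep(retry_delay)
            waited += retry_delay
```

The request is resent unchanged after each delay, which mirrors the description above: the client waits for the grace period to end and then resends the same open or lock request.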

Client recovery can fail for a variety of reasons. For example, if a network partition exists after the server reboots, the client might not be able to re-establish its state with the server before the grace period ends. When the grace period has ended, the server does not permit the client to re-establish its state because new state operations could create conflicts. For example, a new file lock might conflict with an old file lock that the client is trying to recover. When such situations occur, the server returns the NFS4ERR_NO_GRACE error code to the client.
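The relationship between the two error codes can be summarized in a small decision function. This is a simplified model of the behavior described in this section, not server code: the function name check_state_request and the Result enumeration are hypothetical, and a real server applies additional checks before it accepts a reclaim request.

```python
from enum import Enum


class Result(Enum):
    OK = "NFS4_OK"
    GRACE = "NFS4ERR_GRACE"
    NO_GRACE = "NFS4ERR_NO_GRACE"


def check_state_request(is_reclaim, in_grace_period):
    """Classify an open or lock request (simplified model).

    is_reclaim:      True if the client is re-establishing previous state.
    in_grace_period: True if the server is still in its grace period.
    """
    if in_grace_period:
        # Only reclaims are accepted; new opens and locks must wait.
        return Result.OK if is_reclaim else Result.GRACE
    # After the grace period, new requests proceed and reclaims are refused.
    return Result.NO_GRACE if is_reclaim else Result.OK
```

In this model, a client that is cut off from the server until the grace period has ended sends reclaim requests with in_grace_period set to False and therefore receives NFS4ERR_NO_GRACE.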

If the recovery of an open operation for a file fails, the client marks the file as unusable and the following message is displayed:

WARNING: The following NFS file could not be recovered and was marked dead 
(can't reopen:  NFS status n):  file :  filename

If re-establishing a file lock during recovery fails, the following error message is displayed:

NOTICE: nfs4_send_siglost:  pid process-ID lost
lock on server server-name

In this situation, the SIGLOST signal is posted to the process. The default action for the SIGLOST signal is to terminate the process.

To recover from this state, you must restart any applications that had files open at the time of the failure. Some processes that did not reopen the file could receive I/O errors. Other processes that did reopen the file, or performed the open operation after the recovery failure, can access the file without any problems.

Thus, some processes can access a particular file while other processes cannot.
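Because a process that reopens the file can access it again, a long-running application might reopen a file when a read on an existing handle fails. The following Python sketch shows one way to do that. It assumes that the failure surfaces as an OSError, and the class name ReopeningReader and its methods are hypothetical. Restarting the application remains the recovery method described in this section.

```python
class ReopeningReader:
    """Read from a file and reopen it once if the current handle fails.

    Illustrative only: assumes that a file handle marked unusable after a
    failed recovery raises OSError on read.
    """

    def __init__(self, path, opener=open):
        self._path = path
        self._opener = opener  # injectable so the class can be tested
        self._file = opener(path, "rb")

    def _read(self, offset, size):
        self._file.seek(offset)
        return self._file.read(size)

    def read_at(self, offset, size):
        try:
            return self._read(offset, size)
        except OSError:
            # Discard the old handle and open the file again.
            try:
                self._file.close()
            except OSError:
                pass
            self._file = self._opener(self._path, "rb")
            # A second failure propagates to the caller.
            return self._read(offset, size)

    def close(self):
        self._file.close()
```

The reader retries only once, so a file that is still inaccessible after being reopened produces an error instead of an endless loop.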