WebNFS Developer's Guide

Reliability

More than any other distributed file system protocol, the NFS protocol is known for its reliability and data safety. The NFS version 2 protocol was notorious for slow writes because its guarantee of data safety required the server to store the data from each write request on disk before replying to the client. Although this "synchronous write" requirement cost performance, the client was assured that a server crash would never lose its data. An NFS server crash or network outage will never result in corrupted or half-written files. Version 3 preserves the data safety guarantee while allowing the server to improve write performance through asynchronous writes: the server may reply before the data reaches disk, and the client later issues a commit request to confirm that the data is safely stored. The NFS client uses this version 3 write technique to deliver excellent write performance with data safety.

If an NFS server crashes or the network connection is lost, the NFS client will persist in attempting to restore the connection and continue where it left off. If a TCP connection is broken for any reason, the client will re-establish the connection. If a UDP request or reply is lost, the client will retransmit the request until the operation succeeds, using an exponential backoff on the timeout to avoid overloading the server with retransmissions. Since NFS servers are designed to be stateless, the server need do nothing on recovery other than serve new NFS requests, which may include retransmissions of requests that were not completed before the crash.
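The retransmission behavior described above can be sketched as follows. This is only an illustration of exponential backoff on a request timeout, not the NFS client's actual code: the class name, the Request interface, and the timeout values (1 second initial, 60 second cap) are all assumptions made for the example.

```java
import java.io.IOException;

// Illustrative sketch only; names and constants are hypothetical.
class RetransmitSketch {
    static final long INITIAL_TIMEOUT_MS = 1_000;   // assumed starting timeout
    static final long MAX_TIMEOUT_MS = 60_000;      // assumed upper bound

    /** A request that waits up to timeoutMs for a reply, or throws if none arrives. */
    interface Request<T> {
        T send(long timeoutMs) throws IOException;
    }

    /** Doubles the timeout, capped at MAX_TIMEOUT_MS. */
    static long nextTimeout(long currentMs) {
        return Math.min(currentMs * 2, MAX_TIMEOUT_MS);
    }

    /** Retransmits the request until it succeeds, backing off after each failure. */
    static <T> T callUntilSuccess(Request<T> request) {
        long timeout = INITIAL_TIMEOUT_MS;
        while (true) {
            try {
                return request.send(timeout);
            } catch (IOException lostOrTimedOut) {
                // No reply within the timeout: wait longer on the next attempt
                // so a struggling server is not flooded with retransmissions.
                timeout = nextTimeout(timeout);
            }
        }
    }
}
```

Because the loop never gives up, a caller sees only a delay and never an error, which matches the blocking behavior the guide describes. Capping the timeout keeps the client responsive once the server returns.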

This reliability has tangible benefits for users who are accustomed to low-bandwidth access to busy Internet servers. With protocols like HTTP or FTP, a lost connection usually means that a file transfer must start over, and applications that use these protocols will receive an error. These problems are hidden from applications that use the NFS classes through the extended file system. Any pending read or write will block until it completes successfully. For a file transfer over NFS, this means that the transfer will resume automatically from where it left off. This blocking behavior and persistence need not be inconvenient to an interactive user with limited patience; a Java application can implement a "stop" or "abort" button that kills a blocked thread.
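One way to structure such a "stop" button is to run the transfer on a worker thread and cancel it from the button handler. The sketch below is an assumption-laden illustration: AbortableTransfer and blockedRead are hypothetical names, the blocked NFS read is simulated with a sleep, and it assumes the blocked operation responds to thread interruption, which this guide does not confirm for the NFS classes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; all names here are hypothetical.
class AbortableTransfer {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private Future<?> pending;

    /** Starts the (possibly blocking) transfer on a background thread. */
    synchronized void start(Runnable transfer) {
        pending = worker.submit(transfer);
    }

    /** Call from the "stop" button handler; interrupts the blocked thread. */
    synchronized boolean abort() {
        return pending != null && pending.cancel(true);
    }

    /** Shuts the worker down and waits up to millis for its thread to exit. */
    boolean shutdownAndWait(long millis) {
        worker.shutdownNow();
        try {
            return worker.awaitTermination(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Stand-in for a read that blocks indefinitely while the server is down. */
    static Runnable blockedRead() {
        return () -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException aborted) {
                Thread.currentThread().interrupt(); // restore the flag and give up
            }
        };
    }
}
```

A button handler would call abort() on the object that started the transfer. Keeping the blocking call off the user interface thread is what lets the button stay responsive while the read or write is stalled.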

In short, the NFS protocol and the NFS client implementation of it are very reliable.