NFS Administration Guide

Client-Side Failover

With client-side failover, an NFS client can switch to another server when the server supporting a replicated file system becomes unavailable. The file system can become unavailable if the server it is connected to crashes, if the server is overloaded, or if there is a network fault. Under these conditions, failover is normally transparent to the user. Once established, failover can occur at any time without disrupting the processes running on the client.

Failover requires that the file system be mounted read-only. The replicated file systems must be identical for failover to succeed. See "What Is a Replicated File System?" for a description of what makes file systems identical. A static file system, or one that is not changed often, is the best candidate for failover.

You cannot use failover with file systems that are mounted using cachefs. Extra information is stored for each cachefs file system, and this information cannot be updated during failover. As a result, only one of these two features may be used when mounting a file system.
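The two mount restrictions above (read-only only, no cachefs) can be expressed as a small pre-mount check. The following is a minimal sketch, not part of any NFS tool: the function name failover_mount_problems and its parameters are hypothetical, and it assumes a mount is a failover mount whenever more than one server is listed.

```python
def failover_mount_problems(servers, options, uses_cachefs):
    """Return a list of reasons why this mount cannot use failover.

    servers      -- list of server names offering the replicated file system
    options      -- set of mount option strings, for example {"ro"}
    uses_cachefs -- True if the file system would be mounted using cachefs

    All three parameters are illustrative; they do not mirror a real
    mount interface.
    """
    problems = []
    # Assumption: a single-server mount is an ordinary mount, not failover.
    if len(servers) > 1:
        if "ro" not in options:
            problems.append("failover requires a read-only mount")
        if uses_cachefs:
            problems.append("failover cannot be combined with cachefs")
    return problems
```

An empty result means that neither restriction is violated. It does not show that the replicas themselves are identical.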

The number of replicas to establish for each file system depends on many factors. In general, it is better to have a couple of servers, each supporting multiple subnets, than to have a unique server on each subnet. The mount process checks each server in the list, so the more servers that are listed, the slower each mount will be.

Failover Terminology

Two terms are needed to understand the failover process.

failover - Selecting a server from a list of servers supporting a replicated file system. Normally, the next server in the sorted list is used, unless it fails to respond.
remap - Making use of a new server. Through normal use, the clients store the path name for each active file on the remote file system. During the remap, these path names are evaluated to locate the files on the new server.
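
The two terms can be illustrated with a short sketch. It is a simplified model, not the client's actual implementation. The function names select_server and unresolved_after_remap, the responds and exists_on callbacks, and the wrap-around from the end of the list to the start are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence


def select_server(
    servers: Sequence[str],
    current: str,
    responds: Callable[[str], bool],
) -> Optional[str]:
    """Failover: pick the next responding server after `current`.

    `servers` is assumed to be the already-sorted list of replicas.
    Wrapping from the end of the list to the start is an assumption.
    Returns None if no other server responds.
    """
    start = servers.index(current)
    # Offsets 1..len-1 visit every server except the current one.
    for offset in range(1, len(servers)):
        candidate = servers[(start + offset) % len(servers)]
        if responds(candidate):
            return candidate
    return None


def unresolved_after_remap(
    active_paths: Sequence[str],
    exists_on: Callable[[str], bool],
) -> List[str]:
    """Remap: re-evaluate each stored path name on the new server.

    Returns the path names that could not be located there.
    """
    return [path for path in active_paths if not exists_on(path)]
```

In this model, a remap is clean only when unresolved_after_remap returns an empty list for the server chosen by select_server.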

What Is a Replicated File System?

For the purposes of failover, a file system may be called a replica when each file is the same size and has the same vnode type as the corresponding file on the original file system. Permissions, creation dates, and other file attributes are not considered. If the file sizes or vnode types differ, the remap will fail and the process will hang until the old server becomes available.
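
Before relying on a replica, it can help to compare it with the original using the same criteria. The sketch below is an approximation under stated assumptions. It treats the file type reported by lstat as a stand-in for the vnode type, and it compares sizes only for regular files, because directory sizes can vary between file systems. The function name replica_mismatches is hypothetical, and both trees are assumed to be reachable as local paths.

```python
import os
import stat


def replica_mismatches(original_root, replica_root):
    """Return sorted relative paths that differ in type or size.

    Permissions, dates, and other attributes are deliberately ignored.
    Only entries present in the original are checked; extra entries
    on the replica are not reported.
    """
    mismatches = []
    for dirpath, dirnames, filenames in os.walk(original_root):
        for name in dirnames + filenames:
            orig_path = os.path.join(dirpath, name)
            rel = os.path.relpath(orig_path, original_root)
            orig = os.lstat(orig_path)
            try:
                repl = os.lstat(os.path.join(replica_root, rel))
            except (FileNotFoundError, NotADirectoryError):
                # Missing on the replica counts as a mismatch.
                mismatches.append(rel)
                continue
            same_type = stat.S_IFMT(orig.st_mode) == stat.S_IFMT(repl.st_mode)
            # Size is compared for regular files only (sketch simplification).
            same_size = (not stat.S_ISREG(orig.st_mode)
                         or orig.st_size == repl.st_size)
            if not (same_type and same_size):
                mismatches.append(rel)
    return sorted(mismatches)
```

An empty result suggests that the replica satisfies the size and type criteria for every entry in the original.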

You can maintain a replicated file system using rdist, cpio, or other file transfer mechanisms. Because updating replicated file systems causes inconsistency, follow these suggestions for best results:

Rename the old version of the file before installing a new one.
Run the updates at night when client usage is low.
Keep the updates small.
Minimize the number of copies.
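
The first suggestion, renaming the old version before installing the new one, can be sketched as follows. This is an illustration only and does not replace rdist or cpio. The function name install_new_version and the ".old" suffix are assumptions.

```python
import os
import shutil


def install_new_version(new_file, target, backup_suffix=".old"):
    """Rename the existing target aside, then copy the new file into place.

    Returns the path of the renamed old version, or None if the target
    did not exist. The suffix is an arbitrary choice for this sketch.
    """
    backup = None
    if os.path.exists(target):
        backup = target + backup_suffix
        # Rename first, so the old version is kept under a new name.
        os.replace(target, backup)
    shutil.copy2(new_file, target)
    return backup
```

Running this step against each replica in turn, with a small set of files, is consistent with the other suggestions in the list.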

Failover and NFS Locking

Some software packages require read locks on files. To prevent these products from breaking, read locks on read-only file systems are allowed, but are visible to the client side only. The locks will persist through a remap because the server doesn't "know" about them. Because the files should not be changing, you do not need to lock the file on the server side.