Sun Cluster 3.1 Data Service for Network File System (NFS)

Installing and Configuring Sun Cluster HA for Network File System (NFS)

This chapter describes the steps to install and configure Sun Cluster HA for Network File System (NFS) on your Sun Cluster nodes.

This chapter contains the following procedures.

• How to Install Sun Cluster HA for NFS Packages

• How to Register and Configure Sun Cluster HA for NFS

• How to Change Share Options on an NFS File System

• How to Tune Sun Cluster HA for NFS Method Timeouts

• How to Configure SUNW.HAStoragePlus Resource Type

You must configure Sun Cluster HA for NFS as a failover data service. See “Planning for Sun Cluster Data Services” in Sun Cluster 3.1 Data Service Planning and Administration Guide and the Sun Cluster 3.1 Concepts Guide for general information about data services, resource groups, resources, and other related topics.


Note –

You can use SunPlex Manager to install and configure this data service. See the SunPlex Manager online help for details.


Use the worksheets in Sun Cluster 3.1 Release Notes to plan your resources and resource groups before you install and configure Sun Cluster HA for NFS.

The NFS mount points that are placed under the control of the data service must be the same on all of the nodes that can master the disk device group that contains those file systems.


Caution –

If you use VERITAS Volume Manager, ensure that the vxio driver has identical pseudo-device major numbers on all of the cluster nodes so that clients do not receive “stale file handle” errors during NFS failover. You can find this number in the /etc/name_to_major file after you complete the installation.
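
The following command shows one way to compare the vxio major number on each node. The output value shown here is only an illustration; the number on your systems will differ, but it must be the same on every node.

# grep vxio /etc/name_to_major
vxio 270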


Installing and Configuring Sun Cluster HA for NFS

The following table lists the sections that describe the installation and configuration tasks.

Table 1–1 Task Map: Installing and Configuring Sun Cluster HA for NFS

Task                                             For Instructions

Install Sun Cluster HA for NFS packages          Installing Sun Cluster HA for NFS Packages

Set up and configure Sun Cluster HA for NFS      Registering and Configuring Sun Cluster HA for NFS

Configure resource extension properties          Configuring Sun Cluster HA for NFS Extension Properties

View fault monitor information                   Sun Cluster HA for NFS Fault Monitor

Installing Sun Cluster HA for NFS Packages

Use the scinstall(1M) utility to install the data service package, SUNWscnfs, on the cluster.

If you installed the SUNWscnfs data service package during your initial Sun Cluster installation, proceed to Registering and Configuring Sun Cluster HA for NFS. Otherwise, use the following procedure to install the SUNWscnfs package.

How to Install Sun Cluster HA for NFS Packages

You need the Sun Cluster Agents CD-ROM to complete this procedure. Perform this procedure on all of the cluster nodes that can run Sun Cluster HA for NFS.

  1. Load the Sun Cluster Agents CD-ROM into the CD-ROM drive.

  2. Run the scinstall utility with no options.

    This step starts the scinstall utility in interactive mode.

  3. Choose the menu option, Add Support for New Data Service to This Cluster Node.

    The scinstall utility prompts you for additional information.

  4. Provide the path to the Sun Cluster Agents CD-ROM.

    The utility refers to the CD as the “data services cd.”

  5. Specify the data service to install.

    The scinstall utility lists the data service that you selected and asks you to confirm your choice.

  6. Exit the scinstall utility.

  7. Unload the CD from the drive.

Where to Go From Here

See Registering and Configuring Sun Cluster HA for NFS to register Sun Cluster HA for NFS and to configure the cluster for the data service.

Registering and Configuring Sun Cluster HA for NFS

This procedure describes how to use the scrgadm(1M) command to register and configure Sun Cluster HA for NFS.


Note –

Other options also enable you to register and configure the data service. See “Tools for Data Service Resource Administration” in Sun Cluster 3.1 Data Service Planning and Administration Guide for details about these options.


Before you register and configure Sun Cluster HA for NFS, run the following command to verify that the Sun Cluster HA for NFS package, SUNWscnfs, is installed on the cluster.


# pkginfo -l SUNWscnfs

If the package has not been installed, see Installing Sun Cluster HA for NFS Packages for instructions on how to install the package.

How to Register and Configure Sun Cluster HA for NFS

  1. Become superuser on a cluster member.

  2. Verify that all of the cluster nodes are online.


    # scstat -n
    
  3. Create the Pathprefix directory.

    The Pathprefix directory is a directory on the cluster file system that Sun Cluster HA for NFS uses to maintain administrative and status information.

    You can specify any directory for this purpose. However, you must manually create this administrative directory for each resource group that you create. For example, create the directory /global/nfs. In the following command, admin-dir is a placeholder for the directory that you choose.


    # mkdir -p /global/admin-dir
    
  4. Create a failover resource group to contain the NFS resources.


    # scrgadm -a -g resource-group -y Pathprefix=/global/admin-dir [-h nodelist]
    -a

    Specifies that you are adding a new configuration.

    -g resource-group

    Specifies the failover resource group.

    -y Pathprefix=path

    Specifies a directory on a cluster file system that Sun Cluster HA for NFS uses to maintain administrative and status information.

    [-h nodelist]

    Specifies an optional, comma-separated list of physical node names or IDs that identify potential masters. The order here determines the order in which the Resource Group Manager (RGM) considers primary nodes during failover.

  5. Verify that you have added all of your logical hostnames to the name service database.

    To avoid any failures because of name service lookup, verify that all of the logical hostnames are present in the server's and client's /etc/inet/hosts file.
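
    For example, the /etc/inet/hosts file on each cluster node and client might contain an entry such as the following. The IP address and the logical hostname schost-1 are placeholders for your own values.

    192.168.10.20   schost-1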

  6. Configure name service mapping in the /etc/nsswitch.conf file on the cluster nodes to first check the local files before trying to access NIS or NIS+ for rpc lookups.

    Doing so prevents timing-related errors for rpc lookups during periods of public network or name service unavailability.

  7. Modify the hosts entry in /etc/nsswitch.conf so that, when a name is resolved successfully from the local files, the lookup returns immediately instead of also contacting NIS or DNS.

    Doing so enables HA-NFS to fail over correctly in the presence of public network failures.


    hosts: cluster files [SUCCESS=return] nis
    rpc: files nis
    
  8. (Optional) Customize the nfsd or lockd startup options.

    1. To customize nfsd options, on each cluster node open the /etc/init.d/nfs.server file, find the command line starting with /usr/lib/nfs/nfsd, and add any additional arguments desired.

    2. To customize lockd startup options, on each cluster node open the /etc/init.d/nfs.client file, find the command line starting with /usr/lib/nfs/lockd, and add any command line arguments desired.


    Note –

    The command lines must remain limited to a single line. Breaking the command line into multiple lines is not supported. The additional arguments must be valid options documented in man pages nfsd(1M) and lockd(1M).
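
    For example, to raise the number of concurrent NFS requests that the server handles, you might change the nfsd line in /etc/init.d/nfs.server as in the following sketch. The default line shown here is typical but varies by Solaris release, and the value 64 is only an illustration. Keep the entire command on one line.

    /usr/lib/nfs/nfsd -a 16     (typical default line)
    /usr/lib/nfs/nfsd -a 64     (modified line that raises the maximum number of concurrent NFS requests)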


  9. Add the desired logical hostname resources into the failover resource group.

    You must set up a logical hostname resource with this step. The logical hostname that you use with Sun Cluster HA for NFS cannot be a SharedAddress resource type.


    # scrgadm -a -L -g resource-group -l logical-hostname, … [-n netiflist]
    -a

    Specifies that you are adding a new configuration.

    -L -g resource-group

    Specifies the resource group that is to hold the logical hostname resources.

    -l logical-hostname, …

    Specifies the logical hostname resource to be added.

    -n netiflist

    Specifies an optional, comma-separated list that identifies the IP Networking Multipathing groups that are on each node. Each element in netiflist must be in the form of netif@node. netif can be given as an IP Networking Multipathing group name, such as sc_ipmp0. The node can be identified by the node name or node ID, such as sc_ipmp0@1 or sc_ipmp@phys-schost-1.


    Note –

    Sun Cluster does not currently support using the adapter name for netif.


  10. From any cluster node, create a directory structure for the NFS configuration files.

    Create the administrative subdirectory below the directory that the Pathprefix property identifies in Step 4, for example, /global/nfs/SUNW.nfs.


    # mkdir Pathprefix/SUNW.nfs

  11. Create a dfstab.resource file in the SUNW.nfs directory that you created in Step 10, and set up share options.

    1. Create the Pathprefix/SUNW.nfs/dfstab.resource file.

      This file contains a set of share commands with the shared path names. The shared paths should be subdirectories on a cluster file system.


      Note –

      Choose a resource name suffix to identify the NFS resource that you plan to create (in Step 13). A good resource name refers to the task that this resource is expected to perform. For example, a name such as user-nfs-home is a good candidate for an NFS resource that shares user home directories.


    2. Set up the share options for each path that you have created to be shared.

      The format of this file is exactly the same as the format that is used in the /etc/dfs/dfstab file.


      share [-F nfs] [-o specific_options] [-d "description"] pathname 
      
      -F nfs

      Identifies the file system type as nfs.

      -o specific_options

      Specifies the share options. For Sun Cluster, set the rw option, which grants read-write access to all of the clients. See the share(1M) man page for a complete list of options.

      -d description

      Describes the file system to add.

      pathname

      Identifies the file system to share.

    When you set up your share options, consider the following points.

    • When constructing share options, do not use the root option, and do not mix the ro and rw options.

    • Do not grant access to the hostnames on the cluster interconnect.

      Grant read and write access to all of the cluster nodes and logical hosts to enable the Sun Cluster HA for NFS monitoring to do a thorough job. However, you can restrict write access to the file system or make the file system entirely read-only. If you do so, Sun Cluster HA for NFS fault monitoring can still perform monitoring without having write access.

    • If you specify a client list in the share command, include all of the physical hostnames and logical hostnames that are associated with the cluster, as well as the hostnames for all of the clients on all of the public networks to which the cluster is connected.

    • If you use net groups in the share command (rather than names of individual hosts), add all of those cluster hostnames to the appropriate net group.

    The share -o rw command grants write access to all of the clients, including the hostnames that the Sun Cluster software uses. This command enables Sun Cluster HA for NFS fault monitoring to operate most efficiently. See the following man pages for details.

    • dfstab(4)

    • share(1M)

    • share_nfs(1M)
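
    For example, a dfstab.r-nfs file for a hypothetical resource named r-nfs might contain an entry such as the following. The shared path /global/nfs/export and the client name client-1 are placeholders; phys-schost-1, phys-schost-2, and schost-1 stand for the cluster's physical and logical hostnames.

    share -F nfs -o rw=phys-schost-1:phys-schost-2:schost-1:client-1 -d "HA NFS export" /global/nfs/export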

  12. Register the NFS resource type.


    # scrgadm -a -t resource-type
    
    -a -t resource-type

    Adds the specified resource type. For Sun Cluster HA for NFS, the resource type is SUNW.nfs.

  13. Create the NFS resource in the failover resource group.


    # scrgadm -a -j resource -g resource-group -t resource-type
    
    -a

    Specifies that you are adding a configuration.

    -j resource

    Specifies the name of the resource to add, which you defined in Step 11. This name can be your choice but must be unique within the cluster.

    -g resource-group

    Specifies the name of a previously created resource group to which this resource is to be added.

    -t resource-type

    Specifies the name of the resource type to which this resource belongs. This name must be the name of a registered resource type.

  14. Run the scswitch(1M) command to perform the following tasks.

    • Enable the resource and the resource monitor.

    • Manage the resource group.

    • Switch the resource group into the ONLINE state.


    # scswitch -Z -g resource-group
    

Example – Setting Up and Configuring Sun Cluster HA for NFS

The following example shows how to set up and configure Sun Cluster HA for NFS.


(Create a logical host resource group and specify the path to the administrative 
files used by NFS (Pathprefix).)
# scrgadm -a -g resource-group-1 -y Pathprefix=/global/nfs
 
(Add logical hostname resources into the logical host resource group.)
# scrgadm -a -L -g resource-group-1 -l schost-1
 
(Create the directory structure to contain the Sun Cluster HA for NFS configuration 
files.)
# mkdir -p /global/nfs/SUNW.nfs
 
(Create the dfstab.r-nfs file under the nfs/SUNW.nfs directory and add an entry that 
sets share options for the file system to be shared.)
share -F nfs -o rw=engineering -d "home dirs" /global/nfs/home
 
(Register the NFS resource type.)
# scrgadm -a -t SUNW.nfs
 
(Create the NFS resource in the resource group.)
# scrgadm -a -j r-nfs -g resource-group-1 -t SUNW.nfs
 
(Enable the resources and their monitors, manage the resource group, and switch 
the resource group into online state.)
# scswitch -Z -g resource-group-1

Where to Go From Here

See How to Change Share Options on an NFS File System to set share options for your NFS file systems. See Configuring Sun Cluster HA for NFS Extension Properties to review or set extension properties.

How to Change Share Options on an NFS File System

If you use the rw, rw=, ro, or ro= options to the share -o command, NFS fault monitoring works best if you grant access to all of the physical hosts or netgroups that are associated with all of the Sun Cluster servers.

If you use netgroups in the share(1M) command, add all of the Sun Cluster hostnames to the appropriate netgroup. Ideally, grant both read access and write access to all of the Sun Cluster hostnames to enable the NFS fault probes to do a complete job.
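
For example, if clients access the file system through a netgroup named engineering, the netgroup definition (typically maintained in your NIS source file) might include entries such as the following sketch. The hostnames are placeholders.

engineering (phys-schost-1,,) (phys-schost-2,,) (schost-1,,) (client-1,,)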


Note –

Before you change share options, read the share_nfs(1M) man page to understand which combinations of options are legal.


  1. Become superuser on a cluster node.

  2. Turn off fault monitoring on the NFS resource.


    # scswitch -n -M -j resource
    
    -M

    Disables the resource monitor

  3. Test the new share options.

    1. Before you edit the dfstab.resource file with new share options, execute the new share command to verify the validity of your combination of options.


      # share -F nfs [-o specific_options] [-d "description"] pathname
      
      -F nfs

      Identifies the file system type as NFS.

      -o specific_options

      Specifies an option. You might use rw, which grants read-write access to all of the clients.

      -d description

      Describes the file system to add.

      pathname

      Identifies the file system to share.

    2. If the new share command fails, immediately execute another share command with the old options. When the new command executes successfully, proceed to Step 4.

  4. Edit the dfstab.resource file with the new share options.

    1. To remove a path from the dfstab.resource file, perform the following steps in order.

      1. Execute the unshare(1M) command.

      2. From the dfstab.resource file, delete the share command for the path that you want to remove.


      # unshare [-F nfs] [-o specific_options] pathname
      # vi dfstab.resource
      
      -F nfs

      Identifies the file system type as NFS.

      -o specific_options

      Specifies the options that are specific to NFS file systems.

      pathname

      Identifies the file system that is made unavailable.

    2. To add a path or change an existing path in the dfstab.resource file, verify that the mount point is valid, then edit the dfstab.resource file.


    Note –

    The format of this file is exactly the same as the format that is used in the /etc/dfs/dfstab file. Each line consists of a share command.


  5. Enable fault monitoring on the NFS resource.


    # scswitch -e -M -j resource
    

How to Tune Sun Cluster HA for NFS Method Timeouts

The time that Sun Cluster HA for NFS methods require to finish depends on the number of paths that the resources share through the dfstab.resource file. The default timeout for these methods is 300 seconds.

As a general guideline, allocate 10 seconds of method timeout for each shared path. The default timeouts are designed to handle 30 shared paths (30 paths × 10 seconds = 300 seconds).

Update the following method timeouts if the number of shared paths is greater than 30.

Prenet_start_timeout
Postnet_stop_timeout
Monitor_Start_timeout
Start_timeout
Validate_timeout
Monitor_Stop_timeout
Stop_timeout
Update_timeout
Monitor_Check_timeout

To change method timeouts, use the scrgadm -c option, as in the following example.


# scrgadm -c -j resource -y Prenet_start_timeout=500

How to Configure SUNW.HAStoragePlus Resource Type

The SUNW.HAStoragePlus resource type was introduced in Sun Cluster 3.0 5/02. This resource type performs the same functions as SUNW.HAStorage and synchronizes actions between HA storage and Sun Cluster HA for NFS. SUNW.HAStoragePlus also has the additional ability to make a local file system highly available. Because Sun Cluster HA for NFS is a failover, disk-intensive data service, you should set up the SUNW.HAStoragePlus resource type.

See the SUNW.HAStoragePlus(5) man page and “Relationship Between Resource Groups and Disk Device Groups” in Sun Cluster 3.1 Data Service Planning and Administration Guide for background information. See “Synchronizing the Startups Between Resource Groups and Disk Device Groups” in Sun Cluster 3.1 Data Service Planning and Administration Guide for the procedure. (If you are using a Sun Cluster 3.0 version prior to 5/02, you must set up SUNW.HAStorage instead of SUNW.HAStoragePlus. See “Synchronizing the Startups Between Resource Groups and Disk Device Groups” in Sun Cluster 3.1 Data Service Planning and Administration Guide for the procedure.)
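
The following commands are a minimal sketch of that setup, not the complete supported procedure. The resource group name resource-group, the mount point /global/nfs, and the resource names nfs-hastp-rs and r-nfs are assumptions for illustration only.

# scrgadm -a -t SUNW.HAStoragePlus
# scrgadm -a -j nfs-hastp-rs -g resource-group -t SUNW.HAStoragePlus \
  -x FilesystemMountPoints=/global/nfs -x AffinityOn=True
# scrgadm -a -j r-nfs -g resource-group -t SUNW.nfs \
  -y Resource_dependencies=nfs-hastp-rs

Setting Resource_dependencies on the NFS resource ensures that the storage resource is brought online before the NFS resource starts sharing file systems.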

Configuring Sun Cluster HA for NFS Extension Properties

Typically, you use the command line scrgadm -x parameter=value to configure extension properties when you create the NFS resource. You can also use the procedures in “Administering Data Service Resources” in Sun Cluster 3.1 Data Service Planning and Administration Guide to configure these properties later. You do not need to set any extension properties for Sun Cluster HA for NFS. See “Standard Properties” in Sun Cluster 3.1 Data Service Planning and Administration Guide for details on all of the Sun Cluster properties.
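
For example, either of the following commands sets the Monitor_retry_count extension property. The resource name r-nfs and the value 6 are illustrations only.

# scrgadm -a -j r-nfs -g resource-group -t SUNW.nfs -x Monitor_retry_count=6

(or, on an existing resource)
# scrgadm -c -j r-nfs -x Monitor_retry_count=6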

Table 1–2 describes extension properties that you can configure for Sun Cluster HA for NFS. You can update some properties dynamically. You can update others, however, only when you create the resource. The Tunable entries indicate when you can update the property.

Table 1–2 Sun Cluster HA for NFS Extension Properties

Each entry lists the property name and data type, a description of the property, and its default, range, and tunability.

Lockd_nullrpc_timeout (integer)

The time-out value (in seconds) to use when probing lockd.

 

Default: 120

Range: Minimum = 60

Tunable: Any time

Monitor_retry_count (integer)

The number of times that the process monitor facility (PMF) restarts the fault monitor during the time window that the Monitor_retry_interval property specifies. Note that this property refers to restarts of the fault monitor itself, rather than the resource. The system-defined properties Retry_interval and Retry_count control restarts of the resource. See “Standard Properties” in Sun Cluster 3.1 Data Service Planning and Administration Guide for a description of these properties.

 

Default: 4

Range: 0 – 2,147,483,641

–1 indicates an infinite number of restart attempts.

Tunable: Any time

Monitor_retry_interval (integer)

The time (in minutes) over which failures of the fault monitor are counted. If the number of times that the fault monitor fails is more than the value that is specified in the extension property Monitor_retry_count within this period, the PMF restarts the fault monitor.

 

Default: 2

Range: 0 – 2,147,483,641

–1 indicates an infinite amount of time.

Tunable: Any time

Mountd_nullrpc_restart (Boolean)

A Boolean to indicate whether to restart mountd when a null rpc call fails.

 

Default: True

Range: None

Tunable: Any time

Mountd_nullrpc_timeout (integer)

The time-out value (in seconds) to use when probing mountd.

 

Default: 120

Range: Minimum = 60

Tunable: Any time

Nfsd_nullrpc_restart (Boolean)

A Boolean to indicate whether to restart nfsd when a null rpc call fails.

 

Default: False

Range: None

Tunable: Any time

Nfsd_nullrpc_timeout (integer)

The time-out value (in seconds) to use when probing nfsd.

 

Default: 120

Range: Minimum = 60

Tunable: Any time

Rpcbind_nullrpc_reboot (Boolean)

A Boolean to indicate whether to reboot the system when a null rpc call on rpcbind fails.

 

Default: False

Range: None

Tunable: Any time

Rpcbind_nullrpc_timeout (integer)

The time-out value (in seconds) to use when probing rpcbind.

 

Default: 120

Range: Minimum = 60

Tunable: Any time

Statd_nullrpc_timeout (integer)

The time-out value (in seconds) to use when probing statd.

 

Default: 120

Range: Minimum = 60

Tunable: Any time

Sun Cluster HA for NFS Fault Monitor

The Sun Cluster HA for NFS fault monitor uses two processes: an NFS system fault monitor (nfs_daemons_probe) and an NFS resource fault monitor (nfs_probe).

Fault Monitor Startup

An NFS resource MONITOR_START method starts the NFS system fault monitor. This start method first checks if the NFS system fault monitor (nfs_daemons_probe) already runs under the process monitor pmfadm. If the NFS system fault monitor is not running, the start method starts the nfs_daemons_probe process under the control of the process monitor. The start method then starts the resource fault monitor (nfs_probe), also under the control of the process monitor.

Fault Monitor Stops

The NFS resource MONITOR_STOP method stops the resource fault monitor. This method also stops the NFS system fault monitor if no other NFS resource fault monitor runs on the local node.

NFS Fault Monitor Process

The system fault monitor probes rpcbind, statd, lockd, nfsd, and mountd to check for the presence of each process and its response to a null rpc call. This monitor uses the following NFS extension properties.

Rpcbind_nullrpc_timeout
Lockd_nullrpc_timeout
Nfsd_nullrpc_timeout
Rpcbind_nullrpc_reboot
Mountd_nullrpc_timeout
Nfsd_nullrpc_restart
Statd_nullrpc_timeout
Mountd_nullrpc_restart

See Configuring Sun Cluster HA for NFS Extension Properties to review or set extension properties.

If a daemon needs to be stopped, the calling method sends a kill signal to the process id (PID) and waits to verify that the PID disappears. The amount of time that the calling method waits is a fraction of the method's timeouts. If the process does not stop within that period of time, the fault monitor assumes that the process failed.


Note –

If the process needs more time to stop, increase the timeout of the method that was running when the process was sent the kill signal.
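
For example, if the daemon was being stopped by the resource's STOP method, you might raise the Stop_timeout value as in the following sketch. The value 600 is only an illustration.

# scrgadm -c -j resource -y Stop_timeout=600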


After the daemon is stopped, the fault monitor restarts the daemon and waits until the daemon is registered with RPC. If a new RPC handle can be created, the fault monitor internally records the daemon's status as successful. If the RPC handle cannot be created, the daemon's status is recorded as unknown, and no error messages are printed.

Each system fault-monitor probe cycle performs the following steps in a loop.

  1. Sleeps for Cheap_probe_interval.

  2. Probes rpcbind.

    If the process fails and Failover_mode=HARD, then the fault monitor reboots the node.

    If a null rpc call to the daemon fails, Rpcbind_nullrpc_reboot=True, and Failover_mode=HARD, then the fault monitor reboots the node.

  3. Probes statd and lockd.

    If statd or lockd fail, the fault monitor attempts to restart both daemons. If the fault monitor cannot restart the daemons, all of the NFS resources fail over to another node.

    If a null rpc call to these daemons fails, the fault monitor logs a message to syslog but does not restart statd or lockd.

  4. Probes mountd.

    If mountd fails, the fault monitor attempts to restart the daemon.

    If the kstat counter, nfs_server:calls, is not increasing, the following actions occur.

    1. A null rpc call is sent to mountd.

    2. If the null rpc call fails and Mountd_nullrpc_restart=True, the fault monitor attempts to restart mountd if the cluster file system is available.

    3. If the fault monitor cannot restart mountd and the number of failures reaches Retry_count, then all of the NFS resources fail over to another node.

  5. Probes nfsd.

    If nfsd fails, the fault monitor attempts to restart the daemon.

    If the kstat counter, nfs_server:calls, is not increasing, the following actions occur.

    1. A null rpc call is sent to nfsd.

    2. If the null rpc call fails and Nfsd_nullrpc_restart=True, then the fault monitor attempts to restart nfsd.

    3. If the fault monitor cannot restart nfsd and the number of failures reaches Retry_count, then all of the NFS resources fail over to another node.

If any of the NFS daemons fail to restart, the status of all of the online NFS resources is set to FAULTED. When all of the NFS daemons are restarted and healthy, the resource status is set to ONLINE again.

NFS Resource Monitor Process

Before the resource monitor probes start, all of the shared paths are read from the dfstab.resource file and stored in memory. In each probe cycle, every shared path is checked by performing stat() on the path.

Each resource monitor fault probe performs the following steps.

  1. Sleeps for Thorough_probe_interval.

  2. Refreshes the memory if dfstab has been changed since the last read.

  3. Probes all of the shared paths by performing stat() on each path.

If any path is not functional, the resource status is set to FAULTED. If all of the paths are functional, the resource status is set to ONLINE again.