A cluster file system is a proxy between the kernel on one node and the underlying file system and volume manager running on a node that has a physical connection to the disk(s).
Cluster file systems are dependent on global devices (disks, tapes, CD-ROMs) with physical connections to one or more nodes. Global devices can be accessed from any node in the cluster through the same file name (for example, /dev/global/) whether or not that node has a physical connection to the storage device. You can use a global device in the same way as a regular device; for example, you can create a file system on it with newfs or mkfs.
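For example, assuming a global device slice named d0s0 (the device name here is purely illustrative; substitute the name that applies to your configuration), you could create a UFS file system on its raw device node:
# newfs /dev/global/rdsk/d0s0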
You can mount a file system on a global device globally with mount -g or locally with mount.
Programs can access a file in a cluster file system from any node in the cluster through the same file name (for example, /global/foo).
A cluster file system is mounted on all cluster members. You cannot mount a cluster file system on a subset of cluster members.
A cluster file system is not a distinct file system type. That is, clients see the underlying file system (for example, UFS).
In the SunPlex system, all multihost disks are placed into disk device groups, which can be Solaris Volume Manager disksets, VxVM disk groups, or individual disks not under control of a software-based volume manager.
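As a sketch of how one such disk device group might be created, the following Solaris Volume Manager commands create a diskset named oracle that can be mastered by two nodes and then add a shared drive to it. The node names and the DID device name are placeholders; see the Sun Cluster installation documentation for the exact supported procedure.
# metaset -s oracle -a -h phys-schost-1 phys-schost-2
# metaset -s oracle -a /dev/did/rdsk/d3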
For a cluster file system to be highly available, the underlying disk storage must be connected to more than one node. Therefore, a local file system (a file system that is stored on a node's local disk) that is made into a cluster file system is not highly available.
As with normal file systems, you can mount cluster file systems in two ways:
Manually: Use the mount command and the -g or -o global mount options to mount the cluster file system from the command line, for example:
# mount -g /dev/global/dsk/d0s0 /global/oracle/data
Automatically: Create an entry in the /etc/vfstab file with the global mount option to mount the cluster file system at boot. You then create a mount point under the /global directory on all nodes; the /global directory is a recommended location, not a requirement. Here is a sample line for a cluster file system from an /etc/vfstab file (a sketch of creating and mounting the mount point follows this list):
/dev/md/oracle/dsk/d1 /dev/md/oracle/rdsk/d1 /global/oracle/data ufs 2 yes global,logging
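As a sketch of that procedure, using the same illustrative device and mount point names as the examples above, you would first create the mount point on every cluster node:
# mkdir -p /global/oracle/data
With the /etc/vfstab entry in place, you can then mount the file system on the running cluster without a reboot, because mount reads the remaining options from /etc/vfstab:
# mount /global/oracle/data
Equivalently, you can supply the device and options on the command line instead of relying on /etc/vfstab:
# mount -o global,logging /dev/md/oracle/dsk/d1 /global/oracle/data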
While Sun Cluster software does not impose a naming policy for cluster file systems, you can ease administration by creating a mount point for all cluster file systems under the same directory, such as /global/disk-device-group. See Sun Cluster 3.1 Software Installation Guide and Sun Cluster 3.1 System Administration Guide for more information.
The cluster file system has the following features:
File access locations are transparent. A process can open a file located anywhere in the system and processes on all nodes can use the same path name to locate a file.
When the cluster file system reads files, it does not update the access time on those files.
Coherency protocols are used to preserve the UNIX file access semantics even if the file is accessed concurrently from multiple nodes.
Extensive caching is used along with zero-copy bulk I/O movement to move file data efficiently.
The cluster file system provides highly available advisory file locking functionality using the fcntl(2) interfaces. Applications running on multiple cluster nodes can synchronize access to data using advisory file locking on a cluster file system file. File locks are recovered immediately from nodes that leave the cluster, and from applications that fail while holding locks.
Continuous access to data is ensured, even when failures occur. Applications are not affected by failures as long as a path to disks is still operational. This guarantee is maintained for raw disk access and all file system operations.
Cluster file systems are independent from the underlying file system and volume management software. Cluster file systems make any supported on-disk file system global.
The HAStoragePlus resource type is designed to make non-global file system configurations such as UFS and VxFS highly available. Use HAStoragePlus to integrate your local file system into the Sun Cluster environment and make the file system highly available. HAStoragePlus provides additional file system capabilities such as checks, mounts, and forced unmounts that enable Sun Cluster to fail over local file systems. In order to fail over, the local file system must reside on global disk groups with affinity switchovers enabled.
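As a rough sketch of how such a configuration might be set up from the Sun Cluster 3.1 command line, you register the SUNW.HAStoragePlus resource type and then create a resource that manages the local file system. The resource group, resource, and mount point names below are placeholders, the resource group (oracle-rg) is assumed to exist already, and the exact options can vary by release, so follow the Data Services documentation for the supported procedure.
# scrgadm -a -t SUNW.HAStoragePlus
# scrgadm -a -j hasp-rs -g oracle-rg -t SUNW.HAStoragePlus -x FilesystemMountPoints=/local/oracle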
See the individual data service chapters or the Sun Cluster 3.1 Data Service Planning and Administration Guide in the Sun Cluster 3.1 Data Service Collection for information on how to use the HAStoragePlus resource type.
HAStoragePlus can also be used to synchronize the startup of resources and disk device groups upon which the resources depend. For more information, see Resources, Resource Groups, and Resource Types.
The syncdir mount option can be used for cluster file systems that use UFS as the underlying file system. However, there is a significant performance improvement if you do not specify syncdir. If you specify syncdir, writes are guaranteed to be POSIX compliant. If you do not, you will see the same behavior as with NFS file systems. For example, in some cases without syncdir, you would not discover an out-of-space condition until you close the file. With syncdir (and POSIX behavior), the out-of-space condition would have been discovered during the write operation. The cases in which you can have problems if you do not specify syncdir are rare, so we recommend that you do not specify it and take the performance benefit.
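If your application does require the POSIX write semantics, you can add syncdir to the mount options. The following /etc/vfstab line is simply a variant of the earlier example and uses the same illustrative device and mount point names:
/dev/md/oracle/dsk/d1 /dev/md/oracle/rdsk/d1 /global/oracle/data ufs 2 yes global,logging,syncdir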
VxFS does not have a mount-option equivalent to the syncdir mount option for UFS. VxFS behavior is the same as for UFS when the syncdir mount option is not specified.
See File Systems FAQs for frequently asked questions about global devices and cluster file systems.