CHAPTER 7

Advanced Topics

This chapter discusses advanced topics that are beyond the scope of basic system administration and usage. This chapter contains the following sections:

Using Daemons, Processes, and Tracing
Using the setfa(1) Command to Set File Attributes
Configuring WORM-FS File Systems
Accommodating Large Files
Configuring a Multireader File System
Using the SAN-QFS File System in a Heterogeneous Computing Environment
Understanding I/O Types
Increasing File Transfer Performance for Large Files
Enabling Qwrite Capability
Setting the Write Throttle
Setting the Flush-Behind Rate
Tuning the Number of Inodes and the Inode Hash Table


Using Daemons, Processes, and Tracing

It is useful to have an understanding of system daemons and processes when you are debugging. This section describes the Sun StorEdge QFS daemons and processes. It also provides information about daemon tracing.

Daemons and Processes

All Sun StorEdge QFS daemons are named in the form sam-daemon_named, where daemon_name identifies the daemon's function. Processes are named in a similar manner; the difference is that their names do not end in the lowercase letter d.

TABLE 7-1 shows some of the daemons and processes that can run on your system. Others, such as sam-genericd and sam-catserverd, might also be running, depending on system activities.


TABLE 7-1   Daemons and Processes

Process          Description
sam-fsd          Master daemon.
sam-sharefsd     Invokes the Sun StorEdge QFS shared file system daemon.
sam-rpcd         Controls the remote procedure call (RPC) application
                 programming interface (API) server process.


When you run Sun StorEdge QFS software, init starts the sam-fsd daemon as part of /etc/inittab processing. The daemon is started at init levels 0, 2, 3, 4, 5, and 6. It should restart automatically in case of failure.

In a Sun StorEdge QFS shared file system, a sam-fsd daemon is always active. In addition, one sam-sharefsd daemon is active for each mounted shared file system.

When a sam-fsd daemon recognizes a Sun StorEdge QFS shared file system, it starts a shared file system daemon (sam-sharefsd). TCP sockets are used to communicate between the server and client hosts. All clients that connect to the metadata server are validated against the hosts file.



Note - See the hosts.fs man page for more information about the hosts file.



The sam-sharefsd daemon on the metadata server opens a listener socket on the port named sam-qfs. During the Sun StorEdge QFS installation process, the sam-qfs entry is automatically added to the /etc/services file. Do not remove this entry. In addition, the shared file system port is defined in the /etc/inet/services file as port number 7105. Verify that this port does not conflict with another service.



Note - Before the Sun StorEdge QFS 4U2 release, one port per file system was required. You can remove these per-file-system entries from your /etc/services file.



All metadata operations, block allocation and deallocation, and record locking are performed on the metadata server. The sam-sharefsd daemon does not keep any state information, so it can be stopped and restarted without causing consistency problems for the file system.

Trace Files

Several Sun StorEdge QFS processes can write messages to trace files. These messages contain information about the state and progress of the work performed by the daemons. The messages are primarily used by Sun Microsystems staff members to improve performance and diagnose problems. The message content and format are subject to change from release to release.

Trace files can be used in debugging. By default, trace files are not enabled. You can enable trace files by editing the defaults.conf file. You can enable tracing for all processes, or you can enable tracing for individual processes. For information about the processes that you can trace, see the defaults.conf(4) man page.
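
For example, a minimal defaults.conf fragment that turns on tracing for all processes might look like the following sketch; verify the directive names and values against the defaults.conf(4) man page for your release before using it:

trace
all = on
endtrace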

By default, trace files are written to the /var/opt/SUNWsamfs/trace directory. In that directory, the trace files are named for the processes (archiver, catserver, fsd, ftpd, recycler, sharefsd, and stager). You can change the names of the trace files by specifying directives in the defaults.conf configuration file. You can also set a limit on the size of a trace file and rotate your tracing logs. For information about controlling tracing, see the defaults.conf(4) man page.

Trace File Content

Trace file messages contain the time and source of the message. The messages are produced by events in the processes. You can select the events by using directives in the defaults.conf file.

The default events, the additional events that you can enable, and the optional message elements are described on the defaults.conf(4) man page. The default message elements (program name, process ID (PID), and time) are always included and cannot be excluded.

Trace File Rotation

To prevent trace files from growing indefinitely, the sam-fsd daemon monitors the size of the trace files and periodically executes the following command:


/opt/SUNWsamfs/sbin/trace_rotate

This script moves the trace files to sequentially numbered copies. You can modify this script to suit your operation. Alternatively, you can provide this function using cron(1) or some other facility.
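
For example, if you prefer to drive rotation from cron(1), a root crontab entry along the following lines (the schedule is illustrative, and if your version of the script expects arguments such as the trace directory, supply them) would rotate the trace files nightly:

0 3 * * * /opt/SUNWsamfs/sbin/trace_rotate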

Determining Which Processes Are Being Traced

To determine which processes are being traced currently, enter the sam-fsd(1M) command at the command line. CODE EXAMPLE 7-1 shows the output from this command.


CODE EXAMPLE 7-1 sam-fsd (1M) Command Output
# sam-fsd
Trace file controls:
sam-amld      /var/opt/SUNWsamfs/trace/sam-amld
              cust err fatal misc proc date
              size    0    age 0
sam-archiverd /var/opt/SUNWsamfs/trace/sam-archiverd
              cust err fatal misc proc date
              size    0    age 0
sam-catserverd /var/opt/SUNWsamfs/trace/sam-catserverd
              cust err fatal misc proc date
              size    0    age 0
sam-fsd       /var/opt/SUNWsamfs/trace/sam-fsd
              cust err fatal misc proc date
              size    0    age 0
sam-rftd      /var/opt/SUNWsamfs/trace/sam-rftd
              cust err fatal misc proc date
              size    0    age 0
sam-recycler  /var/opt/SUNWsamfs/trace/sam-recycler
              cust err fatal misc proc date
              size    0    age 0
sam-sharefsd  /var/opt/SUNWsamfs/trace/sam-sharefsd
              cust err fatal misc proc date
              size    0    age 0
sam-stagerd   /var/opt/SUNWsamfs/trace/sam-stagerd
              cust err fatal misc proc date
              size    0    age 0
sam-serverd   /var/opt/SUNWsamfs/trace/sam-serverd
              cust err fatal misc proc date
              size    0    age 0
sam-clientd   /var/opt/SUNWsamfs/trace/sam-clientd
              cust err fatal misc proc date
              size    0    age 0
sam-mgmt      /var/opt/SUNWsamfs/trace/sam-mgmt
              cust err fatal misc proc date
              size    0    age 0

For more information about enabling trace files, see the defaults.conf(4) man page and the sam-fsd(1M) man page.


Using the setfa(1) Command to Set File Attributes

Sun StorEdge QFS file systems enable end users to set performance attributes for files and directories. Applications can enable these performance features on a per-file or per-directory basis. The following sections describe how the application programmer can use these features to select file attributes for files and directories, to preallocate file space, to specify the allocation method for the file, and to specify the disk stripe width.

For more information about implementing the features described in the following subsections, see the setfa(1) man page.

Selecting File Attributes for Files and Directories

The setfa(1) command sets attributes on a new or existing file. The file is created if it does not already exist.

You can set attributes on a directory as well as on a file. When you use setfa(1) on a directory, files and directories subsequently created within that directory inherit the attributes set on the original directory. To reset the attributes on a file or directory to the defaults, use the -d (default) option. When the -d option is used, attributes are first reset to the defaults and then any other specified attributes are processed.
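
For example, the following commands (the paths are illustrative) first set a striped allocation with a width of two DAUs on a directory, so that files subsequently created there inherit that attribute, and then reset one existing file back to the default attributes:

# setfa -s 2 /qfs/data
# setfa -d /qfs/data/oldfile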

Preallocating File Space

An end user can preallocate space for a file. This space is associated with a file so that no other files in the file system can use the disk addresses allocated to this file. Preallocation ensures that space is available for a given file, which avoids a file-system-full condition. Preallocation is assigned at the time of the request rather than when the data is actually written to disk.

Note that space can be wasted by preallocation of files. If the file size is less than the allocation amount, the kernel allocates space to the file from the current file size up to the allocation amount. When the file is closed, space below the allocation amount is not freed.

You can preallocate space for a file by using the setfa(1) command with either the -L or the -l (lowercase letter L) option. Both options accept a file length as their argument. Use the -L option for an existing file, which can be empty or contain data. Use the -l option for a file that has no data yet. If you use the -l option, the file cannot grow beyond its preallocated limit.

For example, to preallocate a 1-gigabyte file named /qfs/file_alloc, type the following:


# setfa -l 1g /qfs/file_alloc

After space for a file has been preallocated, truncating a file to 0 length or removing the file returns all space allocated for a file. There is no way to return only part of a file's preallocated space to the file system. In addition, if a file is preallocated in this manner, there is no way to extend the file beyond its preallocated size in future operations.
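
To preallocate space for an existing file that already contains data (the file name here is illustrative), use the -L option instead:

# setfa -L 1g /qfs/file_with_data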

Selecting a File Allocation Method and Stripe Width

By default, a file uses the allocation method and stripe width specified at mount time (see the mount_samfs(1M) man page). However, an end user might want to use a different allocation scheme for a file or directory. The user could do this by using the setfa(1) command with the -s (stripe) option.

The allocation method can be either round-robin or striped. The -s option specifies the allocation method and the stripe width, as shown in TABLE 7-2.


TABLE 7-2   File Allocations and Stripe Widths

-s Option   Allocation Method   Stripe Width     Explanation
0           Round-robin         Not applicable   The file is allocated on one
                                                 device until that device has
                                                 no space.
1-255       Striped             1-255 DAUs       The file is striped across all
                                                 disk devices with this number
                                                 of DAUs per disk.


The following example shows how to create a file explicitly by specifying a round-robin allocation method:


# setfa -s 0 /qfs/100MB.rrobin

The following example shows how to create a file explicitly by specifying a striped allocation method with a stripe width of 64 DAUs (preallocation is not used):


# setfa -s 64 /qfs/file.stripe

Selecting a Striped Group Device

Striped group devices are supported for Sun StorEdge QFS file systems only.

A user can specify that a file begin allocation on a particular striped group. If the file allocation method is round-robin, the file is allocated on the designated striped group.

CODE EXAMPLE 7-2 shows setfa(1) commands specifying that file1 and file2 be independently spread across two different striped groups.


CODE EXAMPLE 7-2 setfa (1) Commands to Spread Files Across Striped Groups
# setfa -g0 -s0 file1
# setfa -g1 -s0 file2

This capability is particularly important for applications that must achieve levels of performance that approach raw device speeds. For more information, see the setfa(1) man page.


Configuring WORM-FS File Systems

Write Once Read Many (WORM) technology is used in many applications because of the integrity of the data and the accepted legal admissibility of stored files that use the technology. Beginning with release 4, update 3, of the Sun StorEdge QFS software, a WORM-FS feature became available as an add-on package called SUNWsamfswm. In the 4U4 software release the WORM-FS interface was modified to be compatible with the new Sun StorEdge 5310 NAS appliance. The previous WORM-FS interface using ssum is no longer supported.



Note - The WORM-FS feature is licensed separately from the Sun StorEdge QFS file system. Contact your local Sun sales representative for information about obtaining the WORM-FS package.



The WORM-FS feature offers default and customizable file-retention periods, data and path immutability, and subdirectory inheritance of the WORM setting.

Enabling the WORM-FS Feature

Use the worm_capable mount option to enable the WORM-FS feature. This option can be provided on the command line when the file system is mounted, listed in /etc/vfstab, or specified in the /etc/opt/SUNWsamfs/samfs.cmd file. The usual rules of precedence for mount options apply.

The worm_capable attribute is stored in the mount table and enables WORM files to be created in directories anywhere in the file system.



Note - You must have system administration privileges to set the worm_capable mount option in /etc/vfstab.



CODE EXAMPLE 7-3 shows the two WORM-FS mount options. The file system samfs1 mounted at /samfs1 is WORM-capable and has the default retention period for files set to 60 minutes.


CODE EXAMPLE 7-3 Using WORM-FS Mount Options
# cat /etc/vfstab
#device            device      mount    FS     fsck   mount    mount
#to mount          to fsck     point    type   pass   at boot  options
#
fd                 -           /dev/fd  fd     -      no       -
/proc              -           /proc    proc   -      no       -
/dev/dsk/c0t0d0s1  -           -        swap   -      no       -
samfs1             -           /samfs1  samfs  -      yes      worm_capable,def_retention=60
swap               -           /tmp     tmpfs  -      yes      -

After the WORM-FS feature has been enabled and at least one WORM file is resident in the file system, the file system's superblock is updated to reflect the WORM capability. Any subsequent attempt to rebuild the file system through sammkfs will fail.

The worm_capable mount option enables a file system to contain WORM files, but it does not automatically create WORM files. To create a WORM file, you must first make the directory WORM-capable. To do this, create an ordinary directory and then use the WORM trigger command chmod 4000 directory-name to set the WORM bit on the directory. The directory can now contain WORM files.

After setting the WORM bit on a parent directory, you can create files in that directory and then use the WORM trigger chmod 4000 file-name to set the WORM bit on files that you want to be retained.



Note - Use care when applying the WORM trigger. The file data and path cannot be changed after the file has the WORM feature applied. Once this feature is applied to a file, it is irrevocable.



The WORM-FS feature also includes file-retention periods that can be customized. Assigning a retention period to a file maintains the WORM features in that file for the specified period of time. Do one of the following to set a retention period for a file:

Use the file system's default retention period, which is set with the def_retention mount option (see Setting the Default Retention Period).
Set a retention period for the individual file by advancing the file's access time with the touch utility before applying the WORM trigger (see Setting the Retention Period Using touch).

CODE EXAMPLE 7-4 shows the creation of a file in a WORM-capable directory, setting of the WORM trigger on the file, and use of the sls command to display the file's WORM features. This example uses the default retention period of the file system (60 minutes, as set in CODE EXAMPLE 7-3).


CODE EXAMPLE 7-4 Creation of a WORM-Capable Directory and WORM File
# cd WORM
# echo "This is a test file" >> test
# sls -D
test:
  mode: -rw-r--r--  links:   1  owner: root      group: other
  length:        20  admin id:      0  inode:     1027.1
  access:      Oct 30 02:50  modification: Oct 30 02:50
  changed:     Oct 30 02:50  attributes:   Oct 30 02:50
  creation:    Oct 30 02:50  residence:    Oct 30 02:50
  checksum: gen  no_use  not_val  algo: 0

# chmod 4000 test
# sls -D
test:
  mode: -r--r--r--  links:   1  owner: root      group: other
  length:        20  admin id:      0  inode:     1027.1
  access:      Oct 30 02:50  modification: Oct 30 02:50
  changed:     Oct 30 02:50  retention-end: Oct 30 2005 03:50
  creation:    Oct 30 02:50  residence:    Oct 30 02:50
  retention:   active        retention-period: 0y, 0d, 1h, 0m
  checksum: gen  no_use  not_val  algo: 0
 

With the addition of the WORM-FS feature, three states are possible for a file in a Sun StorEdge QFS file system: normal, retained (active), and expired (over).

The normal state represents the state of an ordinary file in a Sun StorEdge QFS file system. A transition to the retained, or active, state occurs when the WORM bit is set on a file. The expired, or over, state occurs when the file's retention period is exceeded.

When a retention period is assigned to a file and the WORM trigger is applied to it, the file's path and data are immutable. When the retention period expires, the state is changed to "expired" but the path and data remain immutable.

When a file is in an expired state, only two operations are available: the retention period can be extended, or the file can be removed.

If the retention period is extended, the file's state returns to "active" and the new end date and duration are set accordingly.

Both hard and soft links to files can be used with the WORM-FS feature. Hard links can be established only with files that reside in a WORM-capable directory. After a hard link is created, it has the same WORM characteristics as the original file. Soft links can also be established, but a soft link cannot use the WORM features. Soft links to WORM files can be created in any directory in a Sun StorEdge QFS file system.

Another attribute of the WORM-FS feature is directory inheritance. New directories that are created under a directory that includes the worm_capable attribute inherit this attribute from their parent. If a directory has a default retention period set, this retention period is also inherited by any new subdirectories. The WORM bit can be set on any file whose parent directory is WORM-capable. Ordinary users can set the WORM feature on directories and files that they own or have access to by using normal UNIX permissions.



Note - A WORM-capable directory can only be deleted if it contains no WORM files.



Setting the Default Retention Period

The default retention period for a file system can be set as a mount option in the /etc/vfstab file. For example:

samfs1 - /samfs1 samfs - no bg,worm_capable,def_retention=1y60d

The format for setting the default retention period is MyNdOhPm, in which M, N, O, and P are non-negative integers and y, d, h, and m stand for years, days, hours, and minutes, respectively. Any combination of these units can be used. For example, 1y5d4h3m indicates 1 year, 5 days, 4 hours, and 3 minutes; 30d8h indicates 30 days and 8 hours; and 300m indicates 300 minutes. The new format is backward compatible with previous software versions, in which the retention period was specified in minutes.

You can also set a default retention period for a directory, as described in the following section, Setting the Retention Period Using touch. This retention period overrides the default retention period for the file system. It is also inherited by any subdirectories.

Setting the Retention Period Using touch

You use the touch utility to set or extend a file's or directory's retention period. You can also use touch to shorten the default retention period for a directory (but not for a file).

To set the retention period, you must first advance the file's or directory's access time using touch, and then apply the WORM trigger using the chmod command.

CODE EXAMPLE 7-5 shows the use of the touch utility to set a file's retention period followed by the application of the WORM trigger.


CODE EXAMPLE 7-5 Using touch and chmod to Set the Retention Period
# touch -a -t200508181125 test
# sls -D
test:
  mode: -rw-r--r--  links:   1  owner: root      group: root    
  length:         0  admin id:      0  inode:     1027.1
  access:      Aug 18  2005  modification: Aug 18 11:19
  changed:     Aug 18 11:19  attributes:   Aug 18 11:19
  creation:    Aug 18 11:19  residence:    Aug 18 11:19
 
# chmod 4000 test
# sls -D
test:
  mode: -r-Sr--r--  links:   1  owner: root      group: root    
  length:         0  admin id:      0  inode:     1027.1
  access:      Aug 18  2005  modification: Aug 18 11:19
  changed:     Aug 18 11:19  retention-end: Aug 18 2005 11:25
  creation:    Aug 18 11:19  residence:    Aug 18 11:19
  retention:   active        retention-period: 0y, 0d, 0h, 6m
 

The -a option for touch is used to change the access time of the file or directory. The -t option specifies the time to be used for the access time field. The format for the time argument is [[CC]YY]MMDDhhmm[.SS], as follows:

CC - First two digits of the year
YY - Last two digits of the year
MM - Month of the year (01-12)
DD - Day of the month (01-31)
hh - Hour of the day (00-23)
mm - Minute of the hour (00-59)
SS - Second of the minute (00-61)

The CC, YY, and SS fields are optional. If CC and YY are not given, the default is the current year. See the touch(1) man page for more information about these options.

To set the retention period to permanent retention, set the access time to its largest possible value: 203801182214.07.
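
For example, to mark a hypothetical file named legal_hold for permanent retention, set its access time to that value and then apply the WORM trigger:

# touch -a -t203801182214.07 legal_hold
# chmod 4000 legal_hold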

Extending a File's Retention Period

CODE EXAMPLE 7-6 shows an example of using touch to extend a file's retention period.


CODE EXAMPLE 7-6 Using touch to Extend a File's Retention Period
# sls -D test
test:
  mode: -r-Sr--r--  links:   1  owner: root      group: root    
  length:         0  admin id:      0  inode:     1029.1
  access:      Aug 18 11:35  modification: Aug 18 11:33
  changed:     Aug 18 11:33  retention-end: Aug 18 2005 11:35
  creation:    Aug 18 11:33  residence:    Aug 18 11:33
  retention:   over          retention-period: 0y, 0d, 0h, 2m
# touch -a -t200508181159 test
# sls -D
test:
  mode: -r-Sr--r--  links:   1  owner: root      group: root    
  length:         0  admin id:      0  inode:     1029.1
  access:      Aug 18 11:35  modification: Aug 18 11:33
  changed:     Aug 18 11:33  retention-end: Aug 18 2005 11:59
  creation:    Aug 18 11:33  residence:    Aug 18 11:33
  retention:   active        retention-period: 0y, 0d, 0h, 26m

In this example the retention period was extended to Aug 18, 2005 at 11:59AM, which is 26 minutes from the time the WORM trigger was initially applied.

Using sls to View WORM-FS Files

Use the sls command to view WORM file attributes. The -D option shows whether a directory is WORM-capable. Use this option on a file to display when the retention period began, when it will end, the current retention state, and the duration as specified on the command line.

The start of the retention period is stored in the file's changed attribute field. The end of the retention period is stored in the file's attribute time field. This time is displayed as a calendar date. An additional line in the sls output shows the retention period state and duration.

CODE EXAMPLE 7-7 shows an example of how sls -D displays a file's retention status.


CODE EXAMPLE 7-7 Using sls to Find a File's Retention Status
# sls -D test
test:
  mode: -r-Sr--r--  links:   1  owner: root      group: root
  length:         5  admin id:      0  inode:     1027.1
  access:      Aug 18  2005  modification: Aug 18 11:19
  changed:     Aug 18 11:19  retention-end: Aug 18 2005 11:25
  creation:    Aug 18 11:19  residence:    Aug 18 11:19
  retention:   active        retention-period: 0y, 0d, 0h, 6m

In this example, the retention state is active, as shown by the retention: active designation, meaning that the file has the WORM bit set. The retention period started on August 18, 2005, at 11:19 and will end on August 18, 2005, at 11:25. The retention period was specified to be 0 years, 0 days, 0 hours, and 6 minutes.

Using sfind to Find WORM-FS Files

Use the sfind utility to search for files that have certain retention periods. See the sfind(1) man page for more information about the available options, which include the following:

-rafter date - Finds files whose retention period ends after the specified date.
-rremain time - Finds files whose retention periods have more than the specified amount of time remaining.
-rlonger time - Finds files whose retention periods are longer than the specified amount of time.

For example, CODE EXAMPLE 7-8 shows the command to find files whose retention period expires after 12/24/2004 at 15:00.


CODE EXAMPLE 7-8 Using sfind to Find All WORM Files That Expire After a Certain Date
# sfind -rafter 200412241500

For example, CODE EXAMPLE 7-9 shows the command to find files for which more than 1 year, 10 days, 5 hours, and 10 minutes remain before expiration.


CODE EXAMPLE 7-9 Using sfind to Find All WORM Files With More Than a Specified Time Remaining
# sfind -rremain 1y10d5h10m

For example, CODE EXAMPLE 7-10 shows the command to find files that have retention periods longer than 10 days.


CODE EXAMPLE 7-10 Using sfind to Find All WORM Files With Longer Than a Specified Retention Period
# sfind -rlonger 10d


Accommodating Large Files

When manipulating very large files, pay careful attention to the size of disk cache that is available on the system. If you try to write a file that is larger than your disk cache, behavior differs depending on the type of file system that you are using:

If you are operating within a SAM-QFS environment and your application must write a file that is larger than the disk cache, you can segment the file with the segment(1) command. For more information about the segment(1) command, see the segment(1) man page or see the Sun StorEdge SAM-FS Storage and Archive Management Guide.
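
As a rough sketch, and assuming that the -l option of segment(1) specifies the segment length (the option name, its units, and the file path here are assumptions; confirm the exact syntax on the segment(1) man page for your release), a large file might be segmented into 1-gigabyte pieces as follows:

# segment -l 1g /sam/largefile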


Configuring a Multireader File System

The multireader file system consists of a single writer host and multiple reader hosts. The writer and reader mount options that enable the multireader file system are compatible with Sun StorEdge QFS file systems only. The mount options are described in this section and on the mount_samfs(1M) man page.

You can mount the multireader file system on the single writer host by specifying the -o writer option with the mount(1M) command. The host system with the writer mount option is the only host system that is allowed to write to the file system. The writer host system updates the file system. You must ensure that only one host in a multireader file system has the file system mounted with the writer mount option enabled. If -o writer is specified, directories are written through to disk at each change and files are written through to disk at close.



caution icon

Caution - The multireader file system can become corrupted if more than one writer host has the file system mounted at one time. It is the site administrator's responsibility to ensure that this situation does not occur.



You can mount a multireader file system on one or more reader hosts by specifying the -o reader option with the mount(1M) command. There is no limit to the number of host systems that can have the multireader file system mounted as a reader.
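
For example, for a hypothetical file system named qfs1 mounted at /qfs1, the mounts might look like the following, with one writer host and any number of reader hosts:

writer# mount -F samfs -o writer qfs1 /qfs1
reader# mount -F samfs -o reader qfs1 /qfs1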

A major difference between the multireader file system and Sun StorEdge QFS shared file system is that the multireader host reads metadata from the disk, and the client hosts of a Sun StorEdge QFS shared file system read metadata over the network. The Sun StorEdge QFS shared file system supports multireader hosts. In this configuration, multiple shared hosts can be adding content while multiple reader hosts are distributing content.



Note - You cannot specify the writer option on any host if you are mounting the file system as a Sun StorEdge QFS shared file system. You can, however, specify the reader option.

If you want a Sun StorEdge QFS shared file system client host to be a read-only host, mount the Sun StorEdge QFS shared file system on that host with the reader mount option. In addition, set the sync_meta mount option to 1 if you use the reader option in a Sun StorEdge QFS shared file system. For more information about the Sun StorEdge QFS shared file system, see Configuring a Sun StorEdge QFS Shared File System. For more information about mount options, see the mount_samfs(1M) man page.



You must ensure that all readers in a multireader file system have access to the device definitions that describe the ma device. Copy the lines from the mcf(4) file that resides on the primary metadata server host to the mcf(4) files on the alternate metadata servers. After copying the lines, you might need to update the information about the disk controllers because, depending on your configuration, disk partitions might not show up the same way across all hosts.

In a multireader file system environment, the Sun StorEdge QFS software ensures that all servers that access the same file system can always access the current environment. When the writer closes a file, the Sun StorEdge QFS file system immediately writes all information for that file to disk. A reader host can access a file after the file is closed by the writer. You can specify the refresh_at_eof mount option to help ensure that no host system in a multireader file system gets out of sync with the file system.

By default, the metadata information for a file on a reader host is invalidated and refreshed every time a file is accessed. If the data changed, it is invalidated. This includes any type of access, whether through cat(1), ls(1), touch(1), open(2), or other methods. This immediate refresh rate ensures that the data is correct at the time the refresh is done, but it can affect performance. Depending on your site preferences, you can use the mount(1M) command's -o invalid=n option to specify a refresh rate between 0 seconds and 60 seconds. If the refresh rate is set to a small value, the Sun StorEdge QFS file system reads the directory and other metadata information n seconds after the last refresh. More frequent refreshes result in more overhead for the system, but stale information can exist if n is nonzero.
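
For example, to let cached information on a reader host be up to 30 seconds old before it is refreshed (the file system and mount point names are illustrative):

reader# mount -F samfs -o reader,invalid=30 qfs1 /qfs1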



caution icon

Caution - If a file is open for a read on a reader host, there is no protection against that file being removed or truncated by the writer. You must use another mechanism, such as application locking, to protect the reader from inadvertent writer actions.




Using the SAN-QFS File System in a Heterogeneous Computing Environment

The SAN-QFS file system enables multiple hosts to access the data stored in a Sun StorEdge QFS system at full disk speeds. This capability can be especially useful for database, data streaming, web page services, or any application that demands high-performance, shared-disk access in a heterogeneous environment.

You can use the SAN-QFS file system in conjunction with fibre-attached devices in a storage area network (SAN). The SAN-QFS file system enables high-speed access to data through Sun StorEdge QFS software and software such as Tivoli SANergy file-sharing software. To use the SAN-QFS file system, you must have both the SANergy (2.2.4 or later) and the Sun StorEdge QFS software. For information about the levels of Sun StorEdge QFS and SANergy software that are supported, contact your Sun sales representative.



Note - In environments that include the Solaris Operating Systems (OS) and supported Linux OSs, use the Sun StorEdge QFS shared file system, not the SAN-QFS file system, on the Solaris hosts.

For information about the Sun StorEdge QFS shared file system, see Configuring a Sun StorEdge QFS Shared File System. For a comparison of the Sun StorEdge QFS shared file system and the SAN-QFS file system, see SAN-QFS Shared File System and Sun StorEdge QFS Shared File System Comparison.



FIGURE 7-1 depicts a SAN-QFS file system that uses both the Sun StorEdge QFS software and the SANergy software and shows that the clients and the metadata controller (MDC) system manage metadata across the local area network (LAN). The clients perform I/O directly to and from the storage devices.

Note that all clients running only the Solaris OS are hosting the Sun StorEdge QFS software, and that all heterogeneous clients running an OS other than Solaris are hosting the SANergy software and the NFS software. The SAN-QFS file system's metadata server hosts both the Sun StorEdge QFS and the SANergy software. This server acts not only as the metadata server for the file system but also as the SANergy MDC.



Note - The SANergy software is not supported on x64 hardware platforms.




FIGURE 7-1 SAN-QFS File System Using Sun StorEdge QFS Software and SANergy Software

The rest of this section describes other aspects of the SAN-QFS file system:

Before You Begin
Enabling the SAN-QFS File System
Unmounting the SAN-QFS File System
Troubleshooting: Unmounting a SAN-QFS File System With SANergy File Holds
Block Quotas in a SAN-QFS File System
File Data and File Attributes in a SAN-QFS File System
Using samgrowfs(1M) to Expand SAN-QFS File Systems
SAN-QFS Shared File System and Sun StorEdge QFS Shared File System Comparison

Before You Begin

Before you enable the SAN-QFS file system, keep the following configuration considerations in mind and plan accordingly:



Note - This documentation assumes that your non-Solaris clients are hosting SANergy software and NFS software for file system sharing. The text and examples in this document reflect this configuration. If your non-Solaris clients host the Samba software instead of the NFS software, see your Samba documentation.



Enabling the SAN-QFS File System

The following procedures describe how to enable the SAN-QFS file system. Perform these procedures in the order in which they are presented:

To Enable the SAN-QFS File System on the Metadata Controller
To Enable the SAN-QFS File System on the Clients
To Install the SANergy Software on the Clients


procedure icon  To Enable the SAN-QFS File System on the Metadata Controller

When you use the SAN-QFS file system, one host system in your environment acts as the SANergy metadata controller (MDC). This is the host system upon which the Sun StorEdge QFS file system resides.

1. Log in to the host upon which the Sun StorEdge QFS file system resides and become superuser.

2. Verify that the Sun StorEdge QFS file system is tested and fully operational.

3. Install and configure the SANergy software.

For instructions, see your SANergy documentation.

4. Use the pkginfo(1) command to verify the SANergy software release level:


# pkginfo -l SANergy

5. Ensure that the file system is mounted.

Use the mount(1M) command either to verify the mount or to mount the file system.

6. Use the share(1M) command in the following format to enable NFS access to client hosts:


MDC# share -F nfs -d qfs-file-system-name /mount-point

For qfs-file-system-name, specify the name of your Sun StorEdge QFS file system, such as, qfs1. For more information about the share(1M) command, see the share(1M) or share_nfs(1M) man page.

For mount-point, specify the mount point of qfs-file-system-name.
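
For example, for a file system named qfs1 mounted at /qfs1 (names illustrative):

MDC# share -F nfs -d qfs1 /qfs1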

7. If you are connecting to Microsoft Windows clients, configure Samba, rather than NFS, to provide security and namespace features.

To do this, add the SANERGY_SMBPATH environment variable in the /etc/init.d/sanergy file and point it to the location of the Samba configuration file. For example, if your Samba configuration file is named /etc/sfw/smb.conf, you must add the following lines to the beginning of your /etc/init.d/sanergy file:

SANERGY_SMBPATH=/etc/sfw/smb.conf
export SANERGY_SMBPATH

8. (Optional) Edit the file system table (/etc/dfs/dfstab) on the MDC to enable access at boot time.

Perform this step if you want to automatically enable this access at boot time.
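
The entry in /etc/dfs/dfstab takes the same form as the share(1M) command line, for example (names illustrative):

share -F nfs -d qfs1 /qfs1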


procedure icon  To Enable the SAN-QFS File System on the Clients

After you have enabled the file system on the MDC, you are ready to enable it on the client hosts. The SAN-QFS file system supports several client hosts including IRIX, Microsoft Windows, AIX, and Linux hosts. For information about the specific clients supported, see your Sun sales representative.

Every client has different operational characteristics. This procedure uses general terms to describe the actions you must take to enable the SAN-QFS file system on the clients. For information specific to your clients, see the documentation provided with your client hosts.

1. Log in to each of the client hosts.

2. Edit the file system defaults table on each client and add the file system.

For example, on a Solaris OS, edit the /etc/vfstab file on each client and add the name of your Sun StorEdge QFS file system, as follows:


server:/qfs1  -  /qfs1  nfs  -  yes  noac,hard,intr,timeo=1000

On other operating system platforms, the file system defaults table might reside in a file other than /etc/vfstab. For example, on Linux systems, this file is /etc/fstab.

For more information about editing the /etc/vfstab file, see Sun StorEdge QFS Installation and Upgrade Guide. For information about required or suggested NFS client mount options, see your SANergy documentation.


procedure icon  To Install the SANergy Software on the Clients

After enabling the file system on the client hosts, you are ready to install the SANergy software on the clients. The following procedure describes the SANergy installation process in general terms.

1. Install and configure the SANergy software.

For instructions, see your SANergy documentation.

2. Use the mount command to NFS mount the file system.

For example:


# mount host:/mount-point/ local-mount-point

For host, specify the MDC.

For mount-point, specify the mount point of the Sun StorEdge QFS file system on the MDC.

For local-mount-point, specify the mount point on the SANergy client.
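
For example, if the MDC is a host named mdc1 and the Sun StorEdge QFS file system is mounted at /qfs1 on both the MDC and the client (names illustrative):

# mount mdc1:/qfs1 /qfs1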

3. Use the SANergy fuse command to fuse the software:


# fuse mount-point

For mount-point, specify the mount point on the SANergy client.

Unmounting the SAN-QFS File System

The following procedures describe how to unmount a SAN-QFS file system that is using the SANergy software. Perform these procedures in the order in which they are presented:


procedure icon  To Unmount the SAN-QFS File System on the SANergy Clients

Follow these steps for each client host on which you want to unmount the SAN-QFS file system.

1. Log in to the client host and become superuser.

2. Use the SANergy unfuse command to unfuse the file system from the software:


# unfuse mount-point

For mount-point, specify the mount point on the SANergy client.

3. Use the umount(1M) command to unmount the file system from NFS:


# umount host:/mount-point/ local-mount-point

For host, specify the MDC.

For mount-point, specify the mount point of the Sun StorEdge QFS file system on the MDC.

For local-mount-point, specify the mount point on the SANergy client.


procedure icon  To Unmount the SAN-QFS File System on the Metadata Controller

1. Log in to the MDC system and become superuser.

2. Use the unshare(1M) command to disable NFS access to client hosts:


MDC# unshare qfs-file-system-name /mount-point

For qfs-file-system-name, specify the name of your Sun StorEdge QFS file system, such as qfs1. For more information about the unshare(1M) command, see the unshare(1M) man page.

For mount-point, specify the mount point of qfs-file-system-name.


procedure icon  To Unmount the SAN-QFS File System on the Sun StorEdge QFS Clients

Follow these steps on each participating client host.

1. Log in to a Sun StorEdge QFS client host and become superuser.

2. Use the umount(1M) command to unmount the file system.

For example:


# umount /qfs1


procedure icon  To Unmount the SAN-QFS File System on the Sun StorEdge QFS Server

1. Log in to the host system upon which the Sun StorEdge QFS file system resides and become superuser.

2. Use the umount(1M) command to unmount the file system.

Troubleshooting: Unmounting a SAN-QFS File System With SANergy File Holds

SANergy software issues holds on Sun StorEdge QFS files to reserve them temporarily for accelerated access. If SANergy crashes when holds are in effect, you will not be able to unmount the file system. If you are unable to unmount a SAN-QFS file system, examine the /var/adm/messages file and look for console messages that describe outstanding SANergy holds.

Whenever possible, allow the SANergy file-sharing function to clean up its holds, but in an emergency, or in case of a SANergy file-sharing system failure, use the following procedure to avoid a reboot.


procedure icon  To Unmount a File System in the Presence of SANergy File Holds

1. Use the unshare(1M) command to disable NFS access.

2. Use the samunhold(1M) command to release the SANergy file system holds.

For more information about this command, see the samunhold(1M) man page.

3. Use the umount(1M) command to unmount the file system.

Block Quotas in a SAN-QFS File System

The SANergy software does not enforce block quotas. Therefore, it is possible for you to exceed a block quota when writing a file with the SANergy software. For more information on quotas, see Enabling Quotas.

File Data and File Attributes in a SAN-QFS File System

The SANergy software uses the NFS software for metadata operations, which means that the NFS close-to-open consistency model is used for file data and attributes. File data and attributes among SANergy clients do not support the POSIX coherency model for open files.

Using samgrowfs(1M) to Expand SAN-QFS File Systems

You can use the samgrowfs(1M) command to increase the size of a SAN-QFS file system. To perform this task, follow the procedures described in Adding Disk Cache to a File System.
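
In outline (the family set name qfs1 is illustrative; see Adding Disk Cache to a File System for the full procedure), after the new devices have been appended to the mcf(4) file, the file system is grown with:

# samgrowfs qfs1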



caution icon

Caution - When using this procedure, be aware that the line-by-line device order in the mcf(4) file must match the order of the devices listed in the file system's superblock.



When the samgrowfs(1M) command is issued, the devices that were already in the mcf(4) file keep their positions in the superblock. New devices are written to subsequent entries in the order in which they are encountered.

If this new order does not match the order in the superblock, the SAN-QFS file system cannot be fused.

SAN-QFS Shared File System and Sun StorEdge QFS Shared File System Comparison

The SAN-QFS shared file system and the Sun StorEdge QFS shared file system have the following similarities:

TABLE 7-3 describes differences between the file systems.


TABLE 7-3   SAN-QFS Shared File System Versus Sun StorEdge QFS Shared File System

SAN-QFS File System                          Sun StorEdge QFS Shared File System
Uses NFS protocol for metadata.              Uses natural metadata.
Preferred in heterogeneous computing         Preferred in homogeneous Solaris OS
environments (that is, when not all          environments.
hosts are Sun systems).
Useful in environments where multiple,       Preferred when multiple hosts must
heterogeneous hosts must be able to          write to the same file at the same
write data.                                  time.



Understanding I/O Types

The Sun StorEdge QFS file systems support paged I/O, direct I/O, and switching between the I/O types. The following sections describe these I/O types.

Paged I/O

When paged I/O is used, user data is cached in virtual memory pages, and the kernel writes the data to disk. The standard Solaris OS interfaces manage paged I/O. Paged I/O (also called buffered or cached I/O) is selected by default.

Direct I/O

Direct I/O is a process by which data is transferred directly between the user's buffer and the disk. This means that much less time is spent in the system. For performance purposes, specify direct I/O only for large, block-aligned, sequential I/O.

The setfa(1) command and the sam_setfa(3) library routine both have a -D option that sets the direct I/O attribute for a file or directory. If applied to a directory, files and directories created in that directory inherit the direct I/O attribute. After the -D option is set, the file uses direct I/O.
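
For example, to set the direct I/O attribute on a hypothetical directory so that files subsequently created in it inherit the attribute:

# setfa -D /qfs/db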

You can also select direct I/O for a file by using the Solaris OS directio(3C) function call. If you use the function call to enable direct I/O, the setting lasts only while the file is active.

To enable direct I/O on a file-system basis, do one of the following:

Specify the -o forcedirectio option with the mount(1M) command.
Use the forcedirectio keyword as a directive line in the samfs.cmd file or in the mount-options field of the /etc/vfstab file.
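
For example (file system and mount point names illustrative):

# mount -F samfs -o forcedirectio qfs1 /qfs1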

For more information, see the setfa(1), sam_setfa(3), directio(3C), samfs.cmd(4), and mount_samfs(1M) man pages.

I/O Switching

By default, paged I/O is performed, and I/O switching is disabled. However, the Sun StorEdge QFS file systems support automatic I/O switching, a process by which a site-defined amount of paged I/O occurs before the system switches automatically to direct I/O.

I/O switching should reduce page cache usage on large I/O operations. To enable I/O switching, use samu(1M), or use the dio_wr_consec and dio_rd_consec parameters as directives in the samfs.cmd file or as options with the mount(1M) command.
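
For example, a samfs.cmd fragment along these lines (the file system name and threshold values are illustrative) directs the file system to switch to direct I/O after three consecutive large, well-aligned writes or reads; see the mount_samfs(1M) man page for the precise semantics:

fs = qfs1
   dio_wr_consec = 3
   dio_rd_consec = 3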

For more information about these options, see the mount_samfs(1M) or samfs.cmd(4) man pages.


Increasing File Transfer Performance for Large Files

Sun StorEdge QFS file systems are tuned to work with a mix of file sizes. You can increase the performance of disk file transfers for large files by enabling file system settings.



Note - Sun recommends that you experiment with performance tuning outside of a production environment. Tuning these variables incorrectly can have unexpected effects on the overall system.

If your site has a Sun Enterprise Services (SES) support contract, please inform SES if you change performance tuning parameters.




procedure icon  To Increase File Transfer Performance

1. Set the maximum device read/write directive.

The maxphys parameter in the Solaris /etc/system file controls the maximum number of bytes that a device driver reads or writes at any one time. The default value for the maxphys parameter can differ, depending on the level of your Sun Solaris OS, but it is typically around 128 kilobytes.

Add the following line to /etc/system to set maxphys to 8 megabytes:


set maxphys = 0x800000

2. Set the SCSI disk maximum transfer parameter.

The sd driver enables large transfers for a specific file by looking for the sd_max_xfer_size definition in the /kernel/drv/sd.conf file. If this definition does not exist, the driver uses the value defined in the sd device driver definition, sd_max_xfer_size, which is 1024 x 1024 bytes.

To enable and encourage large transfers, add the following line at the end of the /kernel/drv/sd.conf file:


sd_max_xfer_size=0x800000;

3. Set the fibre disk maximum transfer parameter.

The ssd driver enables large transfers for a specific file by looking for the ssd_max_xfer_size definition in the /kernel/drv/ssd.conf file. If this definition does not exist, the driver uses the value defined in the ssd device driver definition, ssd_max_xfer_size, which is 1024 x 1024 bytes.

Add the following line at the end of the /kernel/drv/ssd.conf file:


ssd_max_xfer_size=0x800000;

4. Reboot the system.

5. Set the writebehind parameter.

This step affects paged I/O only.

The writebehind parameter specifies the number of bytes that are written behind by the file system when paged I/O is being performed on a Sun StorEdge QFS file system. Matching the writebehind value to a multiple of the RAID's read-modify-write value can increase performance.

This parameter is specified in units of kilobytes and is truncated to an 8-kilobyte multiple. If set, this parameter is ignored when direct I/O is performed. The default writebehind value is 512 kilobytes. This value favors large-block, sequential I/O.

Set the writebehind size to a multiple of the RAID 5 stripe size for both hardware and software RAID-5. The RAID-5 stripe size is the number of data disks multiplied by the configured stripe width.

For example, assume that you configure a RAID-5 device with three data disks plus one parity disk (3+1) with a stripe width of 16 kilobytes. The writebehind value should be 48 kilobytes, 96 kilobytes, or some other multiple, to avoid the overhead of the read-modify-write RAID-5 parity generation.

For Sun StorEdge QFS file systems, the DAU (sammkfs(1M) -a command) should also be a multiple of the RAID-5 stripe size. This allocation ensures that the blocks are contiguous.

You should test the system performance after resetting the writebehind size. The following example shows testing timings of disk writes:


# timex dd if=/dev/zero of=/sam/myfile bs=256k count=2048

You can set the writebehind parameter from a mount option, from within the samfs.cmd file, from within the /etc/vfstab file, or from a command within the samu(1M) utility. For information about enabling this from a mount option, see the -o writebehind=n option on the mount_samfs(1M) man page. For information about enabling this from the samfs.cmd file, see the samfs.cmd(4) man page. For information about enabling this from within samu(1M), see the samu(1M) man page.
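
For example, for the 3+1 RAID-5 configuration described above, you might set writebehind to 96 kilobytes at mount time (the file system name is illustrative):

# mount -F samfs -o writebehind=96 qfs1 /qfs1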

6. Set the readahead parameter.

This step affects paged I/O only.

The readahead parameter specifies the number of bytes that are read ahead by the file system when paged I/O is being performed on a Sun StorEdge QFS file system. This parameter is specified in units of kilobytes and is truncated to an 8-kilobyte multiple. If set, this parameter is ignored when direct I/O is performed.

Increasing the size of the readahead parameter increases the performance of large file transfers, but only to a point. You should test the performance of the system after resetting the readahead size until you see no more improvement in transfer rates. The following is an example method of testing timings on disk reads:


# timex dd if=/sam/myfile of=/dev/null bs=256k

You should test various readahead sizes for your environment. The readahead parameter should be set to a size that increases the I/O performance for paged I/O, but is not so large as to hurt performance. It is also important to consider the amount of memory and number of concurrent streams when you set the readahead value. Setting the readahead value multiplied by the number of streams to a value that is greater than memory can cause page thrashing.

The default readahead value is 1024 kilobytes. This value favors large-block, sequential I/O. For short-block, random I/O applications, set readahead to the typical request size. Database applications do their own read-ahead, so for these applications, set readahead to 0.

The readahead setting can be enabled from a mount option, from within the samfs.cmd file, from within the /etc/vfstab file, or from a command within the samu(1M) utility. For information about enabling this setting from a mount option, see the -o readahead=n option on the mount_samfs(1M) man page. For information about enabling this setting from the samfs.cmd file, see the samfs.cmd(4) man page. For information about enabling this setting from within samu(1M), see the samu(1M) man page.
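
For example, a samfs.cmd entry such as the following (the file system name and value are illustrative) sets readahead to 2048 kilobytes for one file system:

fs = qfs1
   readahead = 2048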

7. Set the stripe width.

The -o stripe=n option with the mount(1M) command specifies the stripe width for the file system. The stripe width is based on the disk allocation unit (DAU) size. The n argument specifies that n x DAU bytes are written to one device before writing switches to the next device. The DAU size is set when the file system is initialized by the sammkfs(1M) -a command.

If -o stripe=0 is set, files are allocated to file system devices using the round-robin allocation method. With this method, each file is completely allocated on one device until that device is full. Round-robin is the preferred setting for a multistream environment. If -o stripe=n is set to an integer greater than 0, files are allocated to file system devices using the stripe method. To determine the appropriate -o stripe=n setting, try varying the setting and taking performance readings. Striping is the preferred setting for turnkey applications with a required bandwidth.

You can also set the stripe width from the /etc/vfstab file or from the samfs.cmd file.
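
For example, to write two DAUs to each device before switching to the next (names illustrative):

# mount -F samfs -o stripe=2 qfs1 /qfs1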

For more information about the mount(1M) command, see the mount_samfs(1M) man page. For more information about the samfs.cmd file, see the samfs.cmd(4) man page.


Enabling Qwrite Capability

By default, the Sun StorEdge QFS file systems disable simultaneous reads and writes to the same file. This is the mode defined by the UNIX vnode interface standard, which gives exclusive access to only one write while other writers and readers must wait. Qwrite enables simultaneous reads and writes to the same file from different threads.

The Qwrite feature can be used in database applications to enable multiple simultaneous transactions to the same file. Database applications typically manage large files and issue simultaneous reads and writes to the same file. Unfortunately, each system call to a file acquires and releases a read/write lock inside the kernel. This lock prevents overlapped (or simultaneous) operations to the same file. If the application itself implements file locking mechanisms, the kernel-locking mechanism impedes performance by unnecessarily serializing I/O.

Qwrite can be enabled in the /etc/vfstab file, in the samfs.cmd file, and as a mount option. The -o qwrite option with the mount(1M) command bypasses the file system locking mechanisms (except for applications accessing the file system through NFS) and lets the application control data access. If qwrite is specified, the file system enables simultaneous reads and writes to the same file from different threads. This option improves I/O performance by queuing multiple requests at the drive level.

The following example uses the mount(1M) command to enable Qwrite on a database file system:


# mount -F samfs -o qwrite /db

For more information about this feature, see the qwrite directive on the samfs.cmd(4) man page or the -o qwrite option on the mount_samfs(1M) man page.


Setting the Write Throttle

The -o wr_throttle=n option limits the number of outstanding write kilobytes for one file to n. By default, Sun StorEdge QFS file systems set the wr_throttle to 16 megabytes.

If a file has n write kilobytes outstanding, the system suspends an application that attempts to write to that file until enough bytes have completed the I/O to allow the application to be resumed.

If your site has thousands of streams, such as thousands of NFS-shared workstations accessing the file system, you can tune the -o wr_throttle=n option in order to avoid flushing excessive amounts of memory to disk at once. Generally, the number of streams multiplied by 1024 x the n argument to the -o wr_throttle=n option should be less than the total size of the host system's memory minus the memory needs of the Solaris OS, as shown in this formula:


number-of-streams x n x 1024 < total-memory - Solaris-OS-memory-needs
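
As a purely illustrative calculation, with 1,000 NFS client streams and the default n of 16,384 kilobytes, the left side of the formula is 1000 x 16,384 x 1024 bytes, or roughly 16 gigabytes. If that figure exceeds the memory remaining after the Solaris OS needs are met, you would lower n accordingly.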

For turnkey applications, you might want to use a size larger than the default 16,384 kilobytes, because this keeps more pages in memory.


Setting the Flush-Behind Rate

Two mount parameters control the flush-behind rate for pages written sequentially and for stage pages. The flush_behind and stage_flush_behind mount parameters are read from the samfs.cmd file, the /etc/vfstab file, or the mount(1M) command.

The flush_behind=n mount parameter sets the maximum flush-behind value. Modified pages that are being written sequentially are written to disk asynchronously to help the Solaris virtual memory (VM) layer keep pages clean. To enable this feature, set n to be an integer from 16 through 8192. By default, n is set to 0, which disables this feature. The n argument is specified in kilobyte units.

The stage_flush_behind=n mount parameter sets the maximum stage flush-behind value. Stage pages that are being staged are written to disk asynchronously to help the Solaris virtual memory (VM) layer keep pages clean. To enable this feature, set n to be an integer from 16 through 8192. By default, n is set to 0, which disables this feature. The n argument is specified in kilobyte units.
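
For example, the following samfs.cmd lines (the values are illustrative) enable both features with a 1024-kilobyte limit:

flush_behind = 1024
stage_flush_behind = 1024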

For more information about these mount parameters, see the mount_samfs(1M) man page or the samfs.cmd(4) man page.


Tuning the Number of Inodes and the Inode Hash Table

The Sun StorEdge QFS file system enables you to set the following two tunable parameters in the /etc/system file:

To enable nondefault settings for these parameters, edit the /etc/system file, and then reboot your system.

The following subsections describe these parameters in more detail.

The ninodes Parameter

The ninodes parameter specifies the maximum number of default inodes. The value for ninodes determines the number of in-core inodes that Sun StorEdge QFS software keeps allocated to itself, even when applications are not using many inodes.

The format for this parameter in the /etc/system file is as follows:


set samfs:ninodes = value

The range for value is from 16 through 2000000. The default value for ninodes is one of the following:

The nhino Parameter

The nhino parameter specifies the size of the in-core inode hash table.

The format for this parameter in the /etc/system file is as follows:


set samfs:nhino = value

The range for value is 1 through 1048576. The value must be a nonzero power of 2. If nhino is not set, the default is the ninodes value divided by 8 and then rounded up to the nearest power of 2. For example, if ninodes is set to 8000 and nhino is not set, the system uses 1024, which is 8000 divided by 8 (1000) and then rounded up to the nearest power of 2 (2^10).

When to Set the ninodes and nhino Parameters

When searching for an inode by number (after obtaining an inode number from a directory or after extracting an inode number from an NFS file handle), a Sun StorEdge QFS file system searches its cache of in-core inodes. To speed this process, the file system maintains a hash table to decrease the number of inodes it must check.

A larger hash table reduces the number of comparisons and searches, at a modest cost in memory usage. If the nhino value is too large, the system is slower when undertaking operations that sweep through the entire inode list (inode syncs and unmounts). For sites that manipulate large numbers of files and sites that do extensive amounts of NFS I/O, it can be advantageous to set these parameter values to larger than the defaults.
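
For example, an /etc/system fragment for a hypothetical NFS-heavy site might raise both values as follows (the numbers are illustrative; remember that nhino must be a power of 2 and that a reboot is required for the change to take effect):

set samfs:ninodes = 100000
set samfs:nhino = 16384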

If your site has file systems that contain only a small number of files, it might be advantageous to make these numbers smaller than the defaults. This could be the case, for example, if you have a file system into which you write large single-file tar(1) files to back up other file systems.