CHAPTER 8

Advanced Topics

This chapter discusses advanced topics that are beyond the scope of basic system administration and usage. These topics are as follows:

- Daemons, Processes, and Tracing
- Using the setfa(1) Command to Set File Attributes
- Accommodating Large Files
- Multireader File System
- Using the SAN-QFS File System
- I/O Performance
- Increasing Large File Transfer Performance
- Qwrite
- Setting the Write Throttle
- Setting the Flush-Behind Rate
- Tuning the Number of Inodes and the Inode Hash Table


Daemons, Processes, and Tracing

It is useful to have an understanding of system daemons and processes when you are debugging. This section describes the Sun StorEdge QFS and Sun StorEdge SAM-FS daemons and processes. It also provides information on daemon tracing.

Daemons and Processes

All Sun StorEdge QFS and Sun StorEdge SAM-FS daemons are named in the form sam-daemon_named: the prefix sam-, followed by the daemon name, followed by the lowercase letter d. This convention makes the daemons easy to identify. Processes are named in a similar manner; the difference is that their names do not end in the lowercase letter d. TABLE 8-1 shows some of the daemons and processes that can be running on your system (others, such as sam-genericd and sam-catserverd, might also be running, depending on system activities).

TABLE 8-1 Daemons and Processes

sam-archiverd
  Automatically archives Sun StorEdge SAM-FS files. This process runs as long as the Sun StorEdge SAM-FS file system is mounted.

sam-fsd
  Master daemon.

sam-rftd
  Transfers data between multiple Sun StorEdge SAM-FS host systems.

sam-robotsd
  Starts and monitors automated library media changer control daemons.

sam-scannerd
  Monitors all manually mounted removable media devices. The scanner periodically checks each device for inserted archive media cartridges.

sam-sharefsd
  Invokes the Sun StorEdge QFS shared file system daemon.

sam-releaser
  Attempts to release disk space occupied by previously archived files on Sun StorEdge SAM-FS file systems until a low water mark is reached. The releaser is started automatically when a high water mark is reached on disk cache and stops when it has finished releasing files. This is a process, not a daemon.

sam-stagealld
  Controls the associative staging of Sun StorEdge SAM-FS files.

sam-stagerd
  Controls the staging of Sun StorEdge SAM-FS files.

sam-rpcd
  Controls the remote procedure call (RPC) application programming interface (API) server process.


When running Sun StorEdge QFS or Sun StorEdge SAM-FS software, init starts the sam-fsd daemon as part of /etc/inittab processing. The daemon is started at init levels 0, 2, 3, 4, 5, and 6, and it should restart automatically if it is killed or fails.

When running Sun StorEdge SAM-FS software, the sam-fsd daemon creates the following processes:

Trace Files

Several Sun StorEdge QFS and Sun StorEdge SAM-FS processes can write messages to trace files. These messages contain information about the state and progress of the work performed by the daemons. The messages are primarily used by Sun Microsystems staff members to improve performance and diagnose problems. The message content and format are subject to change from release to release.

Trace files can be used in debugging. Typically, trace files are not written. You can enable trace files for Sun StorEdge SAM-FS software by editing the defaults.conf file. You can enable tracing for all processes, or you can enable tracing for individual processes. For information about the processes that you can trace, see the defaults.conf(4) man page.

By default, the trace files are written to the /var/opt/SUNWsamfs/trace directory. In that directory, the trace files are named for the processes (archiver, catserver, fsd, ftpd, recycler, sharefsd, and stager). You can change the names of the trace files by specifying directives in the defaults.conf configuration file. You can also set a limit on the size of a trace file and rotate your tracing logs. For information about controlling tracing, see the defaults.conf(4) man page.

Trace File Content

Trace file messages contain the time and source of the message. The messages are produced by events in the processes. You can select the events by using directives in the defaults.conf file.

The default events are as follows:

You can also trace the following events:

The default message elements (program name, process ID (PID), and time) are always included and cannot be excluded. Optionally, the messages can also contain the following elements:

Trace File Rotation

To prevent the trace files from growing indefinitely, the sam-fsd daemon monitors the size of the trace files and periodically executes the following command:

/opt/SUNWsamfs/sbin/trace_rotate

This script moves the trace files to sequentially numbered copies. You can modify this script to suit your operation. Alternatively, you can provide this function using cron(1) or some other facility.
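For example, if you prefer to drive rotation from cron(1) rather than relying on sam-fsd, a crontab entry such as the following sketch runs the script hourly. This is an illustration only; the installed trace_rotate script might require arguments (such as the trace directory), so check the script before scheduling it:

0 * * * * /opt/SUNWsamfs/sbin/trace_rotate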

Determining Which Processes Are Being Traced

To determine which processes are being traced currently, enter the sam-fsd(1M) command at the command line. CODE EXAMPLE 8-1 shows the output from this command.

CODE EXAMPLE 8-1 sam-fsd(1M) Command Output
# sam-fsd
Trace file controls:
sam-amld      /var/opt/SUNWsamfs/trace/sam-amld
              cust err fatal misc proc date
              size    0    age 0
sam-archiverd /var/opt/SUNWsamfs/trace/sam-archiverd
              cust err fatal misc proc date
              size    0    age 0
sam-catserverd /var/opt/SUNWsamfs/trace/sam-catserverd
              cust err fatal misc proc date
              size    0    age 0
sam-fsd       /var/opt/SUNWsamfs/trace/sam-fsd
              cust err fatal misc proc date
              size    0    age 0
sam-rftd      /var/opt/SUNWsamfs/trace/sam-rftd
              cust err fatal misc proc date
              size    0    age 0
sam-recycler  /var/opt/SUNWsamfs/trace/sam-recycler
              cust err fatal misc proc date
              size    0    age 0
sam-sharefsd  /var/opt/SUNWsamfs/trace/sam-sharefsd
              cust err fatal misc proc date
              size    0    age 0
sam-stagerd   /var/opt/SUNWsamfs/trace/sam-stagerd
              cust err fatal misc proc date
              size    0    age 0
sam-serverd   /var/opt/SUNWsamfs/trace/sam-serverd
              cust err fatal misc proc date
              size    0    age 0
sam-clientd   /var/opt/SUNWsamfs/trace/sam-clientd
              cust err fatal misc proc date
              size    0    age 0
sam-mgmt      /var/opt/SUNWsamfs/trace/sam-mgmt
              cust err fatal misc proc date
              size    0    age 0
License: License never expires.

For more information about enabling trace files, see the defaults.conf(4) man page and the sam-fsd(1M) man page.


Using the setfa(1) Command to Set File Attributes

The Sun StorEdge QFS and Sun StorEdge SAM-FS file systems allow end users to set performance attributes for files and directories. Applications can enable these performance features on a per-file or per-directory basis. The following sections describe how the application programmer can use these features to select file attributes for files and directories, to preallocate file space, to specify the allocation method for the file, and to specify the disk stripe width.

For more information about implementing the features described in the following subsections, see the setfa(1) man page.

Selecting File Attributes for Files and Directories

File attributes are set with the setfa(1) command, which sets attributes on a new or existing file. The file is created if it does not already exist.

You can set attributes on a directory as well as a file. When using setfa(1) with a directory, files and directories created within that directory inherit the attributes set in the original directory. To reset attributes on a file or directory to the default, use the -d (default) option. When the -d option is used, attributes are first reset to the default and then other attributes are processed.
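For example, the following sketch (using hypothetical paths) sets a stripe attribute on a directory so that files created within it inherit the attribute, and then resets one file in that directory back to the defaults with the -d option:

# setfa -s 2 /qfs/data
# setfa -d /qfs/data/file1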

Preallocating File Space

An end user can preallocate space for a file. This space is associated with a file so that no other files in the file system can use the disk addresses allocated to this file. Preallocation ensures that space is available for a given file, which avoids a file system full condition. Preallocation is assigned at the time of the request rather than when the data is actually written to disk.

Note that space can be wasted when preallocating files. If the file size is less than the allocation amount, the kernel allocates space to the file from the current file size up to the allocation amount. When the file is closed, space below the allocation amount is not freed.

You can preallocate space for a file by using the setfa(1) command with either the -L or the -l (lowercase letter L) options. Both options accept a file length as their argument. You can use the -L option for an existing file, and that file either can be empty or it can contain data. Use the -l option for a file that has no data yet. If you use the -l option, the file cannot grow beyond its preallocated limit.

For example, to preallocate a 1-gigabyte file named /qfs/file_alloc, type the following:

# setfa -l 1g /qfs/file_alloc

After space for a file has been preallocated, truncating a file to 0 length or removing the file returns all space allocated for a file. There is no way to return only part of a file's preallocated space to the file system. In addition, if a file is preallocated in this manner, there is no way to extend the file beyond its preallocated size in future operations.
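For example, to preallocate 1 gigabyte for an existing file that can continue to grow, a sketch using the -L option and a hypothetical path might look like the following:

# setfa -L 1g /qfs/file_alloc2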

Selecting a File Allocation Method and Stripe Width

By default, a file is created using the allocation method and stripe width specified at mount time (see the mount_samfs(1M) man page). However, an end user might want to use a different allocation scheme for a file or a directory of files. You can accomplish this by using the setfa(1) command with the -s (stripe) option.

The allocation method can be either round-robin or striped. The -s option determines the allocation method and the stripe width, and TABLE 8-2 shows the effect of this option.

TABLE 8-2 File Allocations and Stripe Widths

-s stripe  Allocation Method  Stripe Width  Explanation
0          Round-robin        n/a           The file is allocated on one device until that device has no space.
1-255      Striped            1-255 DAUs    The file stripes across all disk devices with this number of DAUs per disk.


The following example shows how to create a file explicitly by specifying a round-robin allocation method:

# setfa -s 0 /qfs/100MB.rrobin

The following example shows how to create a file explicitly by specifying a striped allocation method with a stripe width of 64 DAUs (preallocation is not used):

# setfa -s 64 /qfs/file.stripe

Selecting a Striped Group Device

Striped group devices are supported for Sun StorEdge QFS file systems only.

A user can specify that a file begin allocation on a particular striped group. If the file allocation method is round-robin, the file is allocated on the designated striped group.

CODE EXAMPLE 8-2 shows setfa(1) commands that specify that file1 and file2 be independently spread across two different striped groups.

CODE EXAMPLE 8-2 setfa(1) Commands to Spread Files Across Striped Groups
# setfa -g0 -s0 file1
# setfa -g1 -s0 file2

This capability is particularly important for applications that must achieve levels of performance that approach raw device speeds. For more information, see the setfa(1) man page.


Accommodating Large Files

When manipulating very large files, pay careful attention to the size of disk cache available on the system. If you try to write a file that is larger than your disk cache, behavior differs depending on the type of file system you are using, as follows:

If you are operating within a Sun StorEdge SAM-FS environment and if your application requires writing a file that is larger than the disk cache, you can segment the file using the segment(1) command. For more information about the segment(1) command, see the segment(1) man page or see the Sun StorEdge SAM-FS Storage and Archive Management Guide.


Multireader File System

The multireader file system consists of a single writer host and multiple reader hosts. The writer and reader mount options that enable the multireader file system are compatible with Sun StorEdge QFS file systems only. The mount options are described in this section and on the mount_samfs(1M) man page.

You can mount the multireader file system on the single writer host by specifying the -o writer option on the mount(1M) command. The host system with the writer mount option is the only host system that is allowed to write to, and thereby update, the file system. You must ensure that only one host in a multireader file system has the file system mounted with the writer mount option enabled. If -o writer is specified, directories are written through to disk at each change and files are written through to disk at close.
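For example, assuming a hypothetical multireader file system that is already defined in /etc/vfstab with the mount point /multifs, the following sketch mounts it on the writer host:

# mount -F samfs -o writer /multifs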




Caution - The multireader file system can become corrupted if more than one writer host has the file system mounted at one time. It is the site's responsibility to ensure that this situation does not occur.



You can mount a multireader file system on one or more reader hosts by specifying the -o reader option on the mount(1M) command. There is no limit to the number of host systems that can have the multireader file system mounted as a reader.
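On each reader host, the corresponding sketch (same hypothetical mount point as above) is:

# mount -F samfs -o reader /multifs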

A major difference between the multireader file system and the Sun StorEdge QFS shared file system is that multireader hosts read metadata from the disk, whereas the client hosts of a Sun StorEdge QFS shared file system read metadata over the network. The Sun StorEdge QFS shared file system supports multireader hosts. In this configuration, multiple shared hosts can add content while multiple reader hosts distribute content.



Note - You cannot specify the writer option on any host if you are mounting the file system as a Sun StorEdge QFS shared file system. You can, however, specify the reader option.

If you want a Sun StorEdge QFS shared file system client host to be a read-only host, mount the Sun StorEdge QFS shared file system on that host with both the shared and reader mount options. In addition, set the sync_meta mount option to 1 if you use the reader option in a Sun StorEdge QFS shared file system. For more information on the Sun StorEdge QFS shared file system, see Sun StorEdge QFS Shared File System. For more information on mount options, see the mount_samfs(1M) man page.
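For example, the following sketch (with a hypothetical mount point) mounts a shared client host as a read-only host with these options combined:

# mount -F samfs -o shared,reader,sync_meta=1 /sharefs1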



You must ensure that all readers in a multireader file system have access to the device definitions that describe the ma device. Copy the lines from the mcf file that resides on the primary metadata server host to the mcf files on the alternate metadata servers. After copying the lines, you might need to update the information on the disk controllers because, depending on your configuration, disk partitions might not show up the same way across all hosts.

In a multireader file system environment, the Sun StorEdge QFS software ensures that all servers that access the same file system can always access the current environment. When the writer closes a file, the Sun StorEdge QFS file system writes all information for that file to disk immediately. A reader host can access a file after the file is closed by the writer. You can specify the refresh_at_eof mount option to help ensure that no host system in a multireader file system risks getting out of sync with the file system.

By default, the metadata information for a file on a reader host is invalidated and refreshed every time the file is accessed, whether through cat(1), ls(1), touch(1), open(2), or other methods; if the data has changed, the cached information is invalidated. This immediate refresh ensures that the data is correct at the time of the refresh, but it can affect performance. Depending on your site preferences, you can use the mount(1M) command's -o invalid=n option to specify a refresh rate between 0 seconds and 60 seconds. If the refresh rate is set to a small value, the Sun StorEdge QFS file system reads the directory and other metadata information n seconds after the last refresh. Less frequent refreshes impose less overhead on the system, but stale information can exist if n is nonzero.
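For example, the following sketch (hypothetical mount point) mounts a reader host that tolerates metadata up to 30 seconds old:

# mount -F samfs -o reader,invalid=30 /multifs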




Caution - If a file is open for a read on a reader host, there is no protection against that file being removed or truncated by the writer. You must use another mechanism, such as application locking, to protect the reader from inadvertent writer actions.




Using the SAN-QFS File System

The SAN-QFS file system enables multiple users to access the same data at full disk speeds. It can be especially useful for databases, data streaming, web page service, or any other application that demands high-performance, shared-disk access in a heterogeneous environment.

You can use the SAN-QFS file system in conjunction with fibre-attached devices in a storage area network (SAN). The SAN-QFS file system enables high-speed access to data through Sun StorEdge QFS software and software such as Tivoli SANergy File Sharing. To use the SAN-QFS file system, you must have both the Sun StorEdge QFS 4.1 release and Tivoli SANergy File Sharing 3.2 or later installed. For information about other supported combinations of Sun StorEdge QFS and Tivoli SANergy File Sharing software, contact your Sun sales representative.



Note - In environments that include only Solaris operating system (OS) platforms, Sun Microsystems recommends that you use the Sun StorEdge QFS shared file system described in Sun StorEdge QFS Shared File System.



The following sections describe other aspects of the SAN-QFS file system:


To Enable the SAN-QFS File System

1. Verify your environment.

Verify that the following conditions are present:

2. Use the mount(1M) command to mount the file system on your server.

3. Enable NFS access.

Use the share(1M) command in the following format to enable NFS access to client hosts:

# share qfs_file_system_name

For qfs_file_system_name, specify the name of your Sun StorEdge QFS file system. For example, qfs1. For more information about the share(1M) command, see the share(1M) or share_nfs(1M) man pages.

4. Edit the file system table (/etc/dfs/dfstab) on the server to enable access at boot time. (Optional)

Perform this step if you want to automatically enable this access at boot time.
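For example, you can add a line similar to the following sketch (hypothetical file system qfs1) to the /etc/dfs/dfstab file:

share -F nfs /qfs1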

5. Edit the /etc/vfstab file on each client and add the file system.

Add the qfs_file_system_name from Step 3 to the table.

For example, you can edit the /etc/vfstab file and add a line similar to the following:

server:/qfs1  -  /qfs1  samfs  -  yes  stripe=1

For more information about editing the /etc/vfstab file, see Sun StorEdge QFS and Sun StorEdge SAM-FS Software Installation and Configuration Guide.

6. Use the mount(1M) command to mount the Sun StorEdge QFS file system on each client.

For example:

client# mount qfs1

Enter one mount(1M) command per client. For more information about the mount(1M) command, see the mount(1M) or the mount_samfs(1M) man pages.

7. Configure the Tivoli SANergy File Sharing software.

Use the config(1M) command (in /opt/SANergy/config) to invoke the SANergy configuration tool. The SANergy configuration tool has a graphical user interface. Provide the information requested at each step in its process. For more information about this tool, see your Tivoli SANergy documentation.

Releasing SANergy File Holds

You can use the samunhold(1M) command to release SANergy file holds. If holds are present in a file system when you attempt to unmount it, they are described in messages written to the console and to /var/adm/messages.

Whenever possible, allow SANergy File Sharing to clean up its holds, but in an emergency, or in case of a SANergy File Sharing system failure, you can use the samunhold(1M) command to avoid a reboot.

For more information about this command, see the samunhold(1M) man page.

Expanding SAN-QFS File Systems

You can use the samgrowfs(1M) command to increase the size of a SAN-QFS file system. To perform this task, follow the procedures described in Adding Disk Cache to a File System. When using this procedure, be aware that the line-by-line device order in the mcf file must match the order of the devices listed in the file system's superblock. The devices in the superblock are numbered in the order in which they were encountered in the mcf file when the file system was created.

When the samgrowfs(1M) command is issued, the devices that had been in the mcf file prior to issuing the samgrowfs(1M) command keep their position in the superblock. New devices are written to subsequent entries in the order encountered.

If this new order does not match the order in the superblock, the SAN-QFS file system cannot be fused.
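As a sketch, after appending the new devices to the end of the mcf file, you would grow a hypothetical file system whose family set name is qfs1 as follows; check the samgrowfs(1M) man page for the exact invocation on your release:

# samgrowfs qfs1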

SAN-QFS Shared File System and Sun StorEdge QFS Shared File System Comparison

The SAN-QFS file system and the Sun StorEdge QFS shared file system are both shared file systems with the following similarities:

TABLE 8-3 shows the differences between the two file systems.

TABLE 8-3 SAN-QFS Shared File System Versus Sun StorEdge QFS Shared File System

SAN-QFS file system: Does not use the natural metadata and incurs additional latency in opening files.
Sun StorEdge QFS shared file system: Uses natural metadata.

SAN-QFS file system: Preferred in heterogeneous computing environments (that is, when not all hosts are Sun systems).
Sun StorEdge QFS shared file system: Preferred in homogeneous Solaris OS environments.

SAN-QFS file system: Useful in environments where multiple hosts must be able to write data.
Sun StorEdge QFS shared file system: Multiple hosts can write; preferred when multiple hosts must write to the same file at the same time.

SAN-QFS file system: User mode implementation.
Sun StorEdge QFS shared file system: Kernel mode implementation with strong security.



I/O Performance

The Sun StorEdge QFS and Sun StorEdge SAM-FS file systems support paged I/O, direct I/O, and switching between the I/O types. The following sections describe these I/O types.

Paged I/O

Paged I/O (also called buffered or cached I/O) is selected by default. With paged I/O, user data is cached in memory pages by the kernel before being written to disk.

Direct I/O

Direct I/O is a process by which data is transferred directly between the user's buffer and the disk, bypassing the kernel's page cache; as a result, much less time is spent in the system. For performance purposes, specify direct I/O only for large, block-aligned, sequential I/O.

The setfa(1) command and the sam_setfa(3) library routine both have a -D option that sets the direct I/O attribute for a file or directory. If applied to a directory, files and directories created in that directory inherit the direct I/O attribute. After the -D option is set, the file uses direct I/O.

You can also select direct I/O for a file by using the Solaris OS directio(3C) function call. If you use the function call to enable direct I/O, it is a temporary setting. The setting lasts only while the file is active.

To enable direct I/O on a file-system basis, do one of the following:

- Specify the -o forcedirectio option on the mount(1M) command.
- Set the forcedirectio directive in the samfs.cmd file.
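For example, the following sketch (hypothetical file and mount point) sets the direct I/O attribute on a single file and then mounts an entire file system with direct I/O forced:

# setfa -D /qfs/dbfile
# mount -F samfs -o forcedirectio /qfs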

For more information, see the setfa(1), sam_setfa(3), directio(3C), samfs.cmd(4), and mount_samfs(1M) man pages.

I/O Switching

The Sun StorEdge QFS and Sun StorEdge SAM-FS file systems support automatic I/O switching. I/O switching is a process by which you can specify that a certain amount of paged I/O should occur before the system switches to direct I/O. This automatic, direct I/O switching allows the system to perform a site-defined amount of consecutive I/O operations and then automatically switch from paged I/O to direct I/O. By default, paged I/O is performed, and I/O switching is disabled.

I/O switching should reduce page cache usage on large I/O operations. To enable this feature, use the dio_wr_consec and dio_rd_consec parameters as directives in the samfs.cmd file or as options to the mount(1M) command. You can also enable it by using samu(1M).
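For example, the following sketch (hypothetical mount point) requests a switch to direct I/O after three consecutive sequential write or read operations:

# mount -F samfs -o dio_wr_consec=3,dio_rd_consec=3 /qfs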

For more information about these options, see the mount_samfs(1M) or samfs.cmd(4) man pages.


Increasing Large File Transfer Performance

Sun StorEdge QFS and Sun StorEdge SAM-FS file systems are tuned to work with a mix of file sizes. You can increase the performance of disk file transfers for large files by tuning the file system settings described in the following procedure.



Note - Sun recommends that you experiment with performance tuning outside of a production environment. Tuning these variables incorrectly can have unexpected effects on the overall system.

If your site has a Sun Enterprise Services (SES) support contract, please inform SES if you change performance tuning parameters.




To Increase File Transfer Performance

1. Set the maximum device read/write directive.

The maxphys parameter in the Solaris /etc/system file controls the maximum number of bytes that a device driver reads or writes at any one time. The default value for the maxphys parameter can differ depending on the level of your Sun Solaris OS, but it is typically around 128 kilobytes.

Add the following line to /etc/system to set maxphys to 8 megabytes:

set maxphys = 0x800000

2. Set the SCSI disk maximum transfer parameter.

The sd driver enables large transfers for a specific file by looking for the sd_max_xfer_size definition in the /kernel/drv/sd.conf file. If the definition is not present, the driver uses the value defined in the sd device driver code, sd_max_xfer_size, which is 1024*1024 bytes.

To enable and encourage large transfers, add the following line at the end of the /kernel/drv/sd.conf file:

sd_max_xfer_size=0x800000;

3. Set the fibre disk maximum transfer parameter.

The ssd driver enables large transfers for a specific file by looking for the ssd_max_xfer_size definition in the /kernel/drv/ssd.conf file. If the definition is not present, the driver uses the value defined in the ssd device driver code, ssd_max_xfer_size, which is 1024*1024 bytes.

Add the following line at the end of the /kernel/drv/ssd.conf file:

ssd_max_xfer_size=0x800000;

4. Reboot the system.

5. Set the writebehind parameter.

This step affects paged I/O only.

The writebehind parameter specifies the number of bytes that are written behind by the file system when performing paged I/O on a Sun StorEdge QFS or Sun StorEdge SAM-FS file system. Matching the writebehind value to a multiple of the RAID's read-modify-write value can increase performance.

This parameter is specified in units of kilobytes and is truncated to an 8-kilobyte multiple. If set, this parameter is ignored when direct I/O is performed. The default writebehind value is 512 kilobytes. This value favors large-block, sequential I/O.

Set the writebehind size to a multiple of the RAID 5 stripe size for both hardware and software RAID 5. The RAID 5 stripe size is the number of data disks multiplied by the configured stripe width.

For example, assume that you configure a RAID 5 device with three data disks plus one parity disk (3+1) with a stripe width of 16 kilobytes. The writebehind value should be 48 kilobytes, 96 kilobytes, or some other multiple, to avoid the overhead of the read-modify-write RAID 5 parity generation.

For Sun StorEdge QFS file systems, the DAU (sammkfs(1M) -a command) should also be a multiple of the RAID 5 stripe size. This allocation ensures that the blocks are contiguous.
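For example, for the 3+1 RAID 5 configuration described above, the following sketch (hypothetical mount point) sets writebehind to 96 kilobytes at mount time:

# mount -F samfs -o writebehind=96 /qfs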

You should test the system performance after resetting the writebehind size. The following example shows one way to time disk writes:

# timex dd if=/dev/zero of=/sam/myfile bs=256k count=2048

You can set the writebehind parameter from a mount option, from within the samfs.cmd file, from within the /etc/vfstab file, or from a command within the samu(1M) utility. For information about enabling it from a mount option, see the -o writebehind=n option on the mount_samfs(1M) man page. For information about enabling it from the samfs.cmd file, see the samfs.cmd(4) man page. For information about enabling it from within samu(1M), see the samu(1M) man page.

6. Set the readahead parameter.

This step affects paged I/O only.

The readahead parameter specifies the number of bytes that are read ahead by the file system when performing paged I/O on a Sun StorEdge QFS or Sun StorEdge SAM-FS file system. This parameter is specified in units of kilobytes and is truncated to an 8-kilobyte multiple. If set, this parameter is ignored when direct I/O is performed.

Increasing the size of the readahead parameter increases the performance of large file transfers, but only to a point. You should test the performance of the system after resetting the readahead size, and stop increasing it when you see no further improvement in transfer rates. The following example shows one way to time disk reads:

# timex dd if=/sam/myfile of=/dev/null bs=256k

The readahead parameter should be set to a size that increases I/O performance for paged I/O; note, however, that too large a readahead size can hurt performance. You should test various readahead sizes for your environment. It is important to consider the amount of memory and the number of concurrent streams when you set the readahead value: if the readahead value multiplied by the number of streams exceeds physical memory, page thrashing can result.

The default readahead is 1024 kilobytes. This value favors large-block, sequential I/O. For short-block, random I/O applications, set readahead to the typical request size. Database applications do their own readahead, so for these applications, set readahead to 0.
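For example, the following sketch (hypothetical mount point) sets readahead to 8 megabytes at mount time:

# mount -F samfs -o readahead=8192 /qfs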

The readahead setting can be enabled from a mount option, from within the samfs.cmd file, from within the /etc/vfstab file, or from a command within the samu(1M) utility. For information about enabling it from a mount option, see the -o readahead=n option on the mount_samfs(1M) man page. For information about enabling it from the samfs.cmd file, see the samfs.cmd(4) man page. For information about enabling it from within samu(1M), see the samu(1M) man page.

7. Set the stripe width.

The -o stripe=n option on the mount(1M) command specifies the stripe width for the file system. The stripe width is based on the disk allocation unit (DAU) size. The n argument specifies that n * DAU bytes are written to one device before switching to the next device. The DAU size is set when the file system is initialized by the sammkfs(1M) -a command.

If -o stripe=0 is set, files are allocated to file system devices using the round-robin allocation method. Each file is created on the next device and is completely allocated on that device until the device is full. Round-robin is the preferred setting for a multistream environment. If -o stripe=n is set to an integer greater than 0, files are allocated to file system devices using the stripe method. To determine the appropriate -o stripe=n setting, try varying the setting and taking performance readings. Striping is the preferred setting for turnkey applications with a required bandwidth.

You can also set the stripe width from the /etc/vfstab file or from the samfs.cmd file.
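For example, the following sketch (hypothetical mount point) mounts a file system so that two DAUs are written to each device before allocation switches to the next device:

# mount -F samfs -o stripe=2 /qfs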

For more information about the mount(1M) command, see the mount_samfs(1M) man page. For more information about the samfs.cmd file, see the samfs.cmd(4) man page.


Qwrite

The Qwrite capability can be enabled in Sun StorEdge QFS environments.

By default, the Sun StorEdge QFS file systems disable simultaneous reads and writes to the same file. This is the mode defined by the UNIX vnode interface standard, which gives exclusive access to a single writer while other writers and readers must wait. Qwrite enables simultaneous reads and writes to the same file from different threads.

The Qwrite feature can be used in database applications to enable multiple simultaneous transactions to the same file. Database applications typically manage large files and issue simultaneous reads and writes to the same file. Unfortunately, each system call to a file acquires and releases a read/write lock inside the kernel. This lock prevents overlapped (or simultaneous) operations to the same file. If the application itself implements file locking mechanisms, the kernel locking mechanism impedes performance by unnecessarily serializing I/O.

Qwrite can be enabled in the /etc/vfstab file, in the samfs.cmd file, and as a mount option. The -o qwrite option on the mount(1M) command bypasses the file system locking mechanisms (except for applications accessing the file system through NFS) and lets the application control data access. If qwrite is specified, the file system enables simultaneous reads and writes to the same file from different threads. This option improves I/O performance by queuing multiple requests at the drive level.

The following example uses the mount(1M) command to enable Qwrite on a database file system:

# mount -F samfs -o qwrite /db

For more information about this feature, see the qwrite directive on the samfs.cmd(4) man page or the -o qwrite option on the mount_samfs(1M) man page.


Setting the Write Throttle

By default, the Sun StorEdge QFS and Sun StorEdge SAM-FS file systems set the -o wr_throttle=n option on the mount(1M) command to 16 megabytes. The -o wr_throttle=n option limits the number of outstanding write kilobytes for one file to n.

If a file has n write kilobytes outstanding, the system suspends an application that attempts to write to that file until enough bytes have completed the I/O to allow the application to be resumed.

If your site has thousands of streams, such as thousands of NFS-shared workstations accessing the file system, you can tune the -o wr_throttle=n option to avoid exhausting memory. Generally, the number of streams multiplied by n multiplied by 1024 should be less than the total size of the host system's memory minus the memory needs of the Solaris OS. In other words:

number_of_streams * n * 1024 < total_memory - Solaris_OS_memory_needs
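As an illustrative calculation (the numbers here are assumptions, not recommendations), consider a host with 8 gigabytes of memory of which roughly 1 gigabyte is needed by the Solaris OS. At the default n of 16,384 kilobytes, about 400 streams fit the formula: 400 * 16384 * 1024 is approximately 6.7 gigabytes, which is less than the roughly 7 gigabytes remaining.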

For turnkey applications, you might want to use a size larger than the default 16,384 kilobytes because this keeps more pages in memory.


Setting the Flush-Behind Rate

Two mount parameters control the flush-behind rate for pages written sequentially and stage pages. The flush_behind and stage_flush_behind mount parameters are read from the samfs.cmd file, the /etc/vfstab file, or from the mount(1M) command.

The flush_behind=n mount parameter sets the maximum flush-behind value. Modified pages that are being written sequentially are written to disk asynchronously to help the Sun Solaris VM layer keep pages clean. To enable this feature, set n to an integer such that 16 <= n <= 8192. By default, n is set to 0, which disables this feature. The n argument is specified in kilobyte units.

The stage_flush_behind=n mount parameter sets the maximum stage flush-behind value. Stage pages that are being staged are written to disk asynchronously to help the Sun Solaris VM layer keep pages clean. To enable this feature, set n to an integer such that 16 <= n <= 8192. By default, n is set to 0, which disables this feature. The n argument is specified in kilobyte units.
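For example, the following sketch (hypothetical mount point) enables a 256-kilobyte flush-behind value at mount time:

# mount -F samfs -o flush_behind=256 /qfs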

For more information about these mount parameters, see the mount_samfs(1M) man page or the samfs.cmd(4) man page.


Tuning the Number of Inodes and the Inode Hash Table

The Sun StorEdge QFS and Sun StorEdge SAM-FS file systems allow you to set the following two tunable parameters in the /etc/system file:

- ninodes
- nhino

To enable nondefault settings for these parameters, edit the /etc/system file, and then reboot your system.

The following sections describe these parameters in more detail.

The ninodes Parameter

The ninodes parameter specifies the maximum number of default inodes. The value for ninodes determines the number of in-core inodes that Sun StorEdge QFS and Sun StorEdge SAM-FS keep allocated to themselves, even when applications are not using many inodes.

The format for this parameter in the /etc/system file is as follows:

set samfs:ninodes = value

The range for value is 16 <= value <= 2000000. The default value for ninodes is one of the following:

For example:

set samfs:ninodes = 4000

The nhino Parameter

The nhino parameter specifies the size of the in-core inode hash table.

The format for this parameter in the /etc/system file is as follows:

set samfs:nhino = value

The range for value is 1 <= value <= 1048576, and value must be a nonzero power of 2. If nhino is not set, the default is the ninodes value divided by 8 and then rounded up to the nearest power of 2. For example, if ninodes is set to 8000 and nhino is not set, the system assumes 1024, which is 8000 divided by 8 and then rounded up to the nearest power of 2.

For example:

set samfs:nhino = 1024

When to Set the ninodes and nhino Parameters

When searching for an inode by number (after obtaining an inode number from a directory or after extracting an inode number from an NFS file handle), the Sun StorEdge QFS and Sun StorEdge SAM-FS file systems search their cache of in-core inodes. To speed this process, they maintain a hash table to decrease the number of inodes they must check.

A larger hash table reduces the number of comparisons and searches, at a modest cost in memory usage. If the nhino value is too large, the system is slower when undertaking operations that sweep through the entire inode list (inode syncs and unmounts). For sites that manipulate large numbers of files and sites that do extensive amounts of NFS I/O, it can be advantageous to set these parameter values to larger than the defaults.

If your site has file systems that contain only a small number of files, it might be advantageous to make these numbers smaller than the defaults. This could be the case, for example, if you have a file system into which you write large single-file tar(1) files to back up other file systems.