Oracle Solaris ZFS Administration Guide

Chapter 1 Oracle Solaris ZFS File System (Introduction)

This chapter provides an overview of the Oracle Solaris ZFS file system and its features and benefits. This chapter also covers some basic terminology used throughout the rest of this book.

The following sections are provided in this chapter:

What's New in ZFS?

This section summarizes new features in the ZFS file system.

Splitting a Mirrored ZFS Storage Pool (zpool split)

Oracle Solaris 10 9/10 Release: In this Solaris release, you can use the zpool split command to split a mirrored storage pool, which detaches a disk or disks in the original mirrored pool to create another identical pool.
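
For example, the following sketch splits a hypothetical mirrored pool named tank into a new pool named tank2, which can then be imported:


# zpool split tank tank2
# zpool import tank2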

For more information, see Creating a New Pool By Splitting a Mirrored ZFS Storage Pool.

New ZFS System Process

Oracle Solaris 10 9/10 Release: In this Solaris release, each ZFS storage pool has an associated process, zpool-poolname. The threads in this process handle the pool's I/O tasks, such as compression and checksumming. The purpose of this process is to provide visibility into each storage pool's CPU utilization. Information about these processes can be reviewed by using the ps and prstat commands. These processes are only available in the global zone. For more information, see SDC(7).
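
For example, assuming a pool named tank, its associated process appears as zpool-tank and can be observed as follows:


# ps -ef | grep zpool-
# prstat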

Changes to the zpool list Command

Oracle Solaris 10 9/10 Release: In this Solaris release, the zpool list output has changed to provide better space allocation information. For example:


# zpool list tank
NAME    SIZE  ALLOC   FREE    CAP  HEALTH  ALTROOT
tank    136G  55.2G  80.8G    40%  ONLINE  -

The previous USED and AVAIL fields have been replaced with ALLOC and FREE.

The ALLOC field identifies the amount of physical space allocated to all datasets and internal metadata. The FREE field identifies the amount of unallocated space in the pool.

For more information, see Displaying Information About ZFS Storage Pools.

ZFS Storage Pool Recovery

Oracle Solaris 10 9/10 Release: A storage pool can become damaged if underlying devices become unavailable, a power failure occurs, or if more than the supported number of devices fail in a redundant ZFS configuration. This release provides new command features for recovering your damaged storage pool. However, using this recovery feature means that the last few transactions that occurred prior to the pool outage might be lost.

Both the zpool clear and zpool import commands support the -F option to possibly recover a damaged pool. In addition, running the zpool status, zpool clear, or zpool import command automatically reports a damaged pool, and these commands describe how to recover the pool.
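
For example, either of the following commands might recover a damaged pool by discarding the last few transactions (the pool name tank is illustrative):


# zpool clear -F tank
# zpool import -F tank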

For more information, see Repairing ZFS Storage Pool-Wide Damage.

ZFS Log Device Enhancements

Oracle Solaris 10 9/10 Release: The following log device enhancements are available:

Triple Parity RAIDZ (raidz3)

Oracle Solaris 10 9/10 Release: In this Solaris release, a redundant RAID-Z configuration can now have single-, double-, or triple-parity, which means that one, two, or three device failures, respectively, can be sustained without any data loss. You can specify the raidz3 keyword for a triple-parity RAID-Z configuration. For more information, see Creating a RAID-Z Storage Pool.
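
For example, the following sketch creates a triple-parity RAID-Z pool with five disks (pool and device names are hypothetical):


# zpool create rzpool raidz3 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0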

Holding ZFS Snapshots

Oracle Solaris 10 9/10 Release: If you implement different automatic snapshot policies such that older snapshots are inadvertently destroyed by zfs receive because they no longer exist on the sending side, you might consider using the snapshot hold feature that is available in this Solaris release.

Holding a snapshot prevents it from being destroyed. In addition, this feature allows a snapshot with clones to be deleted pending the removal of the last clone by using the zfs destroy -d command.

You can hold a snapshot or set of snapshots. For example, the following syntax puts a hold tag, keep, on tank/home/cindys@snap1.


# zfs hold keep tank/home/cindys@snap1
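
If needed, you can list the holds on a snapshot and later release the hold; the following sketch reuses the tag and snapshot from the preceding example:


# zfs holds tank/home/cindys@snap1
# zfs release keep tank/home/cindys@snap1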

For more information, see Holding ZFS Snapshots.

ZFS Device Replacement Enhancements

Oracle Solaris 10 9/10 Release: In this Solaris release, a system event, or sysevent, is provided when a disk in a pool is replaced with a larger disk or when all disks in the pool are replaced with larger disks. ZFS has been enhanced to recognize these events and adjust the pool based on the new size of the disk, depending on the setting of the autoexpand property. You can use the autoexpand pool property to enable or disable automatic pool expansion when a larger disk replaces a smaller disk.

These features enable you to increase the pool size without having to export and import the pool or reboot the system.

For example, the autoexpand property is enabled on the tank pool.


# zpool set autoexpand=on tank

Or, you can create the pool with the autoexpand property enabled.


# zpool create -o autoexpand=on tank c1t13d0

The autoexpand property is disabled by default so you can decide whether you want the pool size expanded when a larger disk replaces a smaller disk.

The pool size can also be expanded by using the zpool online -e command. For example:


# zpool online -e tank c1t6d0

Or, you can enable the autoexpand property after the larger disk is attached or made available by using the zpool replace feature. For example, the following pool is created with one 8-GB disk (c0t0d0). The 8-GB disk is replaced with a 16-GB disk (c1t13d0), but the pool size is not expanded until the autoexpand property is enabled.


# zpool create pool c0t0d0
# zpool list
NAME   SIZE   ALLOC  FREE    CAP   HEALTH  ALTROOT
pool   8.44G  76.5K  8.44G     0%  ONLINE  -
# zpool replace pool c0t0d0 c1t13d0
# zpool list
NAME   SIZE   ALLOC  FREE    CAP   HEALTH  ALTROOT
pool   8.44G  91.5K  8.44G     0%  ONLINE  -
# zpool set autoexpand=on pool
# zpool list
NAME   SIZE   ALLOC  FREE    CAP   HEALTH  ALTROOT
pool   16.8G   91.5K  16.8G    0%  ONLINE  -

Another way to expand the LUN in the preceding example without enabling the autoexpand property is to use the zpool online -e command, even though the device is already online. For example:


# zpool create tank c0t0d0
# zpool list tank
NAME   SIZE   ALLOC  FREE    CAP   HEALTH  ALTROOT
tank   8.44G  76.5K  8.44G     0%  ONLINE  -
# zpool replace tank c0t0d0 c1t13d0
# zpool list tank
NAME   SIZE   ALLOC  FREE    CAP   HEALTH  ALTROOT
tank   8.44G  91.5K  8.44G     0%  ONLINE  -
# zpool online -e tank c1t13d0
# zpool list tank
NAME   SIZE   ALLOC  FREE    CAP   HEALTH  ALTROOT
tank   16.8G    90K  16.8G     0%  ONLINE  -

Additional device replacement enhancements in this release include the following features:

For more information about replacing devices, see Replacing Devices in a Storage Pool.

ZFS and Flash Installation Support

Solaris 10 10/09 Release: In this Solaris release, you can set up a JumpStart profile to identify a flash archive of a ZFS root pool. For more information, see Installing a ZFS Root File System (Oracle Solaris Flash Archive Installation).
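
A JumpStart profile that installs a ZFS root pool from a flash archive might look similar to the following sketch; the archive location and device names here are hypothetical:


install_type flash_install
archive_location nfs server:/export/jump/zfsBE.flar
partitioning explicit
pool rpool auto auto auto mirror c0t0d0s0 c0t1d0s0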

ZFS User and Group Quotas

Solaris 10 10/09 Release: In previous Solaris releases, you could apply quotas and reservations to ZFS file systems to manage and reserve disk space.

In this Solaris release, you can set a quota on the amount of disk space consumed by files that are owned by a particular user or group. You might consider setting user and group quotas in an environment with a large number of users or groups.

You can set a user quota by using the zfs userquota property. To set a group quota, use the zfs groupquota property. For example:


# zfs set userquota@user1=5G tank/data
# zfs set groupquota@staff=10G tank/staff/admins

You can display a user's or a group's current quota setting as follows:


# zfs get userquota@user1 tank/data
NAME       PROPERTY         VALUE            SOURCE
tank/data  userquota@user1  5G               local
# zfs get groupquota@staff tank/staff/admins
NAME               PROPERTY          VALUE             SOURCE
tank/staff/admins  groupquota@staff  10G               local

Display general quota information as follows:


# zfs userspace tank/data
TYPE        NAME   USED  QUOTA  
POSIX User  root     3K   none  
POSIX User  user1     0    5G  

# zfs groupspace tank/staff/admins
TYPE         NAME   USED  QUOTA  
POSIX Group  root     3K   none  
POSIX Group  staff     0    10G  

You can display an individual user's disk space usage by viewing the userused@user property. A group's disk space usage can be viewed by using the groupused@group property. For example:


# zfs get userused@user1 tank/staff
NAME        PROPERTY        VALUE           SOURCE
tank/staff  userused@user1  213M            local
# zfs get groupused@staff tank/staff
NAME        PROPERTY         VALUE            SOURCE
tank/staff  groupused@staff  213M             local

For more information about setting user quotas, see Setting ZFS Quotas and Reservations.

ZFS ACL Pass Through Inheritance for Execute Permission

Solaris 10 10/09 Release: In previous Solaris releases, you could apply ACL inheritance so that all files are created with 0664 or 0666 permissions. In this release, if you want to optionally include the execute bit from the file creation mode into the inherited ACL, you can set the aclinherit mode to pass the execute permission to the inherited ACL.

If aclinherit=passthrough-x is enabled on a ZFS dataset, you can include execute permission for an output file that is generated from cc or gcc compiler tools. If the inherited ACL does not include execute permission, then the executable output from the compiler won't be executable until you use the chmod command to change the file's permissions.
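
For example, the following sketch enables this mode on a hypothetical dataset:


# zfs set aclinherit=passthrough-x tank/data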

For more information, see Example 8–12.

ZFS Property Enhancements

Solaris 10 10/09 and Oracle Solaris 10 9/10: The following ZFS file system enhancements are included in these releases.

ZFS Log Device Recovery

Solaris 10 10/09 Release: In this release, ZFS identifies intent log failures in the zpool status command output. Fault Management Architecture (FMA) reports these errors as well. Both ZFS and FMA describe how to recover from an intent log failure.

For example, if the system shuts down abruptly before synchronous write operations are committed to a pool with a separate log device, you see messages similar to the following:


# zpool status -x
  pool: pool
 state: FAULTED
status: One or more of the intent logs could not be read.
        Waiting for adminstrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online',
        or ignore the intent log records by running 'zpool clear'.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool        FAULTED      0     0     0 bad intent log
          mirror    ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
        logs        FAULTED      0     0     0 bad intent log
          c0t5d0    UNAVAIL      0     0     0 cannot open

You can resolve the log device failure in the following ways:

To recover from this error without replacing the failed log device, you can clear the error with the zpool clear command. In this scenario, the pool will operate in a degraded mode and the log records will be written to the main pool until the separate log device is replaced.

Consider using mirrored log devices to avoid the log device failure scenario.

Using Cache Devices in Your ZFS Storage Pool

Solaris 10 10/09 Release: In this release, when you create a pool, you can specify cache devices, which are used to cache storage pool data.

Cache devices provide an additional layer of caching between main memory and disk. Using cache devices provides the greatest performance improvement for random-read workloads of mostly static content.

One or more cache devices can be specified when the pool is created. For example:


# zpool create pool mirror c0t2d0 c0t4d0 cache c0t0d0
# zpool status pool
  pool: pool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
        cache
          c0t0d0    ONLINE       0     0     0

errors: No known data errors

After cache devices are added, they gradually fill with content from main memory. Depending on the size of your cache device, it could take over an hour for the device to fill. Capacity and reads can be monitored by using the zpool iostat command as follows:


# zpool iostat -v pool 5

Cache devices can be added or removed from a pool after the pool is created.
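
For example, the following sketch adds a cache device to an existing pool and then removes it (the device name is hypothetical):


# zpool add pool cache c0t5d0
# zpool remove pool c0t5d0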

For more information, see Creating a ZFS Storage Pool With Cache Devices and Example 4–4.

Zone Migration in a ZFS Environment

Solaris 10 5/09 Release: This release extends support for migrating zones in a ZFS environment with Oracle Solaris Live Upgrade. For more information, see Using Oracle Solaris Live Upgrade to Migrate or Upgrade a System With Zones (at Least Solaris 10 5/09).

For a list of known issues with this release, see the Solaris 10 5/09 release notes.

ZFS Installation and Boot Support

Solaris 10 10/08 Release: This release enables you to install and boot a ZFS root file system. You can use the initial installation option or the JumpStart feature to install a ZFS root file system. Or, you can use Oracle Solaris Live Upgrade to migrate a UFS root file system to a ZFS root file system. ZFS support for swap and dump devices is also provided. For more information, see Chapter 5, Installing and Booting an Oracle Solaris ZFS Root File System.

For a list of known issues with this release, go to the following site:

http://hub.opensolaris.org/bin/view/Community+Group+zfs/boot

Also, see the Solaris 10 10/08 release notes.

Rolling Back a Dataset Without Unmounting

Solaris 10 10/08 Release: This release enables you to roll back a dataset without unmounting it first. Thus, the zfs rollback -f option is no longer needed to force an unmount operation. The -f option is no longer supported and is ignored if specified.
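
For example, the following sketch rolls back a file system to a hypothetical snapshot while the file system remains mounted:


# zfs rollback tank/home/user@tuesday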

Enhancements to the zfs send Command

Solaris 10 10/08 Release: This release includes the following enhancements to the zfs send command. Using this command, you can now perform the following tasks:

For more information, see Sending and Receiving Complex ZFS Snapshot Streams.

ZFS Quotas and Reservations for File System Data Only

Solaris 10 10/08 Release: In addition to the existing ZFS quota and reservation features, this release includes dataset quotas and reservations that do not include descendents, such as snapshots and clones, in the disk space accounting.

For example, you can set a 10-GB refquota limit for studentA that sets a 10-GB hard limit of referenced disk space. For additional flexibility, you can set a 20-GB quota that enables you to manage studentA's snapshots.


# zfs set refquota=10g tank/studentA
# zfs set quota=20g tank/studentA

For more information, see Setting ZFS Quotas and Reservations.

ZFS Storage Pool Properties

Solaris 10 10/08 Release: ZFS storage pool properties were introduced in an earlier release. This release provides two properties, cachefile and failmode.

The following describes the new storage pool properties in this release:
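
As an illustrative sketch of one of these properties, the failmode property might be set and then inspected as follows (the pool name is hypothetical):


# zpool set failmode=continue tank
# zpool get failmode tank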

ZFS Command History Enhancements (zpool history)

Solaris 10 10/08 Release: The zpool history command has been enhanced to provide the following new features:

For more information about using the zpool history command, see Resolving Problems With ZFS.

Upgrading ZFS File Systems (zfs upgrade)

Solaris 10 10/08 Release: The zfs upgrade command is included in this release to provide future ZFS file system enhancements to existing file systems. ZFS storage pools have a similar upgrade feature to provide pool enhancements to existing storage pools.

For example:


# zfs upgrade
This system is currently running ZFS filesystem version 3.

All filesystems are formatted with the current version.

Note –

File systems that are upgraded and any streams created from those upgraded file systems by the zfs send command are not accessible on systems that are running older software releases.


ZFS Delegated Administration

Solaris 10 10/08 Release: In this release, you can grant fine-grained permissions to allow nonprivileged users to perform ZFS administration tasks.

You can use the zfs allow and zfs unallow commands to delegate and remove permissions.
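
For example, the following sketch delegates basic file system permissions to a user and then removes them (the user and dataset names are hypothetical):


# zfs allow cindys create,destroy,mount,snapshot tank/home/cindys
# zfs unallow cindys tank/home/cindys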

You can modify delegated administration with the pool's delegation property. For example:


# zpool get delegation users
NAME  PROPERTY    VALUE       SOURCE
users  delegation  on          default
# zpool set delegation=off users
# zpool get delegation users
NAME  PROPERTY    VALUE       SOURCE
users  delegation  off         local

By default, the delegation property is enabled.

For more information, see Chapter 9, Oracle Solaris ZFS Delegated Administration and zfs(1M).

Setting Up Separate ZFS Log Devices

Solaris 10 10/08 Release: The ZFS intent log (ZIL) is provided to satisfy POSIX requirements for synchronous transactions. For example, databases often require their transactions to be on stable storage devices when returning from a system call. NFS and other applications can also use fsync() to ensure data stability. By default, the ZIL is allocated from blocks within the main storage pool. In this Solaris release, you can decide if you want the ZIL blocks to continue to be allocated from the main storage pool or from a separate log device. Better performance might be possible by using separate intent log devices in your ZFS storage pool, such as with NVRAM or a dedicated disk.

Log devices for the ZFS intent log are not related to database log files.

You can set up a ZFS log device when the storage pool is created or after the pool is created. For examples of setting up log devices, see Creating a ZFS Storage Pool With Log Devices and Adding Devices to a Storage Pool.

You can attach a log device to an existing log device to create a mirrored log device. This operation is identical to attaching a device in an unmirrored storage pool.
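
For example, assuming c0t4d0 is an existing dedicated log device in the pool, the following sketch attaches a second device to create a mirrored log (pool and device names are hypothetical):


# zpool attach pool c0t4d0 c0t5d0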

Consider the following points when determining whether setting up a ZFS log device is appropriate for your environment:

Creating Intermediate ZFS Datasets

Solaris 10 10/08 Release: You can use the -p option with the zfs create, zfs clone, and zfs rename commands to quickly create any intermediate datasets that do not already exist.

In the following example, ZFS datasets (users/area51) are created in the datab storage pool.


# zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
datab                       106K  16.5G    18K  /datab
# zfs create -p -o compression=on datab/users/area51

If the intermediate dataset already exists during the create operation, the operation completes successfully.

Properties specified apply to the target dataset, not to the intermediate dataset. For example:


# zfs get mountpoint,compression datab/users/area51
NAME                PROPERTY     VALUE                SOURCE
datab/users/area51  mountpoint   /datab/users/area51  default
datab/users/area51  compression  on                   local

The intermediate dataset is created with the default mount point. Any additional properties are disabled for the intermediate dataset. For example:


# zfs get mountpoint,compression datab/users
NAME         PROPERTY     VALUE         SOURCE
datab/users  mountpoint   /datab/users  default
datab/users  compression  off           default

For more information, see zfs(1M).

ZFS Hot-Plugging Enhancements

Solaris 10 10/08 Release: In this release, ZFS more effectively responds to devices that are removed and can now automatically identify devices that are inserted.

For more information, see zpool(1M).

Recursively Renaming ZFS Snapshots (zfs rename -r)

Solaris 10 10/08 Release: You can recursively rename all descendent ZFS snapshots by using the zfs rename -r command. For example:

First, a snapshot of a set of ZFS file systems is created.


# zfs snapshot -r users/home@today
# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
users                    216K  16.5G    20K  /users
users/home                76K  16.5G    22K  /users/home
users/home@today            0      -    22K  -
users/home/markm          18K  16.5G    18K  /users/home/markm
users/home/markm@today      0      -    18K  -
users/home/marks          18K  16.5G    18K  /users/home/marks
users/home/marks@today      0      -    18K  -
users/home/neil           18K  16.5G    18K  /users/home/neil
users/home/neil@today       0      -    18K  -

Then, the snapshots are renamed the following day.


# zfs rename -r users/home@today @yesterday
# zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
users                        216K  16.5G    20K  /users
users/home                    76K  16.5G    22K  /users/home
users/home@yesterday            0      -    22K  -
users/home/markm              18K  16.5G    18K  /users/home/markm
users/home/markm@yesterday      0      -    18K  -
users/home/marks              18K  16.5G    18K  /users/home/marks
users/home/marks@yesterday      0      -    18K  -
users/home/neil               18K  16.5G    18K  /users/home/neil
users/home/neil@yesterday       0      -    18K  -

A snapshot is the only type of dataset that can be renamed recursively.

For more information about snapshots, see Overview of ZFS Snapshots and this blog entry that describes how to create rolling snapshots:

http://blogs.sun.com/mmusante/entry/rolling_snapshots_made_easy

gzip Compression Is Available for ZFS

Solaris 10 10/08 Release: In this Solaris release, you can set gzip compression on ZFS file systems, in addition to lzjb compression. You can specify compression as gzip, or gzip-N, where N equals 1 through 9. For example:


# zfs create -o compression=gzip users/home/snapshots
# zfs get compression users/home/snapshots
NAME                  PROPERTY     VALUE            SOURCE
users/home/snapshots  compression  gzip             local
# zfs create -o compression=gzip-9 users/home/oldfiles
# zfs get compression users/home/oldfiles
NAME                  PROPERTY     VALUE           SOURCE
users/home/oldfiles   compression  gzip-9          local

For more information about setting ZFS properties, see Setting ZFS Properties.

Storing Multiple Copies of ZFS User Data

Solaris 10 10/08 Release: As a reliability feature, ZFS file system metadata is automatically stored multiple times across different disks, if possible. This feature is known as ditto blocks.

In this Solaris release, you can also store multiple copies of user data per file system by using the zfs set copies command. For example:


# zfs set copies=2 users/home
# zfs get copies users/home
NAME        PROPERTY  VALUE       SOURCE
users/home  copies    2           local

Available values are 1, 2, or 3. The default value is 1. These copies are in addition to any pool-level redundancy, such as in a mirrored or RAID-Z configuration.

The benefits of storing multiple copies of ZFS user data are as follows:


Note –

Depending on the allocation of the ditto blocks in the storage pool, multiple copies might be placed on a single disk. A subsequent full disk failure might cause all ditto blocks to be unavailable.


You might consider using ditto blocks when you accidentally create a non-redundant pool and when you need to set data retention policies.

For a detailed description of how storing multiple copies on a system with a single-disk pool or a multiple-disk pool might impact overall data protection, see this blog:

http://blogs.sun.com/relling/entry/zfs_copies_and_data_protection

For more information about setting ZFS properties, see Setting ZFS Properties.

Improved zpool status Output

Solaris 10 8/07 Release: You can use the zpool status -v command to display a list of files with persistent errors. Previously, you had to use the find -inum command to identify the file names from the list of displayed inodes.

For more information about displaying a list of files with persistent errors, see Repairing a Corrupted File or Directory.

ZFS and Solaris iSCSI Improvements

Solaris 10 8/07 Release: In this Solaris release, you can create a ZFS volume as a Solaris iSCSI target device by setting the shareiscsi property on the ZFS volume. This method is a convenient way to quickly set up a Solaris iSCSI target. For example:


# zfs create -V 2g tank/volumes/v2
# zfs set shareiscsi=on tank/volumes/v2
# iscsitadm list target
Target: tank/volumes/v2
    iSCSI Name: iqn.1986-03.com.sun:02:984fe301-c412-ccc1-cc80-cf9a72aa062a
    Connections: 0

After the iSCSI target is created, you can set up the iSCSI initiator. For information about setting up a Solaris iSCSI initiator, see Chapter 14, Configuring Oracle Solaris iSCSI Targets and Initiators (Tasks), in System Administration Guide: Devices and File Systems.

For more information about managing a ZFS volume as an iSCSI target, see Using a ZFS Volume as a Solaris iSCSI Target.

ZFS Command History (zpool history)

Solaris 10 8/07 Release: In this Solaris release, ZFS automatically logs successful zfs and zpool commands that modify pool state information. For example:


# zpool history
History for 'newpool':
2007-04-25.11:37:31 zpool create newpool mirror c0t8d0 c0t10d0
2007-04-25.11:37:46 zpool replace newpool c0t10d0 c0t9d0
2007-04-25.11:38:04 zpool attach newpool c0t9d0 c0t11d0
2007-04-25.11:38:09 zfs create newpool/user1
2007-04-25.11:38:15 zfs destroy newpool/user1

History for 'tank':
2007-04-25.11:46:28 zpool create tank mirror c1t0d0 c2t0d0 mirror c3t0d0 c4t0d0

This feature enables you or Oracle support personnel to identify the actual ZFS commands that were executed when troubleshooting an error scenario.

You can identify a specific storage pool with the zpool history command. For example:


# zpool history newpool
History for 'newpool':
2007-04-25.11:37:31 zpool create newpool mirror c0t8d0 c0t10d0
2007-04-25.11:37:46 zpool replace newpool c0t10d0 c0t9d0
2007-04-25.11:38:04 zpool attach newpool c0t9d0 c0t11d0
2007-04-25.11:38:09 zfs create newpool/user1
2007-04-25.11:38:15 zfs destroy newpool/user1

In this Solaris release, the zpool history command does not record user-ID, hostname, or zone-name. However, this information is recorded starting in the Solaris 10 10/08 release. For more information, see ZFS Command History Enhancements (zpool history).

For more information about troubleshooting ZFS problems, see Resolving Problems With ZFS.

ZFS Property Improvements

ZFS xattr Property

Solaris 10 8/07 Release: You can use the xattr property to disable or enable extended attributes for a specific ZFS file system. The default value is on. For a description of ZFS properties, see Introducing ZFS Properties.
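
For example, the following sketch disables extended attributes on a hypothetical file system:


# zfs set xattr=off tank/home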

ZFS canmount Property

Solaris 10 8/07 Release: The new canmount property enables you to specify whether a dataset can be mounted by using the zfs mount command. For more information, see canmount Property.
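
For example, the following sketch prevents a hypothetical dataset from being mounted, while its descendent file systems can still be mounted:


# zfs set canmount=off tank/home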

ZFS User Properties

Solaris 10 8/07 Release: In addition to the standard native properties that can be used to either export internal statistics or control ZFS file system behavior, ZFS provides user properties. User properties have no effect on ZFS behavior, but you can use them to annotate datasets with information that is meaningful in your environment.
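
For example, the following sketch sets and displays a user property; the property name must contain a colon, and the name, value, and dataset here are illustrative:


# zfs set dept:users=finance userpool/user1
# zfs get dept:users userpool/user1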

For more information, see ZFS User Properties.

Setting Properties When Creating ZFS File Systems

Solaris 10 8/07 Release: In this Solaris release, you can set properties when you create a file system, not just after the file system is created.

The following examples illustrate equivalent syntax:


# zfs create tank/home
# zfs set mountpoint=/export/zfs tank/home
# zfs set sharenfs=on tank/home
# zfs set compression=on tank/home

# zfs create -o mountpoint=/export/zfs -o sharenfs=on -o compression=on tank/home

Displaying All ZFS File System Information

Solaris 10 8/07 Release: In this Solaris release, you can use various forms of the zfs get command to display information about all datasets if you do not specify a dataset or if you specify all. In previous releases, information about all datasets could not be retrieved with the zfs get command.

For example:


# zfs get -s local all
tank/home               atime          off                    local
tank/home/bonwick       atime          off                    local
tank/home/marks         quota          50G                    local

New zfs receive -F Option

Solaris 10 8/07 Release: In this Solaris release, you can use the new -F option to the zfs receive command to force a rollback of the file system to the most recent snapshot before the receive is initiated. Using this option might be necessary when the file system is modified after a rollback occurs but before the receive is initiated.
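
For example, the following sketch forces a rollback to the most recent snapshot before the stream is received (the dataset and file names are hypothetical):


# zfs receive -F tank/gozer < /bkups/gozer.today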

For more information, see Receiving a ZFS Snapshot.

Recursive ZFS Snapshots

Solaris 10 11/06 Release: When you use the zfs snapshot command to create a file system snapshot, you can use the -r option to recursively create snapshots for all descendent file systems. In addition, you can use the -r option to recursively destroy all descendent snapshots when a snapshot is destroyed.

Recursive ZFS snapshots are created quickly as one atomic operation. The snapshots are created together (all at once) or not created at all. The benefit of such an operation is that the snapshot data is always taken at one consistent time, even across descendent file systems.
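
For example, the following sketch creates snapshots of a file system and all of its descendents as one operation, and later destroys them recursively (the dataset name is illustrative):


# zfs snapshot -r tank/home@today
# zfs destroy -r tank/home@today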

For more information, see Creating and Destroying ZFS Snapshots.

Double-Parity RAID-Z (raidz2)

Solaris 10 11/06 Release: A redundant RAID-Z configuration can now have either a single- or double-parity configuration, which means that one or two device failures, respectively, can be sustained, without any data loss. You can specify the raidz2 keyword for a double-parity RAID-Z configuration. Or, you can specify the raidz or raidz1 keyword for a single-parity RAID-Z configuration.
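
For example, the following sketch creates a double-parity RAID-Z pool with four disks (pool and device names are hypothetical):


# zpool create rzpool raidz2 c1t0d0 c2t0d0 c3t0d0 c4t0d0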

For more information, see Creating a RAID-Z Storage Pool or zpool(1M).

Hot Spares for ZFS Storage Pool Devices

Solaris 10 11/06 Release: The ZFS hot spares feature enables you to identify disks that could be used to replace a failed or faulted device in one or more storage pools. Designating a device as a hot spare means that if an active device in the pool fails, the hot spare automatically replaces the failed device. Or, you can manually replace a device in a storage pool with a hot spare.
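
For example, the following sketch creates a mirrored pool with a hot spare and then adds another spare to the existing pool (device names are hypothetical):


# zpool create tank mirror c1t1d0 c2t1d0 spare c1t2d0
# zpool add tank spare c2t2d0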

For more information, see Designating Hot Spares in Your Storage Pool and zpool(1M).

Replacing a ZFS File System With a ZFS Clone (zfs promote)

Solaris 10 11/06 Release: The zfs promote command enables you to replace an existing ZFS file system with a clone of that file system. This feature is helpful when you want to run tests on an alternative version of a file system and then make that alternative version the active file system.
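
For example, the following sketch clones a snapshot of a file system and then promotes the clone so that it replaces the original file system (dataset names are hypothetical):


# zfs snapshot tank/test/productA@today
# zfs clone tank/test/productA@today tank/test/productAbeta
# zfs promote tank/test/productAbeta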

For more information, see Replacing a ZFS File System With a ZFS Clone and zfs(1M).

Upgrading ZFS Storage Pools (zpool upgrade)

Solaris 10 6/06 Release: You can upgrade your storage pools to a newer version of ZFS to take advantage of the latest features by using the zpool upgrade command. In addition, the zpool status command has been modified to notify you when your pools are running older versions of ZFS.
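
For example, the following sketch displays the supported pool versions and then upgrades all pools on the system:


# zpool upgrade -v
# zpool upgrade -a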

For more information, see Upgrading ZFS Storage Pools and zpool(1M).

If you want to use the ZFS Administration console on a system with a pool from a previous Solaris release, ensure that you upgrade your pools before using the console. To determine if your pools need to be upgraded, use the zpool status command. For information about the ZFS Administration console, see ZFS Web-Based Management.

ZFS Backup and Restore Commands Are Renamed

Solaris 10 6/06 Release: In this Solaris release, the zfs backup and zfs restore commands are renamed to zfs send and zfs receive to more accurately describe their functions. These commands send and receive ZFS data stream representations.

For more information about these commands, see Sending and Receiving ZFS Data.

Recovering Destroyed Storage Pools

Solaris 10 6/06 Release: This release includes the zpool import -D command, which enables you to recover pools that were previously destroyed with the zpool destroy command.
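
For example, the following sketch lists destroyed pools that are still recoverable and then recovers a hypothetical pool named tank:


# zpool import -D
# zpool import -D tank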

For more information, see Recovering Destroyed ZFS Storage Pools.

ZFS Is Integrated With Fault Manager

Solaris 10 6/06 Release: This release includes a ZFS diagnostic engine that is capable of diagnosing and reporting pool failures and device failures. Checksum, I/O, device, and pool errors associated with pool or device failures are also reported.

The diagnostic engine does not include predictive analysis of checksum and I/O errors, nor does it include proactive actions based on fault analysis.

If a ZFS failure occurs, you might see a message similar to the following:


SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Wed Jun 30 14:53:39 MDT 2010
PLATFORM: SUNW,Sun-Fire-880, CSN: -, HOSTNAME: neo
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 504a1188-b270-4ab0-af4e-8a77680576b8
DESC: A ZFS device failed.  Refer to http://sun.com/msg/ZFS-8000-D3 for more information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.

By reviewing the recommended action, which is to follow the more specific directions in the zpool status command, you can quickly identify and resolve the failure.

For an example of recovering from a reported ZFS problem, see Resolving a Missing Device.

The zpool clear Command

Solaris 10 6/06 Release: This release includes the zpool clear command for clearing error counts associated with a device or a pool. Previously, error counts were cleared when a device in a pool was brought online with the zpool online command. For more information, see Clearing Storage Pool Device Errors and zpool(1M).
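
For example, the following sketch clears the error counts for an entire pool or for a single device (pool and device names are illustrative):


# zpool clear tank
# zpool clear tank c1t0d0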

Compact NFSv4 ACL Format

Solaris 10 6/06 Release: In this release, you can set and display NFSv4 ACLs in two formats: verbose and compact. You can use the chmod command to set either ACL format. You can use the ls -V command to display the compact ACL format. You can use the ls -v command to display the verbose ACL format.

For more information, see Setting and Displaying ACLs on ZFS Files in Compact Format, chmod(1), and ls(1).

File System Monitoring Tool (fsstat)

Solaris 10 6/06 Release: A new file system monitoring tool, fsstat, reports file system operations. Activity can be reported by mount point or by file system type. The following example shows general ZFS file system activity:


$ fsstat zfs
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
7.82M 5.92M 2.76M 1.02G 3.32M  5.60G 87.0M  363M 1.86T 20.9M  251G zfs

For more information, see fsstat(1M).

ZFS Web-Based Management

Solaris 10 6/06 Release: A web-based ZFS management tool, the ZFS Administration console, enables you to perform the following administrative tasks:

You can access the ZFS Administration console through a secure web browser at:


https://system-name:6789/zfs

If you type the appropriate URL and are unable to reach the ZFS Administration console, the server might not be started. To start the server, run the following command:


# /usr/sbin/smcwebserver start

If you want the server to run automatically when the system boots, run the following command:


# /usr/sbin/smcwebserver enable

Note –

You cannot use the Solaris Management Console (smc) to manage ZFS storage pools or file systems.


What Is ZFS?

The ZFS file system is a revolutionary new file system that fundamentally changes the way file systems are administered, with features and benefits not found in any other file system available today. ZFS is robust, scalable, and easy to administer.

ZFS Pooled Storage

ZFS uses the concept of storage pools to manage physical storage. Historically, file systems were constructed on top of a single physical device. To address multiple devices and provide for data redundancy, the concept of a volume manager was introduced to provide a representation of a single device so that file systems would not need to be modified to take advantage of multiple devices. This design added another layer of complexity and ultimately prevented certain file system advances because the file system had no control over the physical placement of data on the virtualized volumes.

ZFS eliminates volume management altogether. Instead of forcing you to create virtualized volumes, ZFS aggregates devices into a storage pool. The storage pool describes the physical characteristics of the storage (device layout, data redundancy, and so on) and acts as an arbitrary data store from which file systems can be created. File systems are no longer constrained to individual devices, allowing them to share disk space with all file systems in the pool. You no longer need to predetermine the size of a file system, as file systems grow automatically within the disk space allocated to the storage pool. When new storage is added, all file systems within the pool can immediately use the additional disk space without additional work. In many ways, the storage pool works similarly to a virtual memory system: When a memory DIMM is added to a system, the operating system doesn't force you to run commands to configure the memory and assign it to individual processes. All processes on the system automatically use the additional memory.

Transactional Semantics

ZFS is a transactional file system, which means that the file system state is always consistent on disk. Traditional file systems overwrite data in place, which means that if the system loses power, for example, between the time a data block is allocated and when it is linked into a directory, the file system will be left in an inconsistent state. Historically, this problem was solved through the use of the fsck command. This command was responsible for reviewing and verifying the file system state, and attempting to repair any inconsistencies during the process. This problem of inconsistent file systems caused great pain to administrators, and the fsck command was never guaranteed to fix all possible problems. More recently, file systems have introduced the concept of journaling. The journaling process records actions in a separate journal, which can then be replayed safely if a system crash occurs. This process introduces unnecessary overhead because the data needs to be written twice, often resulting in a new set of problems, such as when the journal cannot be replayed properly.

With a transactional file system, data is managed using copy on write semantics. Data is never overwritten, and any sequence of operations is either entirely committed or entirely ignored. Thus, the file system can never be corrupted through accidental loss of power or a system crash. Although the most recently written pieces of data might be lost, the file system itself will always be consistent. In addition, synchronous data (written using the O_DSYNC flag) is always guaranteed to be written before returning, so it is never lost.

Checksums and Self-Healing Data

With ZFS, all data and metadata is verified using a user-selectable checksum algorithm. Traditional file systems that do provide checksum verification have performed it on a per-block basis, out of necessity due to the volume management layer and traditional file system design. The traditional design means that certain failures, such as writing a complete block to an incorrect location, can result in data that is incorrect but has no checksum errors. ZFS checksums are stored in a way such that these failures are detected and can be recovered from gracefully. All checksum verification and data recovery are performed at the file system layer, and are transparent to applications.

In addition, ZFS provides for self-healing data. ZFS supports storage pools with varying levels of data redundancy. When a bad data block is detected, ZFS fetches the correct data from another redundant copy and repairs the bad data, replacing it with the correct data.

Unparalleled Scalability

A key design element of the ZFS file system is scalability. The file system itself is 128-bit, allowing for 256 quadrillion zettabytes of storage. All metadata is allocated dynamically, so no need exists to preallocate inodes or otherwise limit the scalability of the file system when it is first created. All the algorithms have been written with scalability in mind. Directories can have up to 2^48 (256 trillion) entries, and no limit exists on the number of file systems or the number of files that can be contained within a file system.

ZFS Snapshots

A snapshot is a read-only copy of a file system or volume. Snapshots can be created quickly and easily. Initially, snapshots consume no additional disk space within the pool.

As data within the active dataset changes, the snapshot consumes disk space by continuing to reference the old data. As a result, the snapshot prevents the data from being freed back to the pool.

Simplified Administration

Most importantly, ZFS provides a greatly simplified administration model. Through the use of a hierarchical file system layout, property inheritance, and automatic management of mount points and NFS share semantics, ZFS makes it easy to create and manage file systems without requiring multiple commands or editing configuration files. You can easily set quotas or reservations, turn compression on or off, or manage mount points for numerous file systems with a single command. You can examine or replace devices without learning a separate set of volume manager commands. You can send and receive file system snapshot streams.

ZFS manages file systems through a hierarchy that allows for this simplified management of properties such as quotas, reservations, compression, and mount points. In this model, file systems are the central point of control. File systems themselves are very cheap (equivalent to creating a new directory), so you are encouraged to create a file system for each user, project, workspace, and so on. This design enables you to define fine-grained management points.

ZFS Terminology

This section describes the basic terminology used throughout this book:

alternate boot environment

A boot environment that is created by the lucreate command and possibly updated by the luupgrade command, but it is not the active or primary boot environment. The alternate boot environment can become the primary boot environment by running the luactivate command.

checksum

A 256-bit hash of the data in a file system block. The checksum capability can range from the simple and fast fletcher4 (the default) to cryptographically strong hashes such as SHA256.

clone

A file system whose initial contents are identical to the contents of a snapshot.

For information about clones, see Overview of ZFS Clones.

dataset

A generic name for the following ZFS components: clones, file systems, snapshots, and volumes.

Each dataset is identified by a unique name in the ZFS namespace. Datasets are identified using the following format:

pool/path[@snapshot]

pool

Identifies the name of the storage pool that contains the dataset

path

Is a slash-delimited path name for the dataset component

snapshot

Is an optional component that identifies a snapshot of a dataset

For more information about datasets, see Chapter 6, Managing Oracle Solaris ZFS File Systems.

file system

A ZFS dataset of type filesystem that is mounted within the standard system namespace and behaves like other file systems.

For more information about file systems, see Chapter 6, Managing Oracle Solaris ZFS File Systems.

mirror

A virtual device that stores identical copies of data on two or more disks. If any disk in a mirror fails, any other disk in that mirror can provide the same data.

pool

A logical group of devices describing the layout and physical characteristics of the available storage. Disk space for datasets is allocated from a pool.

For more information about storage pools, see Chapter 4, Managing Oracle Solaris ZFS Storage Pools.

primary boot environment

A boot environment that is used by the lucreate command to build the alternate boot environment. By default, the primary boot environment is the current boot environment. This default can be overridden by using the lucreate -s option.

RAID-Z

A virtual device that stores data and parity on multiple disks. For more information about RAID-Z, see RAID-Z Storage Pool Configuration.

resilvering

The process of copying data from one device to another device is known as resilvering. For example, if a mirror device is replaced or taken offline, the data from an up-to-date mirror device is copied to the newly restored mirror device. This process is referred to as mirror resynchronization in traditional volume management products.

For more information about ZFS resilvering, see Viewing Resilvering Status.

snapshot

A read-only copy of a file system or volume at a given point in time.

For more information about snapshots, see Overview of ZFS Snapshots.

virtual device

A logical device in a pool, which can be a physical device, a file, or a collection of devices.

For more information about virtual devices, see Displaying Storage Pool Virtual Device Information.

volume

A dataset that represents a block device. For example, you can create a ZFS volume as a swap device.

For more information about ZFS volumes, see ZFS Volumes.

ZFS Component Naming Requirements

Each ZFS component, such as datasets and pools, must be named according to the following rules:

In addition, empty components are not allowed.