The following ZFS file system features (not available in Oracle Solaris 10) are available in Oracle Solaris 11:
ZFS file system encryption – You can encrypt a ZFS file system when it is created; a brief example follows this list. See Managing Security.
ZFS file system deduplication – For important information about determining whether your system environment can support ZFS data deduplication, see ZFS Data Deduplication Requirements.
ZFS file system sharing syntax changes – Includes both NFS and SMB file system sharing changes. See ZFS File System Sharing Changes.
ZFS man page changes – The zfs.1m man page has been revised so that core ZFS file system features remain in the zfs.1m page, but delegated administration, encryption, and share syntax and examples are covered in separate man pages.
ZFS root pool setup simplified – Support for Unified Archives in Oracle Solaris makes root pool recovery setup much easier than in previous releases. See Using Unified Archives for System Recovery and Cloning in Oracle Solaris 11.3.
ZFS send stream monitoring – You can monitor the progress of a ZFS stream transmission in real time. See Monitoring ZFS Pool Operations in Managing ZFS File Systems in Oracle Solaris 11.3.
ZFS temporary pool names – You can create or import a pool with a temporary pool name in a shared storage or recovery scenario. See Importing a Pool With a Temporary Name in Managing ZFS File Systems in Oracle Solaris 11.3.
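For example, encryption can be enabled only when a file system is created; the pool and file system names below are hypothetical, and the passphrase prompts are representative:

# zfs create -o encryption=on tank/home/secure
Enter passphrase for 'tank/home/secure':
Enter again: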
After installing your system, review the ZFS storage pool and ZFS file system information.
Display ZFS storage pool information with the zpool status command; a sample listing follows this list.
Display ZFS file system information with the zfs list command.
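For example, the zpool status command on a hypothetical root pool might display output similar to the following (the device name and scan status are illustrative):

# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME      STATE     READ WRITE CKSUM
        rpool     ONLINE       0     0     0
          c2t0d0  ONLINE       0     0     0

errors: No known data errors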
The zpool list and zfs list commands are better than the legacy df and du commands for determining available pool and file system space. With the legacy commands, you cannot easily distinguish between pool and file system space, nor do they account for space that is consumed by descendant file systems or snapshots.
For example, the following root pool (rpool) has 5.46 GB allocated and 68.5 GB free:
# zpool list rpool
NAME   SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool   74G  5.46G  68.5G   7%  1.00x  ONLINE  -
If you compare the pool space accounting with the file system space accounting by reviewing the USED columns of your individual file systems, you can see that the pool space is accounted for, as shown in the following example:
# zfs list -r rpool
NAME                      USED  AVAIL  REFER  MOUNTPOINT
rpool                    5.41G  67.4G  74.5K  /rpool
rpool/ROOT               3.37G  67.4G    31K  legacy
rpool/ROOT/solaris       3.37G  67.4G  3.07G  /
rpool/ROOT/solaris/var    302M  67.4G   214M  /var
rpool/dump               1.01G  67.5G  1000M  -
rpool/export             97.5K  67.4G    32K  /rpool/export
rpool/export/home        65.5K  67.4G    32K  /rpool/export/home
rpool/export/home/admin  33.5K  67.4G  33.5K  /rpool/export/home/admin
rpool/swap               1.03G  67.5G  1.00G  -
The SIZE value that is reported by the zpool list command is generally the amount of physical disk space in the pool, but varies depending on the pool's redundancy level. The zfs list command lists the usable space that is available to file systems, which is disk space minus ZFS pool redundancy metadata overhead, if any. See the following examples for more information.
Non-redundant storage pool – Created with one 136-GB disk, the zpool list command reports SIZE and initial FREE values as 136 GB. The initial AVAIL space that is reported by the zfs list command is 134 GB, which is due to a small amount of pool metadata overhead, as shown in the following example:
# zpool create tank c0t6d0
# zpool list tank
NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
tank   136G  95.5K  136G   0%  1.00x  ONLINE  -
# zfs list tank
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank    72K   134G    21K  /tank
Mirrored storage pool – Created with two 136-GB disks, the zpool list command reports SIZE as 136 GB and initial FREE value as 136 GB. This reporting is referred to as the deflated space value. The initial AVAIL space that is reported by the zfs list command is 134 GB, which is due to a small amount of pool metadata overhead, as shown in the following example:
# zpool create tank mirror c0t6d0 c0t7d0
# zpool list tank
NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
tank   136G  95.5K  136G   0%  1.00x  ONLINE  -
# zfs list tank
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank    72K   134G    21K  /tank
RAID-Z storage pool – Created with three 136-GB disks, the zpool list command reports SIZE as 408 GB and initial FREE value as 408 GB. This reporting is referred to as the inflated disk space value, which includes redundancy overhead, such as parity information. The initial AVAIL space that is reported by the zfs list command is 133 GB, which is due to the pool redundancy overhead. The following example creates a RAIDZ-2 pool:
# zpool create tank raidz2 c0t6d0 c0t7d0 c0t8d0
# zpool list tank
NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
tank   408G   286K  408G   0%  1.00x  ONLINE  -
# zfs list tank
NAME    USED  AVAIL  REFER  MOUNTPOINT
tank   73.2K   133G  20.9K  /tank
Making ZFS file systems available is similar to Oracle Solaris 10 releases in the following ways:
A ZFS file system is mounted automatically when it is created and then remounted automatically when the system is booted.
You do not have to modify the /etc/vfstab file to mount a ZFS file system, unless you create a legacy mount for a ZFS file system. Mounting a ZFS file system automatically is recommended over using a legacy mount.
You do not have to modify the /etc/dfs/dfstab file to share file systems. See ZFS File System Sharing Changes.
Similar to a UFS root, the swap device must have an entry in the /etc/vfstab file; a sample entry appears after this list.
You can share file systems between Oracle Solaris 10 systems and Oracle Solaris 11 systems by using NFS sharing.
You can share file systems between Oracle Solaris 11 systems by using NFS or SMB sharing.
You can export ZFS storage pools from an Oracle Solaris 10 system and then import them to an Oracle Solaris 11 system.
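As noted above, only a ZFS legacy mount (a file system with its mountpoint property set to legacy) and the swap volume still need /etc/vfstab entries. The following fragment is a sketch; the tank/legacy file system and /data mount point are hypothetical:

#device                    device   mount  FS    fsck  mount    mount
#to mount                  to fsck  point  type  pass  at boot  options
tank/legacy                -        /data  zfs   -     yes      -
/dev/zvol/dsk/rpool/swap   -        -      swap  -     no       -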
You can use the fsstat command to monitor file systems and report about file system operations. There are several options that report different kinds of activity. For example, you can display information by mount point or by file system type. In the following example, the fsstat command displays all of the ZFS file system operations from the time that the ZFS module was initially loaded:
$ fsstat zfs
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
268K  145K  93.6K 28.0M 71.1K   186M 2.74M 12.9M 56.2G 1.61M 9.46G zfs
See fsstat(1M) for other examples.
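For example, the following commands (the mount point is hypothetical) report one line of counters for every file system type, and then report on a single mount point every 5 seconds for 3 intervals:

$ fsstat -F
$ fsstat /tank 5 3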
The user_reserve_hint_pct tunable parameter provides a hint to the system about application memory usage. This hint is used to limit the growth of ZFS Adaptive Replacement Cache (ARC) cache so that more memory can be made available for applications. For information about using this parameter, see Memory Management Between ZFS and Applications in Oracle Solaris 11.2 (Doc ID 1663862.1) at https://support.oracle.com/.
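As a related check (not a substitute for the procedure in the referenced document), you can observe the current ARC size through the ZFS arcstats kstat before and after tuning; the value shown is illustrative:

# kstat -p zfs:0:arcstats:size
zfs:0:arcstats:size     3874061312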
The syntax for modifying the nfsmapid service that maps NFSv4 user and group IDs by using the passwd and group entries in the /etc/nsswitch.conf file has changed.
The nfsmapid service instance is displayed as follows:
# svcs mapid
STATE          STIME    FMRI
online         Apr_25   svc:/network/nfs/mapid:default
You would modify the service instance as follows:
# svccfg -s svc:/network/nfs/mapid:default
svc:/network/nfs/mapid:default> listprop
nfs-props                          application
nfs-props/nfsmapid_domain          astring     fooold.com
general                            framework
general/complete                   astring
general/enabled                    boolean     false
restarter                          framework   NONPERSISTENT
restarter/logfile                  astring     /var/svc/log/network-nfs-mapid:default.log
restarter/contract                 count       137
restarter/start_pid                count       1325
restarter/start_method_timestamp   time        1366921047.240441000
restarter/start_method_waitstatus  integer     0
restarter/auxiliary_state          astring     dependencies_satisfied
restarter/next_state               astring     none
restarter/state                    astring     online
restarter/state_timestamp          time        1366921047.247849000
general_ovr                        framework   NONPERSISTENT
general_ovr/enabled                boolean     true
svc:/network/nfs/mapid:default> setprop nfs-props/nfsmapid_domain = newfoo.com
svc:/network/nfs/mapid:default> listprop
nfs-props                          application
nfs-props/nfsmapid_domain          astring     newfoo.com
.
.
.
svc:/network/nfs/mapid:default> exit
# svcadm refresh svc:/network/nfs/mapid:default
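The same change can also be made non-interactively, which can be convenient in scripts; the domain name is a placeholder:

# svccfg -s svc:/network/nfs/mapid:default setprop nfs-props/nfsmapid_domain = astring: newfoo.com
# svcadm refresh svc:/network/nfs/mapid:default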
In Oracle Solaris 10, you set the sharenfs or sharesmb property to create and publish a ZFS file system share. Or, you can use the legacy share command. In Oracle Solaris 11 11/11, file sharing was enhanced and the command syntax changed. See Legacy ZFS Sharing Syntax in Managing ZFS File Systems in Oracle Solaris 11.3 for details of this change.
Starting with Oracle Solaris 11.1, the following ZFS file sharing enhancements were made:
Share syntax is simplified. You share a file system by setting the share.nfs or share.smb property as follows:
# zfs set share.nfs=on tank/home
Share properties are now inherited by descendant file systems. In the previous example, the share.nfs property value is inherited by any descendant file systems, as shown in the following example:
# zfs create tank/home/userA # zfs create tank/home/userB
Specify additional property values or modify existing property values on any existing file system shares as follows:
# zfs set share.nfs.nosuid=on tank/home/userA
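To verify the resulting configuration, you can list the share property recursively; the output shown is representative:

# zfs get -r share.nfs tank/home
NAME             PROPERTY   VALUE  SOURCE
tank/home        share.nfs  on     local
tank/home/userA  share.nfs  on     inherited from tank/home
tank/home/userB  share.nfs  on     inherited from tank/home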
These additional file sharing improvements are associated with pool version 34. See New ZFS Sharing Syntax in Managing ZFS File Systems in Oracle Solaris 11.3.
Review the following share transition issues:
Upgrading an Oracle Solaris 11 system to a subsequent Oracle Solaris 11 release – Because of share property changes in this release, ZFS shares will be incorrect if you boot back to an older BE. Non-ZFS shares are unaffected. If you plan to boot back to an older BE, save a copy of the existing share configuration prior to the pkg update operation so that you can restore the share configuration on the ZFS datasets (see the example after this list).
Note the following additional information:
In the older BE, use the sharemgr show -vp command to list all shares and their configuration.
Use the zfs get sharenfs filesystem and zfs get sharesmb filesystem commands to obtain the values of the sharing properties.
If you boot back to an older BE, reset the sharenfs and sharesmb properties to their original values.
Legacy unsharing behavior – Using the unshare -a command or the unshareall command unpublishes a share, but the command does not update the SMF shares repository. If you try to re-share the existing share, the shares repository is checked for conflicts, and an error message is displayed.
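For example, before the pkg update you might capture the existing share configuration to files that you can consult later; the output file names are hypothetical:

# sharemgr show -vp > /var/tmp/sharemgr.out
# zfs get -r sharenfs rpool > /var/tmp/sharenfs.out
# zfs get -r sharesmb rpool > /var/tmp/sharesmb.out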
You can use the deduplication (dedup) property to remove redundant data from your ZFS file systems. If a file system has the dedup property enabled, duplicate data blocks are removed synchronously. The result is that only unique data is stored, and common components are shared between files, as shown in the following example:
# zfs set dedup=on tank/home
Do not enable the dedup property on file systems that reside on production systems until you perform the following steps to determine if your system can support data deduplication.
Determine if your data would benefit from deduplication space savings. If your data is not dedupable, there is no point in enabling dedup. Note that running the following command is very memory intensive:
# zdb -S tank
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    2.27M    239G    188G    194G    2.27M    239G    188G    194G
     2     327K   34.3G   27.8G   28.1G     698K   73.3G   59.2G   59.9G
     4    30.1K   2.91G   2.10G   2.11G     152K   14.9G   10.6G   10.6G
     8    7.73K    691M    529M    529M    74.5K   6.25G   4.79G   4.80G
    16      673   43.7M   25.8M   25.9M    13.1K    822M    492M    494M
    32      197   12.3M   7.02M   7.03M    7.66K    480M    269M    270M
    64       47   1.27M    626K    626K    3.86K    103M   51.2M   51.2M
   128       22    908K    250K    251K    3.71K    150M   40.3M   40.3M
   256        7    302K     48K   53.7K    2.27K   88.6M   17.3M   19.5M
   512        4    131K   7.50K   7.75K    2.74K    102M   5.62M   5.79M
    2K        1      2K      2K      2K    3.23K   6.47M   6.47M   6.47M
    8K        1    128K      5K      5K    13.9K   1.74G   69.5M   69.5M
 Total    2.63M    277G    218G    225G    3.22M    337G    263G    270G

dedup = 1.20, compress = 1.28, copies = 1.03, dedup * compress / copies = 1.50
If the estimated dedup ratio is greater than 2, then you might see deduplication space savings.
In this example, the dedup ratio (dedup = 1.20) is less than 2, so enabling deduplication is discouraged.
Make sure your system has enough memory to support deduplication as follows:
Each in-core deduplication table entry is approximately 320 bytes.
Multiply the number of allocated blocks by 320 bytes:
in-core DDT size = 2.63M x 320 = 841.60M
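You can then compare the estimated DDT size to the physical memory on the system; the output value is illustrative:

# prtconf | grep Memory
Memory size: 8192 Megabytes

In this case, an approximately 842-MB DDT would consume roughly 10 percent of an 8-GB system's memory, and that memory competes with the ARC and with applications.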
Deduplication performance is best when the deduplication table fits into memory. If the table has to be written to disk, then performance will decrease. If you enable deduplication on your file systems without sufficient memory resources, system performance might degrade during file system related operations. For example, removing a large dedup-enabled file system without sufficient memory resources might impact system performance.
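If you do enable deduplication, you can track the ratio that is actually achieved by using the pool's dedupratio property; the value shown is illustrative:

# zpool get dedupratio tank
NAME  PROPERTY    VALUE  SOURCE
tank  dedupratio  1.20x  -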