Understanding Snapshots
Snapshots present an interesting dilemma for space management. They represent the set of
physical blocks referenced by a share at a given point in time. Initially, a snapshot consumes no
additional space. But as data is overwritten in the active share, the blocks in the active share
will contain only the new data, and the older blocks will be "held" by the most recent (and possibly
older) snapshots. Gradually, snapshots consume additional space as the content of the active share
diverges from them.
Some other systems try to hide the cost of snapshots, either by pretending that they are free
or by "reserving" space dedicated to holding snapshot data. Such systems gloss over a basic
fact inherent in snapshots. If you take a snapshot of a filesystem of any given size and rewrite
100% of the data within the filesystem, you must, by definition, maintain references to twice as
much data as was originally in the filesystem. Snapshots are not free, and the only way other
systems can present this abstraction is to silently destroy snapshots when space runs out. This is
often the worst possible response, because a runaway process rewriting data can cause all previous
snapshots to be destroyed, leaving nothing to restore from.
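
The doubling described above can be illustrated with a minimal copy-on-write sketch. This is a toy
model, not the appliance's implementation: the share is assumed to be a mapping from logical
offsets to physical block numbers, and the helper names (write, physical_blocks) are hypothetical.

```python
import itertools

# Hypothetical allocator: every write receives a fresh physical block number.
_next_block = itertools.count()


def write(share, offset):
    """Copy-on-write: never overwrite in place, always allocate a new block."""
    share[offset] = next(_next_block)


def physical_blocks(share, snapshots):
    """All physical blocks still referenced by the share or by any snapshot."""
    live = set(share.values())
    for snap in snapshots:
        live |= set(snap.values())
    return live


# Fill a share with 100 blocks of data.
share = {}
for offset in range(100):
    write(share, offset)

# Taking a snapshot copies references, not data, so it costs nothing yet.
snapshot = dict(share)
blocks_after_snapshot = len(physical_blocks(share, [snapshot]))

# Rewrite 100% of the data in the active share.
for offset in range(100):
    write(share, offset)
blocks_after_rewrite = len(physical_blocks(share, [snapshot]))

print(blocks_after_snapshot, blocks_after_rewrite)
```

In this model the snapshot adds no blocks when it is taken, and the number of referenced blocks
doubles once every offset has been rewritten.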
In the Sun Storage 7000 series, the cost of snapshots is always explicit, and tools are
provided to manage this space in a way that best matches the administrative model for a given
environment. Each snapshot has two associated space statistics: unique space and referenced space.
The amount of referenced space is the total space consumed by the filesystem at the time the
snapshot was taken. It represents the theoretical maximum size of the snapshot should it remain the
sole reference to all data blocks. The unique space indicates the amount of physical space
referenced only by the current snapshot. When a snapshot is destroyed, the unique space will be made
available to the rest of the pool. Note that the amount of space consumed by all snapshots is not
equivalent to the sum of unique space across all snapshots. With a share and a single snapshot, every
block must be referenced by the snapshot, the share, or both. With multiple snapshots,
however, it is possible for a block to be referenced by several snapshots without being unique to
any one of them. For example, if a file is created, two snapshots X and Y are taken, the file is
deleted, and another snapshot Z is taken, the blocks within the file are held by X and Y, but not by
Z. Neither X nor Y counts those blocks as unique, because the other snapshot also references them.
In this case, destroying Z will not free up the space, but destroying both X and Y will. Because
of this, destroying any snapshot can affect the unique space referenced by neighboring snapshots,
though the total amount of space consumed by snapshots will never increase as a result.
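
The X, Y, Z example can be worked through with sets of block numbers. The following is a sketch
under simplifying assumptions: each snapshot is modeled as the set of blocks it references, space is
counted in blocks, and the function names are hypothetical rather than part of any appliance API.

```python
def unique_blocks(name, snapshots, share):
    """Blocks referenced by snapshot `name` and by nothing else."""
    elsewhere = set(share)
    for other, blocks in snapshots.items():
        if other != name:
            elsewhere |= blocks
    return snapshots[name] - elsewhere


def snapshot_blocks(snapshots, share):
    """Blocks kept alive only because at least one snapshot references them."""
    held = set()
    for blocks in snapshots.values():
        held |= blocks
    return held - share


# Block 0 is unrelated data; blocks 1-3 belong to the file.
share = {0, 1, 2, 3}
snapshots = {"X": set(share), "Y": set(share)}
share = {0}                      # the file is deleted from the active share
snapshots["Z"] = set(share)

referenced_x = len(snapshots["X"])
unique_before = {n: len(unique_blocks(n, snapshots, share)) for n in snapshots}
total_before = len(snapshot_blocks(snapshots, share))

# Destroying Z frees nothing: the file's blocks are still held by X and Y.
without_z = {n: b for n, b in snapshots.items() if n != "Z"}
total_without_z = len(snapshot_blocks(without_z, share))

# Destroying X frees nothing either, but the blocks become unique to Y.
without_x = {n: b for n, b in snapshots.items() if n != "X"}
unique_y_after = len(unique_blocks("Y", without_x, share))

# Destroying both X and Y finally releases the file's blocks.
total_only_z = len(snapshot_blocks({"Z": snapshots["Z"]}, share))

print(unique_before, total_before, total_without_z, unique_y_after, total_only_z)
```

Here every snapshot reports zero unique blocks while the snapshots together hold three, which is
why the sum of unique space understates the total space consumed by snapshots.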
The total size of a project or share always accounts for space consumed by all snapshots,
though the usage breakdown is also available. Quotas and reservations can be set at the project
level to enforce physical constraints across this total space. In addition, quotas and reservations
can be set at the filesystem level, and these settings can apply to only referenced data or total
data. Whether quotas and reservations should apply to referenced data or to total physical
data depends on the administrative environment. If users are not in control of their snapshots (for
example, because an automatic snapshot schedule is set for them), then quotas should typically not
include snapshots in the calculation. Otherwise, a user may run out of space and then be confused
when files cannot be deleted. Without an understanding of snapshots or a means to manage them, such
a situation may be unrecoverable without administrator intervention. In this scenario, the
snapshots represent an overhead cost that is factored into operation of the system in order to
provide backup capabilities. On the other hand, there are environments where users are billed
according to their physical space requirements, and snapshots represent a choice by the user to
provide some level of backup that meets their requirements given the churn rate of their dataset. In
these environments, it makes more sense to enforce quotas based on total physical data, including
snapshots. The users understand the cost of snapshots, and can be given a means to actively
manage them (for example, through dedicated roles on the ZFSSA).
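
The difference between the two quota policies can be sketched as follows. This is an illustrative
model only: the Usage type, the charged and delete helpers, and the numbers are assumptions made for
the example and do not correspond to actual appliance settings or interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    referenced: int      # space referenced by the active share
    snapshot_only: int   # space held only by snapshots


def charged(usage, include_snapshots):
    """Space counted against the quota under the chosen policy."""
    if include_snapshots:
        return usage.referenced + usage.snapshot_only
    return usage.referenced


def delete(usage, size, held_by_snapshot):
    """Delete `size` units from the share; a snapshot may keep the blocks alive."""
    kept = size if held_by_snapshot else 0
    return Usage(usage.referenced - size, usage.snapshot_only + kept)


QUOTA = 100
before = Usage(referenced=100, snapshot_only=0)   # share is exactly at quota
after = delete(before, 40, held_by_snapshot=True)

# Quota on total data: deleting files changes nothing the user can see.
total_charge = charged(after, include_snapshots=True)
# Quota on referenced data only: the deletion frees space as the user expects.
referenced_charge = charged(after, include_snapshots=False)

print(total_charge, referenced_charge)
```

Under the total-data policy the user is still at quota after deleting 40 units, because a snapshot
holds the deleted blocks. Under the referenced-only policy the same deletion brings the charge
down to 60.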