JavaScript is required to for searching.
Skip Navigation Links
Exit Print View
Oracle Solaris ZFS Administration Guide
search filter icon
search icon

Document Information

Preface

1.  Oracle Solaris ZFS File System (Introduction)

2.  Getting Started With Oracle Solaris ZFS

3.  Oracle Solaris ZFS and Traditional File System Differences

4.  Managing Oracle Solaris ZFS Storage Pools

5.  Installing and Booting an Oracle Solaris ZFS Root File System

6.  Managing Oracle Solaris ZFS File Systems

7.  Working With Oracle Solaris ZFS Snapshots and Clones

8.  Using ACLs to Protect Oracle Solaris ZFS Files

9.  Oracle Solaris ZFS Delegated Administration

10.  Oracle Solaris ZFS Advanced Topics

11.  Oracle Solaris ZFS Troubleshooting and Pool Recovery

Identifying ZFS Failures

Missing Devices in a ZFS Storage Pool

Damaged Devices in a ZFS Storage Pool

Corrupted ZFS Data

Checking ZFS File System Integrity

File System Repair

File System Validation

Controlling ZFS Data Scrubbing

Explicit ZFS Data Scrubbing

ZFS Data Scrubbing and Resilvering

Resolving Problems With ZFS

Determining If Problems Exist in a ZFS Storage Pool

Reviewing zpool status Output

Overall Pool Status Information

Pool Configuration Information

Scrubbing Status

Data Corruption Errors

System Reporting of ZFS Error Messages

Repairing a Damaged ZFS Configuration

Resolving a Missing Device

Physically Reattaching a Device

Notifying ZFS of Device Availability

Replacing or Repairing a Damaged Device

Determining the Type of Device Failure

Clearing Transient Errors

Replacing a Device in a ZFS Storage Pool

Determining If a Device Can Be Replaced

Devices That Cannot be Replaced

Replacing a Device in a ZFS Storage Pool

Viewing Resilvering Status

Repairing Damaged Data

Identifying the Type of Data Corruption

Repairing a Corrupted File or Directory

Repairing ZFS Storage Pool-Wide Damage

Repairing an Unbootable System

A.  Oracle Solaris ZFS Version Descriptions

Index

Identifying ZFS Failures

As a combined file system and volume manager, ZFS can exhibit many different failures. This chapter begins by outlining the various failures, then discusses how to identify them on a running system. This chapter concludes by discussing how to repair the problems. ZFS can encounter three basic types of errors:

Note that a single pool can experience all three errors, so a complete repair procedure involves finding and correcting one error, proceeding to the next error, and so on.

Missing Devices in a ZFS Storage Pool

If a device is completely removed from the system, ZFS detects that the device cannot be opened and places it in the REMOVED state. Depending on the data replication level of the pool, this removal might or might not result in the entire pool becoming unavailable. If one disk in a mirrored or RAID-Z device is removed, the pool continues to be accessible. A pool might become FAULTED, which means no data is accessible until the device is reattached, under the following conditions:

Damaged Devices in a ZFS Storage Pool

The term “damaged” covers a wide variety of possible errors. Examples include the following:

In some cases, these errors are transient, such as a random I/O error while the controller is having problems. In other cases, the damage is permanent, such as on-disk corruption. Even still, whether the damage is permanent does not necessarily indicate that the error is likely to occur again. For example, if an administrator accidentally overwrites part of a disk, no type of hardware failure has occurred, and the device does not need to be replaced. Identifying the exact problem with a device is not an easy task and is covered in more detail in a later section.

Corrupted ZFS Data

Data corruption occurs when one or more device errors (indicating one or more missing or damaged devices) affects a top-level virtual device. For example, one half of a mirror can experience thousands of device errors without ever causing data corruption. If an error is encountered on the other side of the mirror in the exact same location, corrupted data is the result.

Data corruption is always permanent and requires special consideration during repair. Even if the underlying devices are repaired or replaced, the original data is lost forever. Most often, this scenario requires restoring data from backups. Data errors are recorded as they are encountered, and they can be controlled through routine pool scrubbing as explained in the following section. When a corrupted block is removed, the next scrubbing pass recognizes that the corruption is no longer present and removes any trace of the error from the system.