Oracle Solaris ZFS Administration Guide, Oracle Solaris 11 Express 11/10
11. Oracle Solaris ZFS Troubleshooting and Pool Recovery
No fsck utility equivalent exists for ZFS. The fsck utility has traditionally served two purposes: file system repair and file system validation.
With traditional file systems, the way in which data is written is inherently vulnerable to unexpected failure, which can cause file system inconsistencies. Because a traditional file system is not transactional, unreferenced blocks, bad link counts, or other inconsistent file system structures are possible. The addition of journaling does solve some of these problems, but can introduce additional problems when the log cannot be rolled back. In a ZFS configuration, inconsistent data can exist on disk only as a result of hardware failure (in which case the pool should have been redundant) or a bug in the ZFS software.
The fsck utility repairs known problems specific to UFS file systems. Most ZFS storage pool problems are related to failing hardware or power failures. Many problems can be avoided by using redundant pools. If your pool is damaged due to failing hardware or a power outage, see Repairing ZFS Storage Pool-Wide Damage.
If your pool is not redundant, the risk that file system corruption can render some or all of your data inaccessible is always present.
In addition to performing file system repair, the fsck utility validates that the data on disk has no problems. Traditionally, this task requires unmounting the file system and running the fsck utility, possibly taking the system to single-user mode in the process. This scenario results in downtime that is proportional to the size of the file system being checked. Instead of requiring an explicit utility to perform the necessary checking, ZFS provides a mechanism to perform routine checking of all inconsistencies. This feature, known as scrubbing, is commonly used in memory and other systems as a method of detecting and preventing errors before they result in a hardware or software failure.
Whenever ZFS encounters an error, either through scrubbing or when accessing a file on demand, the error is logged internally so that you can obtain a quick overview of all known errors within the pool.
The simplest way to check data integrity is to initiate an explicit scrubbing of all data within the pool. This operation traverses all the data in the pool once and verifies that all blocks can be read. Scrubbing proceeds as fast as the devices allow, though the priority of scrub I/O remains below that of normal operations. This operation might negatively impact performance, though the pool's data should remain usable and nearly as responsive while the scrubbing occurs. To initiate an explicit scrub, use the zpool scrub command. For example:
# zpool scrub tank
The status of the current scrubbing operation can be displayed by using the zpool status command. For example:
# zpool status -v tank
  pool: tank
 state: ONLINE
  scan: scrub in progress since Mon Jun 7 12:07:52 2010
    201M scanned out of 222M at 9.55M/s, 0h0m to go
    0 repaired, 90.44% done
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0

errors: No known data errors
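If you want to track scrub progress from a script, the completion figure can be pulled out of the status text. The following is a minimal sketch, not part of the ZFS tooling: the function name scrub_percent is hypothetical, and it assumes the progress line has the "N% done" form shown in the example output above, which might differ in other releases.

```shell
# Hypothetical helper: print the completion percentage of a scrub,
# reading `zpool status` text on standard input.
# Assumes a progress line of the form "... 90.44% done" as in the
# example output; prints nothing if no such line is present.
scrub_percent() {
    sed -n 's/.* \([0-9][0-9.]*\)% done.*/\1/p' | head -1
}
```

A possible invocation is zpool status tank | scrub_percent, which would print only the percentage figure from the scan section.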
Only one active scrubbing operation per pool can occur at one time.
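Because only one scrub can be active in a pool, a script might check for a running scrub before requesting a new one. The following sketch assumes that the status text contains the phrase "scrub in progress", as in the example output above. The function name scrub_in_progress is hypothetical.

```shell
# Hypothetical helper: return 0 if the `zpool status` text on standard
# input reports a scrub that is still running, 1 otherwise.
# Assumes the "scrub in progress" wording from the example output.
# Output is redirected rather than using grep -q, for portability.
scrub_in_progress() {
    grep 'scrub in progress' >/dev/null
}
```

For example, zpool status tank | scrub_in_progress || zpool scrub tank would request a scrub only when none is reported as running.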
You can stop a scrubbing operation that is in progress by using the -s option. For example:
# zpool scrub -s tank
In most cases, a scrubbing operation to ensure data integrity should continue to completion. Stop a scrubbing operation at your own discretion if system performance is impacted by the operation.
Performing routine scrubbing guarantees continuous I/O to all disks on the system. Routine scrubbing has the side effect of preventing power management from placing idle disks in low-power mode. If the system is generally performing I/O all the time, or if power consumption is not a concern, then this issue can safely be ignored.
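Routine scrubbing is typically driven by a scheduler such as cron. The following is a minimal sketch of a wrapper that such a job might call. The function name scrub_pools and the pool names in the usage note are assumptions, and this guide does not prescribe a scrubbing interval.

```shell
# Hypothetical wrapper: start a scrub on each pool named as an argument.
# Returns nonzero if any `zpool scrub` invocation fails, so that a
# calling job can report the failure.
scrub_pools() {
    rc=0
    for pool in "$@"; do
        zpool scrub "$pool" || rc=1
    done
    return $rc
}
```

A scheduled job could then run scrub_pools tank at whatever interval suits the power and I/O considerations described above.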
For more information about interpreting zpool status output, see Querying ZFS Storage Pool Status.
When a device is replaced, a resilvering operation is initiated to move data from the good copies to the new device. This action is a form of disk scrubbing. Therefore, only one such action can occur at a given time in the pool. If a scrubbing operation is in progress, a resilvering operation suspends the current scrubbing and restarts it after the resilvering is completed.
For more information about resilvering, see Viewing Resilvering Status.