Skip Navigation Links | |
Exit Print View | |
![]() |
Oracle Solaris ZFS Administration Guide Oracle Solaris 10 1/13 Information Library |
1. Oracle Solaris ZFS File System (Introduction)
2. Getting Started With Oracle Solaris ZFS
3. Managing Oracle Solaris ZFS Storage Pools
4. Installing and Booting an Oracle Solaris ZFS Root File System
5. Managing Oracle Solaris ZFS File Systems
6. Working With Oracle Solaris ZFS Snapshots and Clones
7. Using ACLs and Attributes to Protect Oracle Solaris ZFS Files
8. Oracle Solaris ZFS Delegated Administration
9. Oracle Solaris ZFS Advanced Topics
10. Oracle Solaris ZFS Troubleshooting and Pool Recovery
Identifying Problems With ZFS Storage Pools
Determining If Problems Exist in a ZFS Storage Pool
Overall Pool Status Information
ZFS Storage Pool Configuration Information
ZFS Storage Pool Scrubbing Status
Resolving ZFS Storage Device Problems
Resolving a Missing or Removed Device
Physically Reattaching a Device
Notifying ZFS of Device Availability
Replacing or Repairing a Damaged Device
Determining the Type of Device Failure
Clearing Transient Device Errors
Replacing a Device in a ZFS Storage Pool
Determining If a Device Can Be Replaced
Devices That Cannot be Replaced
Replacing a Device in a ZFS Storage Pool
Resolving ZFS File System Problems
Resolving Data Problems in a ZFS Storage Pool
Checking ZFS File System Integrity
Controlling ZFS Data Scrubbing
ZFS Data Scrubbing and Resilvering
ZFS File System Space Reporting
ZFS Storage Pool Space Reporting
Identifying the Type of Data Corruption
Repairing a Corrupted File or Directory
Repairing Corrupted Data With Multiple Block References
Repairing ZFS Storage Pool-Wide Damage
Repairing a Damaged ZFS Configuration
Repairing an Unbootable System
11. Recommended Oracle Solaris ZFS Practices
Review the following sections to determine whether pool problems or file system unavailability is related to a hardware problem, such as faulty system board, memory, device, HBA, or a misconfiguration.
For example, a failing or faulty disk on a busy ZFS pool can greatly degrade overall system performance.
If you start by diagnosing and identifying hardware problems first, which can be easier to detect and all your hardware checks out, you can then move on to diagnosing pool and file system problems as described in the rest of this chapter. If your hardware, pool, and file system configurations are healthy, consider diagnosing application problems, which are generally more complex to unravel and are not covered in this guide.
The Solaris Fault Manager tracks software, hardware and specific device problems by identifying error telemetry information that indicate a specific symptom in an error log and then reporting actual fault diagnosis when the error symptom results in an actual fault.
The following command identifies any software or hardware related fault.
# fmadm faulty
Use the above command routinely to identify failed services or devices.
Use the following command routinely to identify hardware or device related errors.
# fmdump -eV | more
Error messages in this log file that describe vdev.open_failed, checksum, or io_failure issues need your attention or they might evolve into actual faults that are displayed with the fmadm faulty command.
If the above indicates that a device is failing, then this is a good time to make sure you have a replacement device available.
You can also track additional device errors by using iostat command. Use the following syntax to identify a summary of error statistics.
# iostat -en ---- errors --- s/w h/w trn tot device 0 0 0 0 c0t5000C500335F95E3d0 0 0 0 0 c0t5000C500335FC3E7d0 0 0 0 0 c0t5000C500335BA8C3d0 0 12 0 12 c2t0d0 0 0 0 0 c0t5000C500335E106Bd0 0 0 0 0 c0t50015179594B6F11d0 0 0 0 0 c0t5000C500335DC60Fd0 0 0 0 0 c0t5000C500335F907Fd0 0 0 0 0 c0t5000C500335BD117d0
In the above output, errors are reported on an internal disk c2t0d0. Use the following syntax to display more detailed device errors.
# iostat -En c0t5000C500335F95E3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Vendor: SEAGATE Product: ST930003SSUN300G Revision: 0B70 Serial No: 110672QFSB Size: 300.00GB <300000000000 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 Predictive Failure Analysis: 0 c0t5000C500335FC3E7d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Vendor: SEAGATE Product: ST930003SSUN300G Revision: 0B70 Serial No: 110672TE67 Size: 300.00GB <300000000000 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 Predictive Failure Analysis: 0 c0t5000C500335BA8C3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Vendor: SEAGATE Product: ST930003SSUN300G Revision: 0B70 Serial No: 110672SDF4 Size: 300.00GB <300000000000 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 Predictive Failure Analysis: 0 c2t0d0 Soft Errors: 0 Hard Errors: 12 Transport Errors: 0 Vendor: AMI Product: Virtual CDROM Revision: 1.00 Serial No: Size: 0.00GB <0 bytes> Media Error: 0 Device Not Ready: 12 No Device: 0 Recoverable: 0 Illegal Request: 2 Predictive Failure Analysis: 0
In addition to persistently tracking errors within the pool, ZFS also displays syslog messages when events of interest occur. The following scenarios generate notification events:
Device state transition – If a device becomes FAULTED, ZFS logs a message indicating that the fault tolerance of the pool might be compromised. A similar message is sent if the device is later brought online, restoring the pool to health.
Data corruption – If any data corruption is detected, ZFS logs a message describing when and where the corruption was detected. This message is only logged the first time it is detected. Subsequent accesses do not generate a message.
Pool failures and device failures – If a pool failure or a device failure occurs, the fault manager daemon reports these errors through syslog messages as well as the fmdump command.
If ZFS detects a device error and automatically recovers from it, no notification occurs. Such errors do not constitute a failure in the pool redundancy or in data integrity. Moreover, such errors are typically the result of a driver problem accompanied by its own set of error messages.