Oracle® ZFS Storage Appliance Analytics Guide, Release OS8.7.x

Updated: August 2017
 
 

Disk: Disks

The Disks statistic is used to display the heat map for disks broken down by percent utilization. This is the best way to identify when pool disks are under heavy load. It may also identify problem disks that are beginning to perform poorly, before their behavior triggers a fault and automatic removal from the pool.

When to Check Disks

Any investigation into disk performance.

Disks Breakdowns

Table 30  A Breakdown of Disks
Breakdown | Description
percent utilization | A heat map with utilization on the Y-axis and each level on the Y-axis colored by the number of disks at that utilization: from light (none) to dark (many).
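To illustrate how one column of such a heat map could be derived, the following minimal sketch counts disks per utilization level for a single time interval. The function name `heatmap_column` and the 10% bucket width are assumptions made for illustration; this guide does not state the resolution the appliance uses.

```python
from collections import Counter


def heatmap_column(utilizations, bucket_width=10):
    """Count disks per utilization bucket for one time interval.

    utilizations: per-disk percent utilization values, each in 0..100.
    bucket_width: hypothetical bucket size in percent (an assumption,
                  not a documented appliance setting).
    Returns {bucket_lower_bound: disk_count}. Buckets with no disks are
    omitted, which corresponds to the lightest color in the heat map.
    """
    counts = Counter()
    for u in utilizations:
        if not 0 <= u <= 100:
            raise ValueError(f"utilization out of range: {u}")
        # Round down to the lower bound of the bucket; 100% gets its own bucket.
        bucket = int(u // bucket_width) * bucket_width
        counts[bucket] += 1
    return dict(counts)


# Six disks sampled in the same interval: most are lightly loaded,
# while two sit near the top of the Y-axis.
column = heatmap_column([12, 15, 18, 22, 95, 100])
```

Here the 10% bucket holds three disks and would be drawn darkest, while the 90% and 100% buckets hold one disk each.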

Interpretation

Utilization is a better measure of disk load than IOPS or throughput. Utilization is measured as the percentage of time during which a disk was busy performing requests (see Details below). At 100% utilization the disk may not be able to accept more requests, and additional I/O may wait in a queue. This I/O wait time will cause latency to increase and overall performance to decrease.

In practice, disks with a consistent utilization of 75% or higher are an indication of heavy disk load.

The heat map allows a particular pathology to be easily identified: a single disk misperforming and reaching 100% utilization (a bad disk). Disks can exhibit this symptom before they fail. Once disks fail, they are automatically removed from the pool with a corresponding alert. The problem arises during the time before they fail, when their I/O latency is increasing and slowing down overall appliance performance, but their status is still considered healthy because no error state has yet been identified. This situation will be seen as a faint line at the top of the heat map, showing that a single disk has stayed at 100% utilization for some time.

Suggested interpretation summary:

Table 31  Interpretation Summary
Observed | Suggested Interpretation
Most disks consistently over 75% | Available disk resources are being exhausted.
Single disk at 100% for several seconds | This can indicate a bad disk that is about to fail.
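The two rules in the summary above can be sketched as a small check over per-disk utilization samples. This is an illustration only, not how the appliance evaluates its data. The function names, the one-sample-per-second input format, and the thresholds for "most" (more than half of the disks), "consistently" (every sample at or above 75%), and "several seconds" (5 consecutive samples) are all assumptions.

```python
def longest_run_at_full(values, full=100.0):
    """Return the longest run of consecutive samples at or above `full`."""
    longest = current = 0
    for v in values:
        current = current + 1 if v >= full else 0
        longest = max(longest, current)
    return longest


def interpret(samples, heavy=75.0, min_run=5):
    """Apply the two summary rules to per-disk utilization samples.

    samples: {disk_name: [percent utilization, one value per second]}.
    heavy:   heavy-load threshold in percent (75% comes from this guide).
    min_run: assumed meaning of "several seconds", in consecutive samples.
    Returns a list of suggested interpretations, possibly empty.
    """
    findings = []
    if not samples:
        return findings

    # "Most disks consistently over 75%": more than half of the disks
    # never drop below the heavy-load threshold (assumed definition).
    heavy_disks = [d for d, vals in samples.items() if vals and min(vals) >= heavy]
    if len(heavy_disks) > len(samples) / 2:
        findings.append("Available disk resources are being exhausted.")

    # "Single disk at 100% for several seconds": exactly one disk stays
    # pinned at 100% for at least min_run consecutive samples.
    pinned = [d for d, vals in samples.items() if longest_run_at_full(vals) >= min_run]
    if len(pinned) == 1:
        findings.append(f"{pinned[0]} may be a bad disk that is about to fail.")

    return findings


# Hypothetical samples: two lightly loaded disks and one pinned at 100%.
example = {
    "disk0": [20, 25, 22, 21, 24, 23],
    "disk1": [18, 20, 19, 22, 21, 20],
    "disk2": [100, 100, 100, 100, 100, 100],
}
result = interpret(example)
```

With these samples only the second rule matches, and `disk2` is flagged as a possible bad disk.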

Further Analysis

To understand the nature of the I/O, such as IOPS, throughput, I/O sizes and offsets, see Disk: I/O Operations and Disk: I/O Bytes.

Details

This statistic is actually a measure of percent busy, which serves as a reasonable approximation of percent utilization since the appliance manages the disks directly. Technically this isn't a direct measure of disk utilization: at 100% busy, a disk may be able to accept more requests which it serves concurrently by inserting into and reordering its command queue, or serves from its on-disk cache.
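The difference between percent busy and true utilization can be illustrated with a minimal sketch. It computes percent busy as the fraction of a time window in which at least one request was outstanding. The function `percent_busy` and the (start, end) request format are assumptions for illustration and do not describe the appliance's internal implementation.

```python
def percent_busy(requests, window_start, window_end):
    """Percent of the window during which at least one request was outstanding.

    requests: list of (start, end) times in seconds, one pair per I/O request.
    Overlapping requests are merged, so concurrent I/O is counted only once.
    """
    # Keep only requests that touch the window, clipped to its bounds.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in requests
        if e > window_start and s < window_end
    )

    busy = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # A gap: close the previous busy period and start a new one.
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent: extend the current busy period.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        busy += cur_end - cur_start

    return 100.0 * busy / (window_end - window_start)


# Two requests served back to back, and three requests served concurrently.
serial = percent_busy([(0.0, 0.5), (0.5, 1.0)], 0.0, 1.0)
concurrent = percent_busy([(0.0, 1.0), (0.0, 1.0), (0.25, 0.75)], 0.0, 1.0)
```

Both workloads report the same percent busy over the one-second window, even though the second disk handled more requests at once. This is why 100% busy does not necessarily mean that a disk cannot accept further requests.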