Execution Performance Impact

Enabling statistics will incur some CPU cost for data collection and aggregation. In many situations, this overhead will not make a noticeable difference to system performance. However, for Oracle ZFS Storage Appliance systems under maximum load, including benchmark loads, the small overhead of statistics collection can become noticeable.

Here are some tips for handling execution overheads:

  • For dynamic statistics, only archive those that are important to record 24x7.

  • Statistics can be suspended, eliminating data collection and the collection overhead. This can be useful if gathering a short interval of a statistic is sufficient for your needs (such as troubleshooting performance). Enable the statistic, wait some minutes, then click the power icon in the Datasets view to suspend it. Suspended datasets keep their data for later viewing.

  • Monitor overall performance via the static statistics when enabling and disabling dynamic statistics.

  • Be aware that drilldowns will incur overhead for all events. For example, you might trace NFSv3 operations per second for client deimos when there is currently no NFSv3 activity from client deimos. This does not mean that there is no execution overhead for this statistic. Oracle ZFS Storage Appliance must still trace every NFSv3 event, then compare the host with deimos to determine if the data should be recorded in this dataset; however, most of the execution cost has been paid at this point.
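The drilldown point above can be illustrated with a minimal sketch. This is not how the appliance implements tracing; the `Event` type, the `trace_by_client` function, and the client name phobos are hypothetical and exist only to show that a filter is evaluated for every event, even when nothing matches.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for one traced NFSv3 operation."""
    client: str
    op: str


def trace_by_client(events, target):
    """Return (matched, examined) for a per-client drilldown."""
    examined = 0
    matched = 0
    for ev in events:
        examined += 1              # every event is traced...
        if ev.client == target:    # ...then its host is compared with the target
            matched += 1           # only matching events are recorded
    return matched, examined


# 1000 operations from another client, none from deimos.
events = [Event("phobos", "read")] * 1000
matched, examined = trace_by_client(events, "deimos")
print(matched, examined)  # 0 1000
```

Nothing is recorded for deimos, yet all 1000 events were examined, which is where the execution overhead comes from.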

Static Statistics

Some statistics are sourced from operating system counters that are always maintained; these may be called static statistics. Gathering these statistics has a negligible effect on the performance of the Oracle ZFS Storage Appliance system because, to an extent, the system is already maintaining them (they are usually gathered by an operating system feature called kstat). Examples of these statistics are:

Table 5-2 Static Statistics

Category    Statistic
--------    ---------
CPU         Percent utilization
CPU         Percent utilization broken down by CPU mode
Cache       ARC accesses per second broken down by hit/miss
Cache       ARC size
Disk        I/O bytes per second
Disk        I/O bytes per second broken down by type of operation
Disk        I/O operations per second
Disk        I/O operations per second broken down by disk
Disk        I/O operations per second broken down by type of operation
Network     Device bytes per second
Network     Device bytes per second broken down by device
Network     Device bytes per second broken down by direction
Protocol    NFSv3/NFSv4/NFSv4.1 operations per second
Protocol    NFSv3/NFSv4/NFSv4.1 operations per second broken down by type of operation

When seen in the BUI, statistics from the previous table that do not include "broken down by" text are shown with "as a raw statistic".

Because these statistics have negligible execution cost and provide a broad view of system behavior, many are archived by default. See Default Statistics.

Dynamic Statistics

These statistics are created dynamically, and are not usually maintained by the Oracle ZFS Storage Appliance system (they are gathered by an operating system feature called DTrace). Each event is traced, and each second this trace data is aggregated into the statistic. Thus, the cost of this statistic is proportional to the number of events.

Tracing disk details when the activity is 1000 operations/second is unlikely to have a noticeable effect on performance; however, measuring network details when pushing 100,000 packets/second is likely to have a negative effect. The type of information gathered is also a factor: tracing filenames and client names will increase the performance impact.
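Because the cost is proportional to the number of events, a rough estimate can be sketched as the event rate multiplied by a per-event cost. The function name `tracing_cpu_fraction` and the 2 microsecond per-event cost below are assumptions chosen only for illustration; they are not measured appliance figures.

```python
def tracing_cpu_fraction(events_per_sec, cost_per_event_us):
    """Estimate the fraction of one CPU consumed by tracing.

    cost_per_event_us is an assumed per-event cost in microseconds.
    """
    return events_per_sec * cost_per_event_us / 1_000_000


ASSUMED_COST_US = 2.0  # hypothetical value, not a measurement

for label, rate in [("disk operations", 1_000), ("network packets", 100_000)]:
    fraction = tracing_cpu_fraction(rate, ASSUMED_COST_US)
    print(f"{label}: {rate} events/s -> {fraction:.1%} of one CPU")
```

Under this assumed cost, the 100-fold increase in event rate produces a 100-fold increase in overhead, which is why high-rate network statistics are more likely to be noticeable than disk statistics.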

Examples of dynamic statistics include:

Table 5-3 Dynamic Statistics

Category    Statistic
--------    ---------
Protocol    SMB operations per second
Protocol    SMB operations per second broken down by type of operation
Protocol    HTTP/WebDAV requests per second
Protocol    ... operations per second broken down by client
Protocol    ... operations per second broken down by file name
Protocol    ... operations per second broken down by share
Protocol    ... operations per second broken down by project
Protocol    ... operations per second broken down by latency
Protocol    ... operations per second broken down by size
Protocol    ... operations per second broken down by offset

"..." denotes any of the protocols.

The best way to determine the impact of these statistics is to enable and disable them while running under a steady load. Benchmark software may be used to apply that steady load. See Working with Analytics for the steps to calculate performance impact in this way.
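The arithmetic behind such a comparison can be sketched as follows. This is only an illustration of the calculation, not the documented procedure; the function name `percent_impact` and the sample throughput values are made up for the example.

```python
from statistics import mean


def percent_impact(baseline_samples, enabled_samples):
    """Percent drop in throughput when a statistic is enabled.

    Both arguments are throughput samples (for example, operations
    per second) taken under the same steady load.
    """
    base = mean(baseline_samples)
    enabled = mean(enabled_samples)
    return 100.0 * (base - enabled) / base


# Made-up samples: statistic disabled, then enabled.
disabled = [1000, 1010, 990]
enabled = [950, 960, 940]
print(percent_impact(disabled, enabled))  # 5.0
```

Averaging several samples on each side helps separate the effect of the statistic from normal variation in the load.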