The term dataset refers to the data for a statistic that is cached in memory and saved on disk. A dataset is presented as an entity in Analytics with administration controls.
The Analytics->Datasets screen in the BUI lists all datasets. These include open statistics that are being viewed in a worksheet (and as such are temporary datasets that disappear when the worksheet is closed) and statistics that are being archived to disk.
The following icons are visible in the BUI view; some of these are only visible during mouse over of a dataset entry:
See Actions for descriptions of these dataset actions.
The analytics datasets context allows management of datasets.
Use the show command to list datasets:
caji:analytics datasets> show
Datasets:

DATASET     STATE   INCORE ONDISK NAME
dataset-000 active    674K  35.7K arc.accesses[hit/miss]
dataset-001 active    227K  31.1K arc.l2_accesses[hit/miss]
dataset-002 active    227K  31.1K arc.l2_size
dataset-003 active    227K  31.1K arc.size
dataset-004 active    806K  35.7K arc.size[component]
dataset-005 active    227K  31.1K cpu.utilization
dataset-006 active    451K  35.6K cpu.utilization[mode]
dataset-007 active   57.7K      0 dnlc.accesses
dataset-008 active    490K  35.6K dnlc.accesses[hit/miss]
dataset-009 active    227K  31.1K http.reqs
dataset-010 active    227K  31.1K io.bytes
dataset-011 active    268K  31.1K io.bytes[op]
dataset-012 active    227K  31.1K io.ops
...
Most of the datasets above are archived by default. The one addition is "dataset-007", which has an ONDISK size of 0, indicating that it is a temporary statistic that is not archived. The names of the statistics are abbreviated versions of what is visible in the BUI: "dnlc.accesses" is short for "Cache: DNLC accesses per second".
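The ONDISK check described above can be automated when the listing has been captured as text. The following is a minimal Python sketch, not part of the appliance software. It assumes the five-column layout shown in the listing, assumes dataset names contain no spaces, and assumes that the K/M/G/T suffixes are powers of 1024; the function names are hypothetical.

```python
import re

# A few rows copied from the listing above.
SAMPLE = """\
DATASET     STATE   INCORE ONDISK NAME
dataset-005 active    227K  31.1K cpu.utilization
dataset-006 active    451K  35.6K cpu.utilization[mode]
dataset-007 active   57.7K      0 dnlc.accesses
"""

# Assumption: size suffixes are binary multiples.
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '57.7K' or '0' to a number of bytes."""
    match = re.fullmatch(r"([0-9.]+)([KMGT]?)", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def parse_listing(output):
    """Return one dict per dataset row, skipping headers and other lines."""
    rows = []
    for line in output.splitlines():
        fields = line.split(None, 4)
        if len(fields) != 5 or not fields[0].startswith("dataset-"):
            continue
        dataset, state, incore, ondisk, name = fields
        rows.append({
            "dataset": dataset,
            "state": state,
            "incore": parse_size(incore),
            "ondisk": parse_size(ondisk),
            "name": name.strip(),
        })
    return rows


def temporary_datasets(rows):
    """Datasets with nothing on disk are not being archived."""
    return [row for row in rows if row["ondisk"] == 0]


if __name__ == "__main__":
    for row in temporary_datasets(parse_listing(SAMPLE)):
        print(row["dataset"], row["name"])
```

Splitting on whitespace rather than on fixed column offsets keeps the sketch tolerant of column widths that change as sizes grow.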
To view the properties of a specific dataset, select it and use the show command:
caji:analytics datasets> select dataset-007
caji:analytics dataset-007> show
Properties:
                          name = dnlc.accesses
                      grouping = Cache
                   explanation = DNLC accesses per second
                        incore = 65.5K
                          size = 0
                     suspended = false
To read the most recent data from a dataset, select it and use the read command followed by the number of seconds to display:
caji:analytics datasets> select dataset-007
caji:analytics dataset-007> read 10
DATE/TIME                 /SEC       /SEC BREAKDOWN
2009-10-14 21:25:19        137          - -
2009-10-14 21:25:20        215          - -
2009-10-14 21:25:21        156          - -
2009-10-14 21:25:22        171          - -
2009-10-14 21:25:23       2722          - -
2009-10-14 21:25:24        190          - -
2009-10-14 21:25:25        156          - -
2009-10-14 21:25:26        166          - -
2009-10-14 21:25:27        118          - -
2009-10-14 21:25:28       1354          - -
Breakdowns are also listed if available. The following shows CPU utilization broken down by CPU mode (user/kernel), which was available as dataset-006:
caji:analytics datasets> select dataset-006
caji:analytics dataset-006> read 5
DATE/TIME                %UTIL      %UTIL BREAKDOWN
2009-10-14 21:30:07          7          6 kernel
                                        0 user
2009-10-14 21:30:08          7          7 kernel
                                        0 user
2009-10-14 21:30:09          0          - -
2009-10-14 21:30:10         15         14 kernel
                                        1 user
2009-10-14 21:30:11         25         24 kernel
                                        1 user
The summary is shown in "%UTIL", and the contributing elements in "%UTIL BREAKDOWN". At 21:30:10, there was 14% kernel time and 1% user time. The 21:30:09 line shows 0% in the "%UTIL" summary, and so does not list breakdowns ("--").
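If the output of read is captured for later processing, the summary and breakdown columns described above can be separated with a short script. The following Python sketch is an illustration only. It assumes that every record starts with a date and a time, that summary and breakdown values are integers, that breakdown labels contain no spaces, and that a dash marks an empty breakdown; parse_read is a hypothetical name.

```python
import re

# Two records copied from the dataset-006 example above.
SAMPLE = """\
DATE/TIME                %UTIL      %UTIL BREAKDOWN
2009-10-14 21:30:09          0          - -
2009-10-14 21:30:10         15         14 kernel
                                        1 user
"""

DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_read(output):
    """Return a list of (timestamp, summary, breakdown dict) tuples.

    The parser works on whitespace-separated tokens, so it does not
    depend on how the breakdown rows are wrapped across lines.
    """
    tokens = output.split()
    i = 0
    # Skip the header tokens that precede the first date.
    while i < len(tokens) and not DATE.fullmatch(tokens[i]):
        i += 1
    records = []
    while i + 2 < len(tokens):
        timestamp = f"{tokens[i]} {tokens[i + 1]}"
        summary = int(tokens[i + 2])
        i += 3
        breakdown = {}
        # Consume (value, label) pairs until the next record begins.
        while i + 1 < len(tokens) and not DATE.fullmatch(tokens[i]):
            value, label = tokens[i], tokens[i + 1]
            if value != "-":
                breakdown[label] = int(value)
            i += 2
        records.append((timestamp, summary, breakdown))
    return records


if __name__ == "__main__":
    for timestamp, summary, breakdown in parse_read(SAMPLE):
        print(timestamp, summary, breakdown)
```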
To print comma-separated values (CSV) for a number of seconds of data, use the csv command:
knife:analytics datasets> select dataset-022
knife:analytics dataset-022> csv 10
Time (UTC),Operations per second
2011-03-21 18:30:02,0
2011-03-21 18:30:03,0
2011-03-21 18:30:04,0
2011-03-21 18:30:05,0
2011-03-21 18:30:06,0
2011-03-21 18:30:07,0
2011-03-21 18:30:08,0
2011-03-21 18:30:09,0
2011-03-21 18:30:10,0
2011-03-21 18:30:11,0
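Output from the csv command can be loaded with any CSV reader. The following Python sketch uses the standard csv module and assumes the two-column layout shown above (a UTC timestamp and one value); datasets with breakdowns may produce additional columns, which this sketch does not handle. The name parse_csv is hypothetical.

```python
import csv
import io
from datetime import datetime

# The header and first rows copied from the example above.
SAMPLE = """\
Time (UTC),Operations per second
2011-03-21 18:30:02,0
2011-03-21 18:30:03,0
2011-03-21 18:30:04,0
"""


def parse_csv(output):
    """Return (header, samples), where samples is a list of
    (datetime, float) pairs, one per second of data."""
    reader = csv.reader(io.StringIO(output))
    header = next(reader)
    samples = []
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or unexpected rows
        when = datetime.strptime(row[0], "%Y-%m-%d %H:%M:%S")
        samples.append((when, float(row[1])))
    return header, samples


if __name__ == "__main__":
    header, samples = parse_csv(SAMPLE)
    values = [value for _, value in samples]
    print(header[1], "peak:", max(values), "mean:", sum(values) / len(values))
```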
The CLI has a feature that is not yet available in the BUI: the ability to suspend and resume all datasets. This may be useful when benchmarking the appliance to determine its absolute maximum performance. Because some statistics can consume significant CPU and disk resources to archive, benchmarks performed with these statistics enabled may not reflect that maximum.
To suspend all datasets, use the suspend command:
caji:analytics datasets> suspend
This will suspend all datasets. Are you sure? (Y/N) y
caji:analytics datasets> show
Datasets:

DATASET     STATE   INCORE ONDISK NAME
dataset-000 suspend   638K   584K arc.accesses[hit/miss]
dataset-001 suspend   211K   172K arc.l2_accesses[hit/miss]
dataset-002 suspend   211K   133K arc.l2_size
dataset-003 suspend   211K   133K arc.size
...
To resume all datasets, use the resume command:
caji:analytics datasets> resume
caji:analytics datasets> show
Datasets:

DATASET     STATE   INCORE ONDISK NAME
dataset-000 active    642K   588K arc.accesses[hit/miss]
dataset-001 active    215K   174K arc.l2_accesses[hit/miss]
dataset-002 active    215K   134K arc.l2_size
dataset-003 active    215K   134K arc.size
...
To discard the minute level of data granularity from a dataset, use the prune command:
caji:analytics dataset-001> prune minute
This will remove per-second and minute data collected prior to 2012-4-02 16:56:52.
Are you sure? (Y/N)
Note: This command also deletes all lower levels of data granularity. For example, using the prune hour command also deletes the per-second and per-minute data.
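The cascading behavior in the note can be modeled as an ordered list of granularities. The following Python sketch is a model for reasoning about prune, not the appliance's implementation. It covers only the three levels named in this section (second, minute, hour), and it assumes that "second" is itself a valid argument to prune; the name pruned_levels is hypothetical.

```python
# Granularity levels named in this section, finest first.
# Assumption: the appliance may support levels not listed here.
GRANULARITY = ["second", "minute", "hour"]


def pruned_levels(level):
    """Return every granularity removed by 'prune <level>':
    the named level plus all finer levels."""
    if level not in GRANULARITY:
        raise ValueError(f"unknown granularity: {level!r}")
    return GRANULARITY[: GRANULARITY.index(level) + 1]


if __name__ == "__main__":
    for level in GRANULARITY:
        print(f"prune {level} removes:", ", ".join(pruned_levels(level)))
```

For prune minute this yields the per-second and minute data, which matches the confirmation message shown in the example above.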