3.2.9 Summarizing Data with ore.summary

The ore.summary function calculates descriptive statistics for the columns of an ore.frame and supports flexible row aggregation.

The ore.summary function supports these statistics:

  • Mean, minimum, maximum, mode, number of missing values, sum, weighted sum

  • Corrected and uncorrected sum of squares, range of values, stddev, stderr, variance

  • t-test for testing the hypothesis that the population mean is 0

  • Kurtosis, skew, coefficient of variation

  • Quantiles: p1, p5, p10, p25, p50, p75, p90, p95, p99, qrange

  • One-sided and two-sided confidence limits for the mean: clm, rclm, lclm

  • Extreme value tagging
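To make a few of the listed statistics concrete, the following base R sketch computes them locally on a small numeric vector. It does not call ore.summary and needs no database connection. The sample values and the variable names (x, uss, css, std_err, cv, t_stat, p_val) are hypothetical, and scaling the coefficient of variation as a percentage is an assumption.

```r
# Local base R sketch of some of the listed statistics (not ore.summary itself).
x <- c(2, 4, 4, 4, 5, 5, 7, 9)               # hypothetical sample values

n       <- length(x)
uss     <- sum(x^2)                          # uncorrected sum of squares
css     <- sum((x - mean(x))^2)              # corrected sum of squares
std_err <- sd(x) / sqrt(n)                   # standard error of the mean
cv      <- 100 * sd(x) / mean(x)             # coefficient of variation (percent, assumed scaling)
t_stat  <- mean(x) / std_err                 # t statistic for H0: population mean is 0
p_val   <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value for that t statistic
```

The corrected sum of squares differs from the uncorrected one only by centering on the mean, so css equals uss - n * mean(x)^2.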

The ore.summary function has a simpler syntax than the SQL queries that produce the same results.

The ore.summary function returns an ore.frame unless the group.by argument is used, in which case it returns a list of ore.frame objects, one per stratum.

For details about the function arguments, invoke help(ore.summary).

Example 3-54 Calculating Default Statistics

This example calculates the mean, minimum, and maximum values for columns AGE and CLASS and rolls up (aggregates) the GENDER column.

ore.summary(NARROW, class = 'GENDER', var = c('AGE', 'CLASS'), order = 'freq')
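As a rough local analogue of this kind of roll-up, the following base R sketch computes the frequency, mean, minimum, and maximum of one column for each level of a grouping column. The data frame df is hypothetical and merely stands in for NARROW; the result layout and column names (FREQ, MEAN, MIN, MAX) are assumptions and are not the output format of ore.summary.

```r
# Hypothetical local data standing in for the NARROW ore.frame
df <- data.frame(
  GENDER = c("F", "F", "M", "M", "M"),
  AGE    = c(30, 40, 25, 35, 45)
)

# One summary row per GENDER level: frequency, mean, minimum, maximum of AGE
by_gender <- do.call(rbind, lapply(split(df, df$GENDER), function(g) {
  data.frame(GENDER = g$GENDER[1],
             FREQ   = nrow(g),
             MEAN   = mean(g$AGE),
             MIN    = min(g$AGE),
             MAX    = max(g$AGE))
}))
print(by_gender)
```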

Example 3-55 Calculating Skew and Probability for t Test

This example calculates the skew of AGE as column A and the probability of the Student's t distribution for CLASS as column B.

ore.summary(NARROW, class = 'GENDER', var = c('AGE', 'CLASS'), stats = 'skew(AGE) = A, probt(CLASS) = B')

Example 3-56 Calculating the Weighted Sum

This example calculates the weighted sum for AGE aggregated by GENDER with YRS_RESIDENCE as weights; in other words, it calculates sum(var*weight).

ore.summary(NARROW, class = 'GENDER', var = 'AGE', stats = 'sum = X', weight = 'YRS_RESIDENCE')
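The arithmetic behind sum(var*weight) can be checked locally with base R. The sketch below uses a hypothetical four-row data frame in place of NARROW and computes the weighted sum of AGE per GENDER with tapply; it illustrates the formula only and does not call ore.summary.

```r
# Hypothetical local data standing in for the NARROW ore.frame
df <- data.frame(
  GENDER        = c("F", "F", "M", "M"),
  AGE           = c(30, 40, 25, 35),
  YRS_RESIDENCE = c(2, 3, 4, 1)
)

# Weighted sum per group: sum(AGE * YRS_RESIDENCE) within each GENDER level
wsum <- tapply(df$AGE * df$YRS_RESIDENCE, df$GENDER, sum)
print(wsum)
```

For this data, the F group yields 30*2 + 40*3 = 180 and the M group yields 25*4 + 35*1 = 135.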

Example 3-57 Grouping by Two Columns

This example groups CLASS by GENDER and MARITAL_STATUS.

ore.summary(NARROW, class = c('GENDER', 'MARITAL_STATUS'), var = 'CLASS', ways = 1)

Example 3-58 Grouping by All Possible Ways

This example groups CLASS in all possible ways by GENDER and MARITAL_STATUS.

ore.summary(NARROW, class = c('GENDER', 'MARITAL_STATUS'), var = 'CLASS', ways = 'nway')
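One way to picture grouping by several class columns is to enumerate the column combinations and aggregate once per combination. The base R sketch below does this locally, assuming that "all possible ways" means every non-empty combination of the class columns; that interpretation, the data frame df, and the names class_cols, combos, and results are all hypothetical. It does not call ore.summary.

```r
# Hypothetical local data standing in for the NARROW ore.frame
df <- data.frame(
  GENDER         = c("F", "F", "M", "M", "M"),
  MARITAL_STATUS = c("Married", "Single", "Married", "Married", "Single"),
  CLASS          = c(1, 0, 1, 0, 1)
)
class_cols <- c("GENDER", "MARITAL_STATUS")

# Every non-empty combination of the class columns:
# GENDER alone, MARITAL_STATUS alone, and both together
combos <- unlist(
  lapply(seq_along(class_cols),
         function(k) combn(class_cols, k, simplify = FALSE)),
  recursive = FALSE
)

# Mean of CLASS for each grouping
results <- lapply(combos, function(cols) {
  aggregate(df["CLASS"], by = df[cols], FUN = mean)
})
names(results) <- vapply(combos, paste, character(1), collapse = " x ")
print(results)
```

The first two elements of results are the one-way groupings, and the last is the two-way grouping by both columns.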

Example 3-59 Getting the Maximum Values of Columns Using ore.summary

This example returns the maximum values of the Sepal.Length and Sepal.Width columns in the IRIS ore.frame, along with the species in which each maximum occurs.

IRIS <- ore.push(iris)
ore.summary(IRIS, c("Sepal.Length", "Sepal.Width"), 
                    "max", 
                    maxid=c(Sepal.Length="Species", Sepal.Width="Species"))

Listing for Example 3-59

R> IRIS <- ore.push(iris)
R> ore.summary(IRIS, c("Sepal.Length", "Sepal.Width"),
+                      "max",
+                      maxid=c(Sepal.Length="Species", Sepal.Width="Species"))
  FREQ MAX(Sepal.Length) MAX(Sepal.Width) MAXID(Sepal.Length->Species) MAXID(Sepal.Width->Species)
1  150               7.9              4.4                    virginica                      setosa
Warning message:
ORE object has no unique key - using random order
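The values in this listing can be cross-checked against the local iris data.frame with base R. The sketch below uses a hypothetical helper, max_with_id, which finds the row holding a column's maximum and reports the value of an identifying column in that row. It runs on the in-memory data only and is not how ore.summary evaluates maxid in the database.

```r
# Hypothetical helper: maximum of `col` and the matching value of `id`
max_with_id <- function(data, col, id) {
  i <- which.max(data[[col]])   # index of the first row holding the maximum
  list(max = data[[col]][i], id = as.character(data[[id]][i]))
}

sl <- max_with_id(iris, "Sepal.Length", "Species")
sw <- max_with_id(iris, "Sepal.Width", "Species")
str(sl)
str(sw)
```

The results agree with the listing: 7.9 for Sepal.Length in virginica and 4.4 for Sepal.Width in setosa.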