Colt Aggregate Functions

11.1 Introduction to Oracle CQL Built-In Aggregate Colt Functions

Table 11-1 lists the built-in aggregate Colt functions that Oracle CQL provides.

Table 11-1 Oracle CQL Built-in Aggregate Colt-Based Functions

Colt Package Function

cern.jet.stat.Descriptive

A set of basic descriptive statistics functions.

Note:

Built-in function names are case sensitive and you must use them in the case shown (in lower case).

Note:

In stream input examples, lines beginning with h (such as h 3800) are heartbeat input tuples. These inform Oracle Stream Analytics that no further input will have a timestamp lesser than the heartbeat value.

In relation output examples, the first tuple output is:

-9223372036854775808:+

This value is -Long.MIN_VALUE() and represents the largest negative timestamp possible.

For more information, see:

11.1.1 Oracle CQL Colt Aggregate Function Signatures and Tuple Arguments

Note that the signatures of the Oracle CQL Colt aggregate functions do not match the signatures of the corresponding Colt aggregate functions.

Consider the following Colt aggregate function:

double autoCorrelation(DoubleArrayList data, int lag, double mean, double variance)

In this signature, data is the Collection over which aggregates will be calculated and mean and variance are the other two parameter aggregates which are required to calculate autoCorrelation (where mean and variance aggregates are calculated on data).

In Oracle Event Processing, data will never come in the form of a Collection. The Oracle CQL function receives input data in a stream of tuples.

So suppose our stream is defined as S:(double val, integer lag). On each input tuple, the Oracle CQL autoCorrelation function will compute two intermediate aggregates, mean and variance, and one final aggregate, autoCorrelation.

Since the function expects a stream of tuples having a double data value and an integer lag value only, the signature of the Oracle CQL autoCorrelation function is:

double autoCorrelation (double data, int lag)

11.1.2 Colt Aggregate Functions and the Where, Group By, and Having Clauses

In Oracle CQL, the where clause is applied before the group by and having clauses. This means the Oracle CQL statement is invalid:

<query id="q1"><![CDATA[ 
    select * from InputChannel[rows 4 slide 4] as ic where geometricMean(c3) > 4
]]></query>

Instead, you must use the Oracle CQL statement shown in the following example:

<query id="q1"><![CDATA[ 
    select * from InputChannel[rows 4 slide 4] as ic, myGeoMean = geometricMean(c3) where myGeoMean > 4
]]></query>

For more information, see:

11.2 autoCorrelation

Syntax

Purpose

autoCorrelation is based on cern.jet.stat.Descriptive.autoCorrelation(DoubleArrayList data, int lag, double mean, double variance). It returns the auto-correlation of a data sequence of the input arguments as a double.

Note:

This function has semantics different from lag1.

This function takes the following tuple arguments:

double1: data value.
int1: lag.

For more information, see

Examples

Consider the query qColtAggr1. Given the data stream SColtAggrFunc with schema (c3 double), the query returns the relation.

<query id="qColtAggr1"><![CDATA[ 
     select autoCorrelation(c3, 0) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        5.441341838866902
1000        6.1593756700951054
1200        3.7269733222923676
1400        4.625160266213489
1600        3.490061774090248
1800        3.6354484064421917
2000        5.635401664977703
2200        5.006087562207967
2400        3.632574304861612
2600        7.618087248962962
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       -
  10:       +           NaN
1000:       -           NaN
1000:       +           1.0
1200:       -           1.0
1200:       +           1.0
1400:       -           1.0
1400:       +           1.0
1600:       -           1.0
1600:       +           1.000000000000002
1800:       -           1.000000000000002
1800:       +           1.0
2000:       -           1.0
2000:       +           0.9999999999999989
2200:       -           0.9999999999999989
2200:       +           0.999999999999999
2400:       -           0.999999999999999
2400:       +           0.9999999999999991
2600:       -           0.9999999999999991
2600:       +           1.0000000000000013

11.3 correlation

Syntax

Purpose

correlation is based on cern.jet.stat.Descriptive.correlation(DoubleArrayList data1, double standardDev1, DoubleArrayList data2, double standardDev2) . It returns the correlation of two data sequences of the input arguments as a double.

This function takes the following tuple arguments:

double1: data value 1.
double2: data value 2.

For more information, see

Examples

Consider the query qColtAggr2. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr2"><![CDATA[ 
     select correlation(c3, c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       -
  10:       +           NaN
1000:       -           NaN
1000:       +           2.0
1200:       -           2.0
1200:       +           1.5
2000:       -           1.5
2000:       +           1.333333333333333

11.4 covariance

Syntax

Purpose

covariance is based on cern.jet.stat.Descriptive.covariance(DoubleArrayList data1, DoubleArrayList data2). It returns the correlation of two data sequences (see Figure 11-1) of the input arguments as a double.

Figure 11-1 cern.jet.stat.Descriptive.covariance

This function takes the following tuple arguments:

double1: data value 1.
double2: data value 2.

For more information, see:

Examples

Consider the query qColtAggr3. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr3"><![CDATA[ 
     select covariance(c3, c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       -
  10:       +           NaN
1000:       -           NaN
1000:       +           50.0
1200:       -           50.0
1200:       +           100.0
2000:       -           100.0
2000:       +           166.66666666666666

11.5 geometricMean

Syntax

Purpose

geometricMean is based on cern.jet.stat.Descriptive.geometricMean(DoubleArrayList data). It returns the geometric mean of a data sequence (see Figure 11-2) of the input argument as a double.

Figure 11-2 cern.jet.stat.Descriptive.geometricMean(DoubleArrayList data)

This function takes the following tuple arguments:

double1: data value.

Note that for a geometric mean to be meaningful, the minimum of the data values must not be less than or equal to zero.

For more information, see:

Examples

Consider the query qColtAggr6. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr6"><![CDATA[ 
    select geometricMean(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       -
  10:       +           40.0
1000:       -           40.0
1000:       +           34.64101615137755
1200:       -           34.64101615137755
1200:       +           28.844991406148168
2000:       -           28.844991406148168
2000:       +           22.133638394006436

11.6 geometricMean1

Syntax

Purpose

geometricMean1 is based on cern.jet.stat.Descriptive.geometricMean(double sumOfLogarithms). It returns the geometric mean of a data sequence (see Figure 11-3) of the input arguments as a double.

Figure 11-3 cern.jet.stat.Descriptive.geometricMean1(int size, double sumOfLogarithms)

This function takes the following tuple arguments:

double1: data value.

For more information, see:

Examples

Consider the query qColtAggr7. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr7"><![CDATA[ 
    select geometricMean1(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       -
  10:       +           Infinity
1000:       -           Infinity
1000:       +           Infinity
1200:       -           Infinity
1200:       +           Infinity
2000:       -           Infinity
2000:       +           Infinity

11.7 harmonicMean

Syntax

Purpose

harmonicMean is based on cern.jet.stat.Descriptive.harmonicMean(int size, double sumOfInversions). It returns the harmonic mean of a data sequence as a double.

This function takes the following tuple arguments:

double1: data value.

For more information, see:

Examples

Consider the query qColtAggr8. Given the data stream SColtAggrFunc with schema (c3 double), the query returns the relation.

<query id="qColtAggr8"><![CDATA[ 
    select harmonicMean(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10         5.441341838866902
1000        6.1593756700951054
1200        3.7269733222923676
1400        4.625160266213489
1600        3.490061774090248
1800        3.6354484064421917
2000        5.635401664977703
2200        5.006087562207967
2400        3.632574304861612
2600        7.618087248962962
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       -
  10:       +           5.441341876983643
1000:       -           5.441341876983643
1000:       +           5.778137193205395
1200:       -           5.778137193205395
1200:       +           4.882442561720335
1400:       -           4.882442561720335
1400:       +           4.815475325819701
1600:       -           4.815475325819701
1600:       +           4.475541862878903
1800:       -           4.475541862878903
1800:       +           4.309563447664887
2000:       -           4.309563447664887
2000:       +           4.45944509362759
2200:       -           4.45944509362759
2200:       +           4.5211563834502515
2400:       -           4.5211563834502515
2400:       +           4.401525382790638
2600:       -           4.401525382790638
2600:       +           4.595562422157167

11.8 kurtosis

Syntax

Purpose

kurtosis is based on cern.jet.stat.Descriptive.kurtosis(DoubleArrayList data, double mean, double standardDeviation). It returns the kurtosis or excess (see Figure 11-4) of a data sequence as a double.

Figure 11-4 cern.jet.stat.Descriptive.kurtosis(DoubleArrayList data, double mean, double standardDeviation)

This function takes the following tuple arguments:

double1: data value.

For more information, see:

Examples

Consider the query qColtAggr12. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr12"><![CDATA[ 
    select kurtosis(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       -
  10:       +           NaN
1000:       -           NaN
1000:       +           -2.0
1200:       -           -2.0
1200:       +           -1.5000000000000002
2000:       -           -1.5000000000000002
2000:       +           -1.3600000000000003

11.9 lag1

Syntax

Purpose

lag1 is based on cern.jet.stat.Descriptive.lag1(DoubleArrayList data, double mean). It returns the lag - 1 auto-correlation of a dataset as a double.

Note:

This function has semantics different from autoCorrelation.

This function takes the following tuple arguments:

double1: data value.

For more information, see

Examples

Consider the query qColtAggr14. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr14"><![CDATA[ 
    select lag1(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       -
  10:       +           NaN
1000:       -           NaN
1000:       +           -0.5
1200:       -           -0.5
1200:       +           0.0
2000:       -           0.0
2000:       +           0.25

11.10 mean

Syntax

Purpose

mean is based on cern.jet.stat.Descriptive.mean(DoubleArrayList data). It returns the arithmetic mean of a data sequence (see Figure 11-5) as a double.

Figure 11-5 cern.jet.stat.Descriptive.mean(DoubleArrayList data)

The following table lists the input types and the corresponding output types:

Input Types	Output Types
INT	DOUBLE
BIGINT	DOUBLE
FLOAT	DOUBLE
DOUBLE	DOUBLE

For more information, see:

Examples

Consider the query qColtAggr16. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr16"><![CDATA[ 
    select mean(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       -
  10:       +           40.0
1000:       -           40.0
1000:       +           35.0
1200:       -           35.0
1200:       +           30.0
2000:       -           30.0
2000:       +           25.0

11.11 meanDeviation

Syntax

Purpose

meanDeviation is based on cern.jet.stat.Descriptive.meanDeviation(DoubleArrayList data, double mean). It returns the mean deviation of a dataset (see Figure 11-6) as a double.

Figure 11-6 cern.jet.stat.Descriptive.meanDeviation(DoubleArrayList data, double mean)

This function takes the following tuple arguments:

double1: data value.

For more information, see

Examples

Consider the query qColtAggr17. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr17"><![CDATA[ 
    select meanDeviation(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       -
  10:       +           0.0
1000:       -           0.0
1000:       +           5.0
1200:       -           5.0
1200:       +           6.666666666666667
2000:       -           6.666666666666667
2000:       +           10.0

11.12 median

Syntax

Purpose

median is based on cern.jet.stat.Descriptive.median(DoubleArrayList sortedData). It returns the median of a sorted data sequence as a double.

The following table lists the input types and the corresponding output types:

Table 11-2 Input and Output Types

Input Types	Output Types
INT	DOUBLE
BIGINT	DOUBLE
FLOAT	DOUBLE
DOUBLE	DOUBLE

Note:

If the input type is INT, then return type will also be INT and it will be floor of the divided value.

For more information, see:

Examples

Consider the query qColtAggr18. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr18"><![CDATA[ 
    select median(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       -
  10:       +           40.0
1000:       -           40.0
1000:       +           35.0
1200:       -           35.0
1200:       +           30.0
2000:       -           30.0
2000:       +           25.0

11.13 moment

Syntax

Purpose

moment is based on cern.jet.stat.Descriptive.moment(DoubleArrayList data, int k, double c). It returns the moment of the k-th order with constant c of a data sequence (see Figure 11-7) as a double.

Figure 11-7 cern.jet.stat.Descriptive.moment(DoubleArrayList data, int k, double c)

This function takes the following tuple arguments:

double1: data value.
int1: k.
double2: c.

For more information, see:

Examples

Consider the query qColtAggr21. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr21"><![CDATA[ 
    select moment(c3, c1, c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       -
  10:       +           0.0
1000:       -           0.0
1000:       +           5000.0
1200:       -           5000.0
1200:       +           3000.0
2000:       -           3000.0
2000:       +           1.7045E11

11.14 pooledMean

Syntax

Purpose

pooledMean is based on cern.jet.stat.Descriptive.pooledMean(int size1, double mean1, int size2, double mean2). It returns the pooled mean of two data sequences (see Figure 11-8) as a double.

Figure 11-8 cern.jet.stat.Descriptive.pooledMean(int size1, double mean1, int size2, double mean2)

This function takes the following tuple arguments:

double1: mean 1.
double2: mean 2.

For more information, see

Examples

Consider the query qColtAggr22. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr22"><![CDATA[ 
    select pooledMean(c3, c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       -
  10:       +           40.0
1000:       -           40.0
1000:       +           35.0
1200:       -           35.0
1200:       +           30.0
2000:       -           30.0
2000:       +           25.0

11.15 pooledVariance

Syntax

Purpose

pooledVariance is based on cern.jet.stat.Descriptive.pooledVariance(int size1, double variance1, int size2, double variance2). It returns the pooled variance of two data sequences (see Figure 11-9) as a double.

Figure 11-9 cern.jet.stat.Descriptive.pooledVariance(int size1, double variance1, int size2, double variance2)

This function takes the following tuple arguments:

double1: variance 1.
double2: variance 2.

For more information, see

Examples

Consider the query qColtAggr23. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr23"><![CDATA[ 
    select pooledVariance(c3, c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       -
  10:       +           0.0
1000:       -           0.0
1000:       +           25.0
1200:       -           25.0
1200:       +           66.66666666666667
2000:       -           66.66666666666667
2000:       +           125.0

11.16 product

Syntax

Purpose

product is based on cern.jet.stat.Descriptive.product(DoubleArrayList data). It returns the product of a data sequence (see Figure 11-10) as a double.

Figure 11-10 cern.jet.stat.Descriptive.product(DoubleArrayList data)

This function takes the following tuple arguments:

double1: data value.

For more information, see:

Examples

Consider the query qColtAggr24. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr24"><![CDATA[ 
    select product(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       -
  10:       +           40.0
1000:       -           40.0
1000:       +           1200.0
1200:       -           1200.0
1200:       +           24000.0
2000:       -           24000.0
2000:       +           240000.0

11.17 quantile

Syntax

Purpose

quantile is based on cern.jet.stat.Descriptive.quantile(DoubleArrayList sortedData, double phi). It returns the phi-quantile as a double; that is, an element elem for which holds that phi percent of data elements are less than elem.

This function takes the following tuple arguments:

double1: data value.
double2: phi; the percentage; must satisfy 0 <= phi <= 1.

For more information, see:

Examples

Consider the query qColtAggr26. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr26"><![CDATA[ 
    select quantile(c3, c2) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:         -
  10:         +         40.0
1000:         -         40.0
1000:         +         36.99999988079071
1200:         -         36.99999988079071
1200:         +         37.799999713897705
2000:         -         37.799999713897705
2000:         +         22.000000178813934

11.18 quantileInverse

Syntax

Purpose

quantileInverse is based on cern.jet.stat.Descriptive.quantileInverse(DoubleArrayList sortedList, double element). It returns the percentage phi of elements <= element (0.0 <= phi <= 1.0) as a double. This function does linear interpolation if the element is not contained but lies in between two contained elements.

This function takes the following tuple arguments:

double1: data.
double2: element.

For more information, see:

Examples

Consider the query qColtAggr27. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr27"><![CDATA[ 
    select quantileInverse(c3, c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       -
  10:       +           1.0
1000:       -           1.0
1000:       +           0.5
1200:       -           0.5
1200:       +           0.3333333333333333
2000:       -           0.3333333333333333
2000:       +           0.25

11.19 rankInterpolated

Syntax

Purpose

rankInterpolated is based on cern.jet.stat.Descriptive.rankInterpolated(DoubleArrayList sortedList, double element). It returns the linearly interpolated number of elements in a list less or equal to a given element as a double.

The rank is the number of elements <= element. Ranks are of the form{0, 1, 2,..., sortedList.size()}. If no element is <= element, then the rank is zero. If the element lies in between two contained elements, then linear interpolation is used and a non-integer value is returned.

This function takes the following tuple arguments:

double1: data value.
double2: element.

For more information, see:

Examples

Consider the query qColtAggr29. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr29"><![CDATA[ 
    select rankInterpolated(c3, c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       -
  10:       +           1.0
1000:       -           1.0
1000:       +           1.0
1200:       -           1.0
1200:       +           1.0
2000:       -           1.0
2000:       +           1.0

11.20 rms

Syntax

Purpose

rms is based on cern.jet.stat.Descriptive.rms(int size, double sumOfSquares). It returns the Root-Mean-Square (RMS) of a data sequence (see Figure 11-11) as a double.

Figure 11-11 cern.jet.stat.Descriptive.rms(int size, double sumOfSquares)

This function takes the following tuple arguments:

double1: data value.

For more information, see

Examples

Consider the query qColtAggr30. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr30"><![CDATA[ 
    select rms(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       - 
  10:       +           40.0
1000:       -           40.0
1000:       +           35.35533905932738
1200:       -           35.35533905932738
1200:       +           31.09126351029605
2000:       -           31.09126351029605
2000:       +           27.386127875258307

11.21 sampleKurtosis

Syntax

Purpose

sampleKurtosis is based on cern.jet.stat.Descriptive.sampleKurtosis(DoubleArrayList data, double mean, double sampleVariance). It returns the sample kurtosis (excess) of a data sequence as a double.

This function takes the following tuple arguments:

double1: data value.

For more information, see:

Examples

Consider the query qColtAggr31. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr31"><![CDATA[ 
     select sampleKurtosis(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       - 
  10:       +           NaN
1000:       -           NaN
1000:       +           NaN
1200:       -           NaN
1200:       +           NaN
2000:       -           NaN
2000:       +           -1.1999999999999993

11.22 sampleKurtosisStandardError

Syntax

Purpose

sampleKurtosisStandardError is based on cern.jet.stat.Descriptive.sampleKurtosisStandardError(int size). It returns the standard error of the sample Kurtosis as a double.

This function takes the following tuple arguments:

int1: data value.

For more information, see:

Examples

Consider the query qColtAggr33. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr33"><![CDATA[ 
     select sampleKurtosisStandardError(c1) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       - 
  10:       +           0.0
1000:       -           0.0
1000:       +           Infinity
1200:       -           Infinity
1200:       +           Infinity
2000:       -           Infinity
2000:       +           2.6186146828319083

11.23 sampleSkew

Syntax

Purpose

sampleSkew is based on cern.jet.stat.Descriptive.sampleSkew(DoubleArrayList data, double mean, double sampleVariance). It returns the sample skew of a data sequence as a double.

This function takes the following tuple arguments:

double1: data value.

For more information, see:

Examples

Consider the query qColtAggr34. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr34"><![CDATA[ 
    select sampleSkew(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       - 
  10:       +           NaN
1000:       -           NaN
1000:       +           NaN
1200:       -           NaN
1200:       +           0.0
2000:       -           0.0
2000:       +           0.0

11.24 sampleSkewStandardError

Syntax

Purpose

sampleSkewStandardError is based on cern.jet.stat.Descriptive.sampleSkewStandardError(int size). It returns the standard error of the sample skew as a double.

This function takes the following tuple arguments:

double1: data value.

For more information, see:

Examples

Consider the query qColtAggr36. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr36"><![CDATA[ 
    select sampleSkewStandardError(c1) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       - 
  10:       +           -0.0
1000:       -           -0.0
1000:       +           Infinity
1200:       -           Infinity
1200:       +           1.224744871391589
2000:       -           1.224744871391589
2000:       +           1.01418510567422

11.25 sampleVariance

Syntax

Purpose

sampleVariance is based on cern.jet.stat.Descriptive.sampleVariance(DoubleArrayList data, double mean). It returns the sample variance of a data sequence (see Figure 11-12) as a double.

Figure 11-12 cern.jet.stat.Descriptive.sampleVariance(DoubleArrayList data, double mean)

This function takes the following tuple arguments:

double1: data value.

For more information, see:

Examples

Consider the query qColtAggr38. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr38"><![CDATA[ 
    select sampleVariance(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       - 
  10:       +           NaN
1000:       -           NaN
1000:       +           50.0
1200:       -           50.0
1200:       +           100.0
2000:       -           100.0
2000:       +           166.66666666666666

11.26 skew

Syntax

Purpose

skew is based on cern.jet.stat.Descriptive.skew(DoubleArrayList data, double mean, double standardDeviation). It returns the skew of a data sequence of a data sequence (see Figure 11-13) as a double.

Figure 11-13 cern.jet.stat.Descriptive.skew(DoubleArrayList data, double mean, double standardDeviation)

This function takes the following tuple arguments:

double1: data value.

For more information, see:

Examples

Consider the query qColtAggr41. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr41"><![CDATA[ 
    select skew(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       - 
  10:       +           NaN
1000:       -           NaN
1000:       +           0.0
1200:       -           0.0
1200:       +           0.0
2000:       -           0.0
2000:       +           0.0

11.27 standardDeviation

Syntax

Purpose

standardDeviation is based on cern.jet.stat.Descriptive.standardDeviation(double variance). It returns the standard deviation from a variance as a double.

The following table lists the input types and the corresponding output types:

Input Types	Output Types
INT	DOUBLE
BIGINT	DOUBLE
FLOAT	DOUBLE
DOUBLE	DOUBLE

For more information, see

Examples

Consider the query qColtAggr44. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr44"><![CDATA[ 
    select standardDeviation(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       - 
  10:       +           0.0
1000:       -           0.0
1000:       +           5.0
1200:       -           5.0
1200:       +           8.16496580927726
2000:       -           8.16496580927726
2000:       +           11.180339887498949

11.28 standardError

Syntax

Purpose

standardError is based on cern.jet.stat.Descriptive.standardError(int size, double variance). It returns the standard error of a data sequence (see Figure 11-14) as a double.

Figure 11-14 cern.jet.stat.Descriptive.cern.jet.stat.Descriptive.standardError(int size, double variance)

This function takes the following tuple arguments:

double1: data value.

For more information, see

Examples

Consider the query qColtAggr45. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr45"><![CDATA[ 
     select standardError(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       - 
  10:       +           0.0
1000:       -           0.0
1000:       +           3.5355339059327378
1200:       -           3.5355339059327378
1200:       +           4.714045207910317
2000:       -           4.714045207910317
2000:       +           5.5901699437494745

11.29 sumOfInversions

Syntax

Purpose

sumOfInversions is based on cern.jet.stat.Descriptive.sumOfInversions(DoubleArrayList data, int from, int to). It returns the sum of inversions of a data sequence (see Figure 11-15) as a double.

Figure 11-15 cern.jet.stat.Descriptive.sumOfInversions(DoubleArrayList data, int from, int to)

This function takes the following tuple arguments:

double1: data value.

For more information, see:

Examples

Consider the query qColtAggr48. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr48"><![CDATA[ 
     select sumOfInversions(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:        - 
  10:        +           0.025
1000:       -           0.025
1000:       +           0.058333333333333334
1200:       -           0.058333333333333334
1200:       +           0.10833333333333334
2000:       -           0.10833333333333334
2000:       +           0.20833333333333334

11.30 sumOfLogarithms

Syntax

Purpose

sumOfLogarithms is based on cern.jet.stat.Descriptive.sumOfLogarithms(DoubleArrayList data, int from, int to). It returns the sum of logarithms of a data sequence (see Figure 11-16) as a double.

Figure 11-16 cern.jet.stat.Descriptive.sumOfLogarithms(DoubleArrayList data, int from, int to)

This function takes the following tuple arguments:

double1: data value.

For more information, see:

Examples

Consider the query qColtAggr49. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr49"><![CDATA[ 
    select sumOfLogarithms(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:        - 
  10:        +           3.6888794541139363
1000:       -           3.6888794541139363
1000:       +           7.090076835776092
1200:       -           7.090076835776092
1200:       +           10.085809109330082
2000:       -           10.085809109330082
2000:       +           12.388394202324129

11.31 sumOfPowerDeviations

Syntax

Purpose

sumOfPowerDeviations is based on cern.jet.stat.Descriptive.sumOfPowerDeviations(DoubleArrayList data, int k, double c). It returns sum of power deviations of a data sequence (see Figure 11-17) as a double.

Figure 11-17 cern.jet.stat.Descriptive.sumOfPowerDeviations(DoubleArrayList data, int k, double c)

This function is optimized for common parameters like c == 0.0, k == -2 .. 4, or both.

This function takes the following tuple arguments:

double1: data value.
int1: k.
double2: c.

For more information, see:

Examples

Consider the query qColtAggr50. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr50"><![CDATA[ 
    select sumOfPowerDeviations(c3, c1, c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       - 
  10:       +           0.0
1000:       -           0.0
1000:       +           10000.0
1200:       -           10000.0
1200:       +           9000.0
2000:       -           9000.0
2000:       +           6.818E11

11.32 sumOfPowers

Syntax

Purpose

sumOfPowers is based on cern.jet.stat.Descriptive.sumOfPowers(DoubleArrayList data, int k). It returns the sum of powers of a data sequence (see Figure 11-18) as a double.

Figure 11-18 cern.jet.stat.Descriptive.sumOfPowers(DoubleArrayList data, int k)

This function takes the following tuple arguments:

double1: data value.
int1: k.

For more information, see:

Examples

Consider the query qColtAggr52. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr52"><![CDATA[ 
    select sumOfPowers(c3, c1) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       - 
  10:       +           40.0
1000:       -           40.0
1000:       +           3370000.0
1200:       -           3370000.0
1200:       +           99000.0
2000:       -           99000.0
2000:       +           7.2354E12

11.33 sumOfSquaredDeviations

Syntax

Purpose

sumOfSquaredDeviations is based on cern.jet.stat.Descriptive.sumOfSquaredDeviations(int size, double variance). It returns the sum of squared mean deviation of a data sequence (see Figure 11-19) as a double.

Figure 11-19 cern.jet.stat.Descriptive.sumOfSquaredDeviations(int size, double variance)

This function takes the following tuple arguments:

double1: data value.

For more information, see

Examples

Consider the query qColtAggr53. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr53"><![CDATA[ 
    select sumOfSquaredDeviations(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       - 
  10:       +           0.0
1000:       -           0.0
1000:       +           25.0
1200:       -           25.0
1200:       +           133.33333333333334
2000:       -           133.33333333333334
2000:       +           375.0

11.34 sumOfSquares

Syntax

Purpose

sumOfSquares is based on cern.jet.stat.Descriptive.sumOfSquares(DoubleArrayList data). It returns the sum of squares of a data sequence (see Figure 11-20) as a double.

Figure 11-20 cern.jet.stat.Descriptive.sumOfSquares(DoubleArrayList data)

This function takes the following tuple arguments:

double1: data value.

For more information, see:

Examples

Consider the query qColtAggr54. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr54"><![CDATA[ 
    select sumOfSquares(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       - 
  10:       +           1600.0
1000:       -           1600.0
1000:       +           2500.0
1200:       -           2500.0
1200:       +           2900.0
2000:       -           2900.0
2000:       +           3000.0

11.35 trimmedMean

Syntax

Purpose

trimmedMean is based on cern.jet.stat.Descriptive.trimmedMean(DoubleArrayList sortedData, double mean, int left, int right). It returns the trimmed mean of an ascending sorted data sequence as a double.

This function takes the following tuple arguments:

double1: data value.
int1: left.
int2: right.

For more information, see:

Examples

Consider the query qColtAggr55. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr55"><![CDATA[ 
    select trimmedMean(c3, c1, c1) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        0, 0.5, 40.0, 8
1000        0, 0.7, 30.0, 6
1200        0, 0.89, 20.0, 12
2000        1, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
10:          -        
10:          +           40.0
1000:        -           40.0
1000:        +           35.0 
1200:        -           35.0
1200:        +           30.0
2000:        -           30.0 
2000:        +           25.0

11.36 variance

Syntax

Purpose

variance is based on cern.jet.stat.Descriptive.variance(int size, double sum, double sumOfSquares). It returns the variance of a data sequence (see Figure 11-21) as a double.

Figure 11-21 cern.jet.stat.Descriptive.variance(int size, double sum, double sumOfSquares)

The following table lists the input types and the corresponding output types:

Input Types	Output Types
INT	DOUBLE
BIGINT	DOUBLE
FLOAT	DOUBLE
DOUBLE	DOUBLE

For more information, see:

Examples

Consider the query qColtAggr57. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr57"><![CDATA[ 
    select variance(c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       - 
  10:       +           0.0
1000:       -           0.0
1000:       +           25.0
1200:       -           25.0
1200:       +           66.66666666666667
2000:       -           66.66666666666667
2000:       +           125.0

11.37 weightedMean

Syntax

Purpose

weightedMean is based on cern.jet.stat.Descriptive.weightedMean(DoubleArrayList data, DoubleArrayList weights). It returns the weighted mean of a data sequence (see Figure 11-22) as a double.

Figure 11-22 cern.jet.stat.Descriptive.weightedMean(DoubleArrayList data, DoubleArrayList weights)

This function takes the following tuple arguments:

double1: data value.
double2: weight value.

For more information, see:

Examples

Consider the query qColtAggr58. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr58"><![CDATA[ 
    select weightedMean(c3, c3) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        4, 0.7, 30.0, 6
1200        3, 0.89, 20.0, 12
2000        8, 0.4, 10.0, 4
h 8000
h 200000000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
  10:       - 
  10:       +           40.0
1000:       -           40.0
1000:       +           35.714285714285715
1200:       -           35.714285714285715
1200:       +           32.22222222222222
2000:       -           32.22222222222222
2000:       +           30.0

11.38 winsorizedMean

Syntax

Purpose

winsorizedMean is based on cern.jet.stat.Descriptive.winsorizedMean(DoubleArrayList sortedData, double mean, int left, int right). It returns the winsorized mean of a sorted data sequence as a double.

This function takes the following tuple arguments:

double1: data value.
int1: left.
int2: right.

For more information, see:

Examples

Consider the query qColtAggr60. Given the data stream SColtAggrFunc with schema (c1 integer, c2 float, c3 double, c4 bigint), the query returns the relation.

<query id="qColtAggr60"><![CDATA[ 
    select winsorizedMean(c3, c1, c1) from SColtAggrFunc
]]></query>

Timestamp   Tuple
  10        1, 0.5, 40.0, 8
1000        0, 0.7, 30.0, 6
1200        1, 0.89, 20.0, 12
2000        1, 0.4, 10.0, 4
h 8000

Timestamp   Tuple Kind  Tuple
-9223372036854775808:+
10:              -
10:          +           40.0
1000:        -           40.0 
1000:        +           35.0 
1200:        -           35.0 
1200:        +           30.000000000000004 
2000:        -           30.000000000000004 
2000:        +           25

11 Colt Aggregate Functions

11.1 Introduction to Oracle CQL Built-In Aggregate Colt Functions

11.1.1 Oracle CQL Colt Aggregate Function Signatures and Tuple Arguments

11.1.2 Colt Aggregate Functions and the Where, Group By, and Having Clauses

11.2 autoCorrelation

11.3 correlation

11.4 covariance

11.5 geometricMean

11.6 geometricMean1

11.7 harmonicMean

11.8 kurtosis

11.9 lag1

11.10 mean

11.11 meanDeviation

11.12 median

11.13 moment

11.14 pooledMean

11.15 pooledVariance

11.16 product

11.17 quantile

11.18 quantileInverse

11.19 rankInterpolated

11.20 rms

11.21 sampleKurtosis

11.22 sampleKurtosisStandardError

11.23 sampleSkew

11.24 sampleSkewStandardError

11.25 sampleVariance

11.26 skew

11.27 standardDeviation

11.28 standardError

11.29 sumOfInversions

11.30 sumOfLogarithms

11.31 sumOfPowerDeviations

11.32 sumOfPowers

11.33 sumOfSquaredDeviations

11.34 sumOfSquares

11.35 trimmedMean

11.36 variance

11.37 weightedMean

11.38 winsorizedMean