Using Aggregate Functions

You can use built in aggregate functions to find information such as a count, a sum, an average, a minimum, or a maximum.

The following functions are called SQL aggregate functions, because their semantics are similar to those in standard SQL: they work in conjunction with grouping and they aggregate values across the rows of a group. The aggregate functions can be used only in the SELECT or ORDER BY clauses, and they cannot be nested.

If you want to follow along with the examples, create tables and insert the data as described in the Tables used in the Examples topic.

The following aggregate functions are supported:

  • long count(*)
  • long count(any*)
  • number sum(any*)
  • number avg(any*)
  • any_atomic min(any*)
  • any_atomic max(any*)

Note:

All SQL aggregate function names are case sensitive.

count(*) function

The count star function returns the number of rows in a group.

Syntax:
long count(*)
Semantics:

The count star function calculates the number of records fetched by the query.

return type: long

Example 6-13 Find the total number of passengers who have contact details in their records

SELECT count(*) AS COUNT_PASSENGER 
FROM BaggageInfo bag
WHERE length(contactPhone) > 0

Explanation: In an airline baggage tracking application, you can calculate the total count of passengers who have furnished their contact details. The contactPhone field in the BaggageInfo table includes the contact details of the passengers. You use the count star function to find the number of passenger records with the contactPhone entry. There is a possibility that the contact details include NULL values and empty strings. You can use the length function in the WHERE clause to exclude such rows from being counted. If a contactPhone has an empty or a NULL value, the length function returns a NULL value. You apply a value comparison to select only the rows that yield a value greater than 0, and then determine the total count of the resultant rows using the count star function.

Output:
{"COUNT_PASSENGER":21}

Example 6-14 Find the number of checked bags that arrived on the 1st of Feb 2019

SELECT count(*) AS COUNT_BAGS
FROM BaggageInfo bag,
EXTRACT(DAY FROM CAST (bag.bagInfo[0].bagArrivalDate AS Timestamp(0))) $t1,
EXTRACT(MONTH FROM CAST (bag.bagInfo[0].bagArrivalDate AS Timestamp(0))) $t2,
EXTRACT(YEAR FROM CAST (bag.bagInfo[0].bagArrivalDate AS Timestamp(0))) $t3
WHERE $t3=2019 AND $t2=02 AND $t1=01

Explanation: In the airline baggage tracking application, you can get the total count of checked bags on a particular date. The bagArrivalDate field in the BaggageInfo table contains the arrival date of the passenger's checked bags. In the above query, you count the number of rows that have the bagArrivalDate as 1st of Feb 2019 to fetch the number of checked bags on the given date. You use several filter conditions here to extract only the date part of the timestamp. You first use the CAST operator to convert the bagArrivalDate to a timestamp and then extract the date, month, and year details from the timestamp using the EXTRACT expression. You use the value comparison to determine if the day, month, and year correspond to the required date value, that is, 01,02, and 2019 respectively. You then use the logical operator AND to select only the rows that match all three conditions and count the resultant rows using the count function.

Output:
{"COUNT_BAGS":1}

count function

The count function computes its input expression on each row in a group and counts all the non-NULL values returned by these evaluations of the input expression.

Syntax:
long count(any*)
Semantics:

any: The count function accepts any parameter type as the input argument.

return type: long

The count star function can be used when you want to count the rows of a resultant query. It does not accept any other input argument. Whereas, the count function can be used when you want to enumerate the outcome of an input expression.

Example 6-15 Find the total number of checked bags that are estimated to arrive at the LAX airport at a particular time

SELECT $estdate as ARRIVALDATE, 
count($flight) AS COUNT
FROM BaggageInfo $bag, 
$bag.bagInfo.flightLegs.estimatedArrival $estdate,
$bag.bagInfo.flightLegs.flightNo $flight,  
$bag.bagInfo.flightLegs.fltRouteDest $flt_dest 
WHERE $estdate =any "2019-02-01T11:00:00Z" AND $flt_dest =any "LAX"
GROUP BY $estdate

Explanation: In an airline baggage tracking application, you can get the total count of checked bags that are estimated to arrive at a particular airport and time. For each flight leg, the estimatedArrival field in the flightLegs array of the BaggageInfo table contains the arrival time of the checked bags and the fltRouteDest field contains the destination airport code. In the above query, to determine the total number of checked bags arriving at the LAX airport at a given time, you first group the data with the estimated arrival time value using the GROUP BY clause. From the group, you select only the rows that have the destination airport as LAX. You then determine the bag count for the resultant rows using the count function.

Here, you can compare the string-formatted dates in ISO-8601 format due to the natural sorting order of strings without having to cast them into timestamp data types. The $bag.bagInfo.flightLegs.estimatedArrival and $bag.bagInfo.flightLegs.fltRouteDest are sequences. Since the comparison operator '=' cannot operate on sequences of more than one item, the sequence comparison operator '=any' is used instead to compare the estimatedArrival and fltRouteDest fields.

Output:
{"ARRIVALDATE":"2019-02-01T11:00:00Z","COUNT":2}

Example 6-16 Find the number of watchers for a particular show in a TV streaming application

SELECT count($a.contentStreamed[$element.showName = "Bienvenu"]) AS WATCHERS
FROM stream_acct $s, $s.acct_data $a

Explanation: In a TV streaming application, the list of shows watched by a subscriber is stored in the contentStreamed array. Each element of the array corresponds to a single show. The showName field includes the name of each show. To fetch the number of watchers for a particular show, you use the array-filter step expression in the count function. You check whether or not the showName field matches the given show (in this example, Bienvenu) for each subscriber and count the total number of such subscribers using the count function.

Output:
{"WATCHERS":4}

sum function

The sum function computes its input expression on each row in a group and sums up all the numeric values returned by these evaluations of the input expression. In fixed schema, the sum function returns an error if you try to sum any non-numeric field. In schema-less JSON, the sum function skips any non-numeric value. If all the values of the input expression are non-numeric, a NULL value is returned. The resulting value of the sum function has type long, double, or number, depending on the type of the input items:
  • If there is at least one input item of type number, the result will be a number.
  • If there is at least one item of type double or float, the result will be double, else the result will be of type long.
  • If the input items are a mix of long, doubles, and numbers, the result will be of type double.
  • If numeric values are not returned by the sum function’s input, the result is NULL.
Syntax:
long sum(any*)
Semantics:

any: The sum function accepts any parameter type as the input argument.

return type: long

Example 6-17 Find the total screen time spent on a show by a subscriber in the TV streaming application

SELECT sum($content.seriesInfo[].episodes[].minWatched) AS MINSWATCHED
FROM stream_acct $s, $s.acct_data.contentStreamed[] $content
WHERE acct_id = 1 AND $content.showName = "At the Ranch"

Explanation: In the TV streaming application, you can calculate the total screen time spent on a show by each subscriber. Each subscriber is associated with a unique account ID, stored in the acct_id field of the stream_acct table. The information of the shows is stored in the contentStreamed array. The showName field holds the name of the show and the minWatched field stores the time lapsed for each episode in each season of the show. In this query, you use the sum function to add the values of the minWatched fields of all the episodes in all the seasons to calculate the total screen time for the subscriber with account ID 1 and show named At the Ranch.

Output:
{"MINSWATCHED":225}

avg function

The avg (average) function computes its input expression on each row in a group and sums up as well as counts all the numeric values returned by these evaluations of the input expression. Any non-numeric values are skipped. An error message is returned if the input expression does not return any numeric values. The resulting value is the division of the sum by the count. This value has type double, or number, depending on the type of the input items.
  • If there is at least one input item of type number, the result will be a number, else the result will be double.
  • If numeric values are not returned by avg function's input, the result is NULL.
Syntax:
number avg(any*)
Semantics:

any: The avg function accepts any parameter type as the input argument.

return type: number

Example 6-18 Find the average screen time spent on a show by a subscriber

SELECT avg($content.seriesInfo[].episodes[].minWatched) AS AVERAGETIME
FROM stream_acct $s, $s.acct_data.contentStreamed[] $content
WHERE acct_id = 1 AND $content.showName = "At the Ranch"

Explanation: In the TV streaming application, you can calculate the average screen time spent on a show by each subscriber. Each subscriber is associated with a unique account ID, stored in the acct_id field of the stream_acct table. The information of the shows is stored in the contentStreamed array. The showName field holds the name of the series and the minWatched field stores the time lapsed for each episode in each season of the show. In this query, you use the avg function on the minWatched field to calculate the average screen time spent on the show named At the Ranch by a subscriber with account id 1. The avg function first calculates the total screen time by adding the values in the minWatched fields for the given show. It then divides the sum by the number of minWatched fields to calculate the average value.

Output:
{"AVERAGETIME":56.25}

min function

The min function returns the minimum value among all the values returned by the evaluations of the input expression on each row in a group. The input expression is evaluated as follows:
  1. An error is displayed if it can be determined during the compile time that the values returned by the input expression belong to a type for which an order comparison is not defined (for example, RECORD, MAP, BINARY, or FIXED_BINARY). Otherwise, the min value for each group is initialized to NULL.
  2. Let M be the current minimum value and N be the next input value. The M and N are compared using Value Comparison Operators. If M is NULL, M is set to N. Else, if N is less than M, that means N can be the minimum value and hence M is set to N, and N is set to the next input value. This is continued until all the values in the input expression are compared and a minimum value is ascertained. When the values are not comparable, the following order is used:
    numeric values < timestamps < strings and enums < booleans
    
  3. If N is a record, map, array, binary, or fixed binary value, NULL, or JSON null, it is skipped and the next input value is considered.
Syntax:
any_atomic min(any*)
Semantics:

any: The min function accepts any parameter type as the input argument.

return type: atomic data type

max function

The max function returns the maximum value in all the sequences returned by the evaluations of the input expression on each row in a group. The specific rules are the same as for the max function, except that the current max value M will be replaced by the next input value N if N is not skipped and is greater than M.

Syntax:
any_atomic max(any*)
Semantics:

any: The max function accepts any parameter type as the input argument.

return type: atomic data type

Example 6-19 For a given show, find the minimum and maximum screen time

SELECT min($content.seriesInfo[].episodes[].lengthMin) AS MINTIME
max($content.seriesInfo[].episodes[].lengthMin) AS MAXTIME
FROM stream_acct $s, $s.acct_data.contentStreamed[] $content
WHERE $content.showName = "At the Ranch"
Explanation: In the TV streaming application, you can find the minimum and maximum duration of a show. The lengthMin field in the stream_acct table stores the length of each episode for a show. In this query, you use:
  • The min function on the lengthMin field to fetch the duration of the episode from the show At the Ranch that has the least screen time.
  • The max function on the lengthMin field to fetch the duration of the episode from the show At the Ranch that has the most screen time.
Output:
{"MINTIME":45,"MAXTIME":85}

Example 6-20 Aggregate Function - Fetch the age and average income of users from the User data table

CREATE INDEX idx11 ON users (age);

SELECT 
age, count(*) AS count, avg(income) AS income 
FROM users 
GROUP BY age;

Explanation: Consider an application that maintains the user data. See User data application table in the Tables used in the Examples section. The users table includes multiple records of users with the same age group. You use the aggregate functions to retrieve the count of such subscribers and their average income. The above query groups users by their age, and for each age, returns the number of users with the same age group and their average income.

Output:
{"age":22,"count":2,"income":50000.0}
{"age":45,"count":1,"income":75000.0}