Sequence Aggregate Functions

Sequence aggregate functions simply aggregate the items in their input sequence, using the same rules as their corresponding SQL aggregate function.

For example, seq_sum() will skip any non-numeric items in the input sequence and it will determine the actual type of the return value (long, double, or number) the same way as the SQL sum(). The only exception is seq_count(), which contrary to the SQL count(), will return NULL if any of its input items is NULL. Furthermore, there are no restrictions on where sequence aggregate functions can appear (for example, they can be used in the WHERE and/or the SELECT clause).

Note:

Note: An array is a sequence of one item. To aggregate the elements of an array, you must unbox the array using [].

The following sequence aggregate functions are supported.

  • long seq_count(any*)
  • number seq_sum(any*)
  • number seq_avg(any*)
  • any_atomic seq_min(any*)
  • any_atomic seq_max(any*)

Note:

All sequence aggregate function names are case sensitive.

seq_count function

Returns the number of items in the input sequence. The evaluation of the input expression is similar to the count function.

Syntax:
long seq_count(any*)
Semantics:

any: The seq_count function accepts any parameter type as the input argument.

return type: long

seq_sum function

Returns the sum of the numeric items in the input sequence. The evaluation of the input expression is similar to the sum function.

Syntax:
number seq_sum(any*)
Semantics:

any: The seq_sum function accepts any parameter type as the input argument.

return type: number

seq_avg function

Returns the average of the numeric items in the input sequence. The evaluation of the input expression is similar to the avg function.

Syntax:
number seq_avg(any*)
Semantics:

any: The seq_avg function accepts any parameter type as the input argument.

return type: number

seq_min function

Returns the minimum of the items in the input sequence. The evaluation of the input expression is similar to the min function.

Syntax:
any_atomic seq_min(any*)
Semantics:

any: The seq_min function accepts any parameter type as the input argument.

return type: atomic data type

seq_max function

Returns the maximum of the items in the input sequence. The evaluation of the input expression is similar to the max function.

Syntax:
any_atomic seq_max(any*)
Semantics:

any: The seq_max function accepts any parameter type as the input argument.

return type: atomic data type

Example 6-21 Display an automated message regarding the number of checked bags, travel route, and flight count to a passenger in the airline baggage tracking application

SELECT fullName,
b.baginfo[0].routing,
size(baginfo) AS BAGS,
    CASE
        WHEN seq_count(b.bagInfo[0].flightLegs.flightNo) = 1
        THEN "You have one flight to catch"
        WHEN seq_count(b.bagInfo[0].flightLegs.flightNo) = 2
        THEN "You have two flights to catch"
        WHEN seq_count(b.bagInfo[0].flightLegs.flightNo) = 3
        THEN "You have three flights to catch"
        ELSE "You do not have any travel listed today"
    END AS FlightInfo
FROM BaggageInfo b
WHERE ticketNo = 1762320369957

Explanation: In the Example 5-2, it is helpful to display a quick look-up message regarding the flight count, number of checked bags, and routing details of an upcoming travel for a passenger. The bagInfo array holds the checked bag details of the passenger. The size of the bagInfo array determines the number of checked bags per passenger. The flightLegs array in the bagInfo includes the flight details corresponding to each travel leg. The routing field includes the airport codes of all the travel fragments. You can determine the number of flights by counting the flightNo fields in the flightLegs array. If a passenger has more than one checked bag, there will be more than one element in the bagInfo array, one for each bag. In such cases, the flightLegs array in all the elements of the bagInfo field of a passenger data will contain the same values. This is because the destination of all the checked bags for a passenger will be the same. While counting the flightNo fields, you must consider only one element of the bagInfo array to avoid duplication of results. In this query, you consider only the first element, that is, bagInfo[0]. As the flightLegs array has a flightNo field for each travel fragment, it is a sequence and you determine the count of the flightNo fields per passenger using the seq_count function.

You use the CASE statement to introduce different messages based on the flight count. For ease of use, only three transits are considered in the query.

Output:
{"fullName":"Lorenzo Phil","routing":"SFO/IST/ATH/JTR","BAGS":2,"FlightInfo":"You have three flights to catch"}

Example 6-22 Retrieve the step tracker details for a user from the User data application

Consider the users table with an additional field stepCount to track the steps covered by each user over a duration. The table schema is as follows:

CREATE TABLE users ( id INTEGER,
                     firstName STRING,
                     lastName STRING,
                     otherNames ARRAY(RECORD(first STRING, last STRING)),
                     age INTEGER,
                     income INTEGER,
                     address JSON,
                     connections ARRAY(INTEGER),
                     stepCount ARRAY(LONG),
PRIMARY KEY (id) )

Insert the following sample data:

INSERT INTO users VALUES ( 

   10,   
   "John",
   "Smith",
   [ {"first" : "Johny", "last" : "Good"}, {"first" : "Johny2", "last" : "Brave"},{"first" : "Johny3", "last" : "Kind"},{"first" : "Johny4", "last" : "Humble"} ],
   22,
   45000,
   {
      "street" : "Pacific Ave",
      "number" : 101,
      "city" : "Santa Cruz",
      "state" : "CA",
      "zip" : 95008,
      "phones" : [
         { "area" : 408, "number" : 4538955, "kind" : "work" },
         { "area" : 831, "number" : 7533341, "kind" : "home" },
         { "area" : 831, "number" : 7533382, "kind" : "mobile" }
      ]
   },
   [ 30, 55, 43 ],
   [ 2000, 1500, 2700, 3000, 1000, 4000, 6000]
)

Based on the aggregation of this data for a given duration, a user can chart out a fitness regime.

SELECT id,
firstName,
seq_count(u.stepCount[]) AS DAYS,
seq_sum(u.stepCount[]) AS TOTAL_STEPS,
seq_avg(u.stepCount[]) AS AVERAGE_STEPS,
seq_min(u.stepCount[]) AS LOWEST,
seq_max(u.stepCount[]) AS HIGHEST
FROM users u
WHERE id=10

Explanation: The stepCount field in the users table is an array of type long. Each element in the array represents the number of steps covered by a user per day. You can use the sequence aggregate functions to fetch the details of total steps, average steps, lowest, and highest counts of steps covered by the user over a period.

Output:
{"id":10,"firstName":"John","DAYS":7,"TOTAL_STEPS":20200,"AVERAGE_STEPS":2885.714285714286,"LOWEST":1000,"HIGHEST":6000}