GROUP/GROUP BY clauses

The GROUP and GROUP BY clauses specify how to map source records to result records in order to group statement output.

Some of the ways to use these clauses in a query are:
  • Omitting the GROUP clause maps each source record to its own result record.
  • GROUP maps all source records to a single result record.
  • GROUP BY <attributeList> maps source records to result records by the combination of values in the listed attributes.

You can also use other grouping functions (such as MEMBERS, CUBE, or GROUPING SETS) with the GROUP and GROUP BY clauses. Details on these functions are given later in this section.

BNF grammar for grouping

The BNF grammar representation for GROUP and the family of group functions is:
GroupClause ::= GROUP | GROUP BY GroupByList | GROUP BY GroupAll
GroupByList ::= GroupByElement | GroupByList ,  GroupByElement
GroupByElement ::= GroupBySingle | GroupingSets | CubeRollup

GroupingSets ::= GROUPING SETS (GroupingSetList)
GroupingSetList ::=  GroupingSetElement |  GroupingSetList ,  GroupingSetElement
GroupingSetElement ::=  GroupBySingle | GroupByComposite | CubeRollup | GroupAll

CubeRollup ::= {CUBE | ROLLUP} (CubeRollupList)
CubeRollupList ::=  CubeRollupElement |  CubeRollupList ,  CubeRollupElement
CubeRollupElement ::=  GroupBySingle | GroupByComposite

GroupBySingle ::= Identifier | GroupByMembers
GroupByComposite ::= (GroupByCompositeList)
GroupByCompositeList ::= GroupBySingle | GroupByCompositeList, GroupBySingle
GroupByMembers ::= MEMBERS (Identifier | Identifier.Identifier) AS Identifier

GroupAll ::= ()
Note that the use of GroupAll results in the following being all equivalent:
GROUP = GROUP BY() = GROUP BY GROUPING SETS(())

Specifying only GROUP

You can use a GROUP clause to aggregate results into a single bucket. As the BNF grammar shows, the GROUP clause does not take an argument.

For example, the following statement uses the SUM statement to return a single sum across a set of records:
RETURN ReviewCount AS 
SELECT SUM(NumReviews) AS NumberOfReviews
FROM ProductState
GROUP

This statement returns one record for NumberOfReviews. The value is the sum of the values for the NumReviews attribute.

Specifying GROUP BY

You can use GROUP BY to aggregate results into buckets with common values for the grouping keys. The GROUP BY syntax is:
GROUP BY attributeList
where attributeList is a single attribute, a comma-separated list of multiple attributes, GROUPING SETS, CUBE, ROLLUP, or () to specify an empty group. The empty group generates a total.

Grouping is allowed on source and locally-defined attributes.

Note:

If you group by a locally-defined attribute, that attribute cannot refer to non-grouping attributes and cannot contain any aggregates. However, IN expressions and lookup expressions are valid in this context.

All grouping attributes are part of the result records. In any grouping attribute, NULL values (for single-assign attributes) or empty sets (for multi-assign attributes) are treated like any other value, which means the source record is mapped to result records. For information about user-defined NULL-value handling in EQL, see COALESCE.

For example, suppose we have sales transaction data with records consisting of the following attributes:
{ TransId, ProductType, Amount, Year, Quarter, Region,
  SalesRep, Customer }
For example:
{ TransId = 1, ProductType = "Widget", Amount = 100.00,
  Year = 2011, Quarter = "11Q1", Region  = "East",
  SalesRep = "J. Smith", Customer = "Customer1" }
If an EQL statement uses Region and Year as GROUP BY attributes, the statement results contain an aggregated record for each valid, non-empty combination of Region and Year. In EQL, this example is expressed as:
DEFINE RegionsByYear AS
GROUP BY Region, Year
resulting in the aggregates of the form { Region, Year }, for example:
{ "East", "2010" }
{ "West", "2011" }
{ "East", "2011" }
Note that using duplicated columns in GROUP BY clauses is allowed. This means that the following two queries are treated as equivalent:
RETURN Results AS
SELECT SUM(PROMO_COST) AS PR_Cost
FROM SaleState
GROUP BY PROMO_NAME

RETURN Results AS
SELECT SUM(PROMO_COST) AS PR_Cost
FROM SaleState
GROUP BY PROMO_NAME, PROMO_NAME

Using a GROUP BY that is an output of a SELECT expression

A GROUP BY key can be the output of a SELECT expression, as long as that expression itself does not contain an aggregation function.

For example, the following syntax is a correct usage of GROUP BY:
SELECT COALESCE(Person, 'Unknown Person') AS Person2, ... GROUP BY Person2
The following syntax is incorrect and results in an error, because Sales2 contains an aggregation function (SUM):
SELECT SUM(Sales) AS Sales2, ... GROUP BY Sales2