GROUP BY Clause

The GROUP BY clause is used in a SELECT statement to collect data across multiple rows and group the result by one or more columns or expressions. The GROUP BY clause is often used with aggregate functions. Oracle NoSQL Database applies the aggregate functions to each group of rows and returns a single row for each group.

Syntax

groupby_clause ::= GROUP BY expression ("," expression)*

Semantics

Each (grouping) expression must return at most one atomic value. If a grouping expression returns an empty result on an input row, that row is skipped. Equality among grouping values is defined according to the semantics of the "=" operator, with the exception that two NULL values are considered equal. See Value Comparison Operators section. Then, for each group, a single record is constructed and returned by the GROUP BY clause. If the clause has N grouping expressions, the first N fields of the returned record store the values of the grouping expressions. The remaining M fields (M >= 0) store the result of zero or more aggregate functions. In general, aggregate functions iterate over the rows of a group, evaluate an expression for each such row, and aggregate the returned values into a single value per group. Oracle NoSQL Database supports many aggregate functions as described in the Using Aggregate Functions section.

Syntactically, aggregate functions are not actually listed in the GROUP BY clause, but appear in the SELECT clause instead. In fact, aggregate functions can appear only in the SELECT or ORDER BY clauses, and they cannot be nested. Semantically, however, every aggregate function that appears in the SELECT or ORDER BY list is actually evaluated by the GROUP BY clause. If the SELECT clause contains any aggregate functions, but the SELECT expression does not contain a GROUP BY clause, the whole set of rows produced by the FROM or the WHERE clauses is considered as one group and the aggregate functions are evaluated over this single group.

The implementation of the GROUP BY clause may be index-based or generic. Index-based grouping is possible only if an index exists that sorts the rows by the values of the grouping expressions. More precisely, let e1, e2, …, eN where ei is the ith expression (i is a number in the range 1,2,3, ...N) be the grouping expressions as they appear in the GROUP BY clause (from left to right). Then, for index-based grouping, there must exist an index (which may be the primary-key index or one of the existing secondary indexes) such that for each i in 1,2,...,N, ei matches the definition of the i-th index field. If such an index does not exist or is not selected by the query optimizer, the GROUP BY will be generic. A generic GROUP BY uses a hash table to find rows belonging to the same group and stores all groups before returning any results to the application. The hash table is stored in the client driver memory (local hash tables, of limited size, may be used at the servers as well). As a result, a generic GROUP BY may consume a large amount of driver memory. In contrast, index-based grouping exploits the row sorting provided by the index to avoid the materialization and caching of any intermediate results. It is therefore recommended to create appropriate indexes for use in GROUP BY queries. See Using Indexes for Query Optimization. Finally, notice that when you use index-based grouping, the results of a grouping SELECT expression are ordered by the grouping expressions.

Example 6-4 GROUP BY Clause

This query groups users by their age and for each group returns the associated age and the average income of the users in the group. Grouping is possible only if there is a secondary index on the age column (or more generally, a multi-column index whose first column is the age column).

SELECT 
age, 
count(*) AS count, 
avg(income) AS income 
FROM users 
GROUP BY age;