Example 6: Group the data using a Covering index
Fetch the state ,city and sum of income of all users grouped by the state and
city.
SELECT u.address.state, u.address.city, sum(income)
AS income FROM Users u GROUP BY u.address.state, u.address.city
Query execution
plan:
{
"iterator kind" : "SELECT",
"FROM" :
{
"iterator kind" : "RECEIVE",
"distribution kind" : "ALL_SHARDS",
"order by fields at positions" : [ 0, 1 ],
"input iterator" :
{
"iterator kind" : "SELECT",
"FROM" :
{
"iterator kind" : "TABLE",
"target table" : "Users",
"row variable" : "$$u",
"index used" : "idx_state_city_income",
"covering index" : true,
"index row variable" : "$$u_idx",
"index scans" : [
{
"equality conditions" : {},
"range conditions" : {}
}
]
},
"FROM variable" : "$$u_idx",
"GROUP BY" : "Grouping by the first 2 expressions in the SELECT list",
"SELECT expressions" : [
{
"field name" : "state",
"field expression" :
{
"iterator kind" : "FIELD_STEP",
"field name" : "address.state",
"input iterator" :
{
"iterator kind" : "VAR_REF",
"variable" : "$$u_idx"
}
}
},
{
"field name" : "city",
"field expression" :
{
"iterator kind" : "FIELD_STEP",
"field name" : "address.city",
"input iterator" :
{
"iterator kind" : "VAR_REF",
"variable" : "$$u_idx"
}
}
},
{
"field name" : "income",
"field expression" :
{
"iterator kind" : "FUNC_SUM",
"input iterator" :
{
"iterator kind" : "FIELD_STEP",
"field name" : "income",
"input iterator" :
{
"iterator kind" : "VAR_REF",
"variable" : "$$u_idx"
}
}
}
}
]
}
},
"FROM variable" : "$from-1",
"GROUP BY" : "Grouping by the first 2 expressions in the SELECT list",
"SELECT expressions" : [
{
"field name" : "state",
"field expression" :
{
"iterator kind" : "FIELD_STEP",
"field name" : "state",
"input iterator" :
{
"iterator kind" : "VAR_REF",
"variable" : "$from-1"
}
}
},
{
"field name" : "city",
"field expression" :
{
"iterator kind" : "FIELD_STEP",
"field name" : "city",
"input iterator" :
{
"iterator kind" : "VAR_REF",
"variable" : "$from-1"
}
}
},
{
"field name" : "income",
"field expression" :
{
"iterator kind" : "FUNC_SUM",
"input iterator" :
{
"iterator kind" : "FIELD_STEP",
"field name" : "income",
"input iterator" :
{
"iterator kind" : "VAR_REF",
"variable" : "$from-1"
}
}
}
}
]
}
Explanation of the query execution plan :
- In this example, you group the users based on
state
andcity
and then you determine the sum ofincome
of the users grouped. - The group-by is index-based, that is the group by fields are
also part of the index used. This is indicated by the lack
of any GROUP iterators. Instead, the grouping is done by the
SELECT
iterators. - There are two SELECT iterators, the inner one has a
GROUP BY
property that specifies which of the SELECT-clause expressions are also grouping expressions. Here the group by fields are the first 2 expressions in the SELECT list (u.address.state,u.address.city
)."GROUP BY" : "Grouping by the first 2 expressions in the SELECT list"
- The index
idx_state_city_income
is used here and in this example, it is a covering index as the query can be evaluated using only the entries of the index. - The index row variable is
$$u_idx
, which is the name of a variable ranging over the index entries produced by the TABLE iterator. Every time a new index entry is produced by the index scan, the$$u_idx
variable will be bound to that entry. - This index row variable (
$$u_idx
) will be referenced by iterators implementing the other clauses of the SELECT expression. - In the SELECT expression, three fields
(
state
,city
andsum(income)
) are fetched. These correspond to three field names and field expressions in the SELECT expression clause. - The results returned by the inner SELECT iterators from the various RNs are partial groups, because rows with the same state and city may exist at multiple RNs. So, regrouping and re-aggregation has to be performed at the driver. This is done by the outer SELECT iterator (above the RECEIVE iterator).
- The result is also sorted by
state
andcity
. Theorder by fields at positions
property specifies the field used for sorting. The value of this array depends on the position of the field that is sorted in the SELECT expression. In this example,state
is the first field andcity
is the second field in the SELECT expression. Soorder by fields at positions
has a value of0,1
."order by fields at positions" : [ 0, 1 ]
- In the outer SELECT expression, three fields are fetched:
state
,city
andsum(income)
. TheFROM
variable$from-1
will be referenced by iterators implementing the other clauses of the outer SELECT expression. This corresponds to three field names and field expressions in the outer SELECT expression clause. In this example, two of the field expressions fetch only the fields and there is one field expression which evaluates a function (sum(income)
). - For the two fields to be directly fetched by the SELECT expression, the
field expression is computed by a FIELD_STEP iterator. The FIELD_STEP iterator
extracts and returns the value of a field (
state
as shown below) from the records produced by its input iterator. The same is done for thecity
field.
The third field in the SELECT expression is a function to determine the sum of income. The{ "field name" : "state", "field expression" : { "iterator kind" : "FIELD_STEP", "field name" : "state", "input iterator" : { "iterator kind" : "VAR_REF", "variable" : "$from-1" } } }
FUNC_SUM
iterator is used for this. It iterates over the value of theincome
field and determines the sum of all incomes from the result of its input iterator.{ "field name" : "income", "field expression" : { "iterator kind" : "FUNC_SUM", "input iterator" : { "iterator kind" : "FIELD_STEP", "field name" : "income", "input iterator" : { "iterator kind" : "VAR_REF", "variable" : "$from-1" } } } }