Example 6: Group the data using a Covering index
Fetch the state ,city and sum of income of all users grouped by the state and
city.
SELECT u.address.state, u.address.city, sum(income)
AS income FROM Users u GROUP BY u.address.state, u.address.cityQuery execution
plan:
{
"iterator kind" : "SELECT",
"FROM" :
{
"iterator kind" : "RECEIVE",
"distribution kind" : "ALL_SHARDS",
"order by fields at positions" : [ 0, 1 ],
"input iterator" :
{
"iterator kind" : "SELECT",
"FROM" :
{
"iterator kind" : "TABLE",
"target table" : "Users",
"row variable" : "$$u",
"index used" : "idx_state_city_income",
"covering index" : true,
"index row variable" : "$$u_idx",
"index scans" : [
{
"equality conditions" : {},
"range conditions" : {}
}
]
},
"FROM variable" : "$$u_idx",
"GROUP BY" : "Grouping by the first 2 expressions in the SELECT list",
"SELECT expressions" : [
{
"field name" : "state",
"field expression" :
{
"iterator kind" : "FIELD_STEP",
"field name" : "address.state",
"input iterator" :
{
"iterator kind" : "VAR_REF",
"variable" : "$$u_idx"
}
}
},
{
"field name" : "city",
"field expression" :
{
"iterator kind" : "FIELD_STEP",
"field name" : "address.city",
"input iterator" :
{
"iterator kind" : "VAR_REF",
"variable" : "$$u_idx"
}
}
},
{
"field name" : "income",
"field expression" :
{
"iterator kind" : "FUNC_SUM",
"input iterator" :
{
"iterator kind" : "FIELD_STEP",
"field name" : "income",
"input iterator" :
{
"iterator kind" : "VAR_REF",
"variable" : "$$u_idx"
}
}
}
}
]
}
},
"FROM variable" : "$from-1",
"GROUP BY" : "Grouping by the first 2 expressions in the SELECT list",
"SELECT expressions" : [
{
"field name" : "state",
"field expression" :
{
"iterator kind" : "FIELD_STEP",
"field name" : "state",
"input iterator" :
{
"iterator kind" : "VAR_REF",
"variable" : "$from-1"
}
}
},
{
"field name" : "city",
"field expression" :
{
"iterator kind" : "FIELD_STEP",
"field name" : "city",
"input iterator" :
{
"iterator kind" : "VAR_REF",
"variable" : "$from-1"
}
}
},
{
"field name" : "income",
"field expression" :
{
"iterator kind" : "FUNC_SUM",
"input iterator" :
{
"iterator kind" : "FIELD_STEP",
"field name" : "income",
"input iterator" :
{
"iterator kind" : "VAR_REF",
"variable" : "$from-1"
}
}
}
}
]
}Explanation of the query execution plan :
- In this example, you group the users based on
stateandcityand then you determine the sum ofincomeof the users grouped. - The group-by is index-based, that is the group by fields are
also part of the index used. This is indicated by the lack
of any GROUP iterators. Instead, the grouping is done by the
SELECTiterators. - There are two SELECT iterators, the inner one has a
GROUP BYproperty that specifies which of the SELECT-clause expressions are also grouping expressions. Here the group by fields are the first 2 expressions in the SELECT list (u.address.state,u.address.city)."GROUP BY" : "Grouping by the first 2 expressions in the SELECT list" - The index
idx_state_city_incomeis used here and in this example, it is a covering index as the query can be evaluated using only the entries of the index. - The index row variable is
$$u_idx, which is the name of a variable ranging over the index entries produced by the TABLE iterator. Every time a new index entry is produced by the index scan, the$$u_idxvariable will be bound to that entry. - This index row variable (
$$u_idx) will be referenced by iterators implementing the other clauses of the SELECT expression. - In the SELECT expression, three fields
(
state,cityandsum(income)) are fetched. These correspond to three field names and field expressions in the SELECT expression clause. - The results returned by the inner SELECT iterators from the various RNs are partial groups, because rows with the same state and city may exist at multiple RNs. So, regrouping and re-aggregation has to be performed at the driver. This is done by the outer SELECT iterator (above the RECEIVE iterator).
- The result is also sorted by
stateandcity. Theorder by fields at positionsproperty specifies the field used for sorting. The value of this array depends on the position of the field that is sorted in the SELECT expression. In this example,stateis the first field andcityis the second field in the SELECT expression. Soorder by fields at positionshas a value of0,1."order by fields at positions" : [ 0, 1 ] - In the outer SELECT expression, three fields are fetched:
state,cityandsum(income). TheFROMvariable$from-1will be referenced by iterators implementing the other clauses of the outer SELECT expression. This corresponds to three field names and field expressions in the outer SELECT expression clause. In this example, two of the field expressions fetch only the fields and there is one field expression which evaluates a function (sum(income)). - For the two fields to be directly fetched by the SELECT expression, the
field expression is computed by a FIELD_STEP iterator. The FIELD_STEP iterator
extracts and returns the value of a field (
stateas shown below) from the records produced by its input iterator. The same is done for thecityfield.
The third field in the SELECT expression is a function to determine the sum of income. The{ "field name" : "state", "field expression" : { "iterator kind" : "FIELD_STEP", "field name" : "state", "input iterator" : { "iterator kind" : "VAR_REF", "variable" : "$from-1" } } }FUNC_SUMiterator is used for this. It iterates over the value of theincomefield and determines the sum of all incomes from the result of its input iterator.{ "field name" : "income", "field expression" : { "iterator kind" : "FUNC_SUM", "input iterator" : { "iterator kind" : "FIELD_STEP", "field name" : "income", "input iterator" : { "iterator kind" : "VAR_REF", "variable" : "$from-1" } } } }