Example 6: Group the data using a Covering index

Fetch the state ,city and sum of income of all users grouped by the state and city.
SELECT u.address.state, u.address.city, sum(income) 
AS income FROM Users u GROUP BY u.address.state, u.address.city
Query execution plan:
{
   "iterator kind" : "SELECT",
   "FROM" :
   {
      "iterator kind" : "RECEIVE",
      "distribution kind" : "ALL_SHARDS",
      "order by fields at positions" : [ 0, 1 ],
      "input iterator" :
      {
         "iterator kind" : "SELECT",
         "FROM" :
         {
             "iterator kind" : "TABLE",
             "target table" : "Users",
             "row variable" : "$$u",
             "index used" : "idx_state_city_income",
             "covering index" : true,
             "index row variable" : "$$u_idx",
             "index scans" : [
                {
                    "equality conditions" : {},
                    "range conditions" : {}
                }
            ]
          },
          "FROM variable" : "$$u_idx",
          "GROUP BY" : "Grouping by the first 2 expressions in the SELECT list",
          "SELECT expressions" : [
             {
                "field name" : "state",
                "field expression" :
                {
                   "iterator kind" : "FIELD_STEP",
                   "field name" : "address.state",
                   "input iterator" :
                   {
                      "iterator kind" : "VAR_REF",
                      "variable" : "$$u_idx"
                   }
               } 
             },
             {
                "field name" : "city",
                "field expression" :
                {
                   "iterator kind" : "FIELD_STEP",
                   "field name" : "address.city",
                   "input iterator" :
                   {
                      "iterator kind" : "VAR_REF",
                      "variable" : "$$u_idx"
                   }
                }
             },
             {
                "field name" : "income",
                "field expression" :
                {
                   "iterator kind" : "FUNC_SUM",
                   "input iterator" :
                   {
                      "iterator kind" : "FIELD_STEP",
                      "field name" : "income",
                      "input iterator" :
                      {
                         "iterator kind" : "VAR_REF",
                         "variable" : "$$u_idx"
                      }
                   }
                }
            }
       ]
    }
 },
 "FROM variable" : "$from-1",
 "GROUP BY" : "Grouping by the first 2 expressions in the SELECT list",
 "SELECT expressions" : [
    {
       "field name" : "state",
       "field expression" :
       {
          "iterator kind" : "FIELD_STEP",
          "field name" : "state",
          "input iterator" :
          {
             "iterator kind" : "VAR_REF",
             "variable" : "$from-1"
          }
       }
    },
    {
       "field name" : "city",
       "field expression" :
       {
          "iterator kind" : "FIELD_STEP",
          "field name" : "city",
          "input iterator" :
          {
             "iterator kind" : "VAR_REF",
             "variable" : "$from-1"
          }
       }
    },
    {
       "field name" : "income",
       "field expression" :
       {
          "iterator kind" : "FUNC_SUM",
          "input iterator" :
          {
             "iterator kind" : "FIELD_STEP",
             "field name" : "income",
             "input iterator" :
             {
                "iterator kind" : "VAR_REF",
                "variable" : "$from-1"
             }
           }
        }
     }
  ]
}
Explanation of the query execution plan :
  • In this example, you group the users based on state and city and then you determine the sum of income of the users grouped.
  • The group-by is index-based, that is the group by fields are also part of the index used. This is indicated by the lack of any GROUP iterators. Instead, the grouping is done by the SELECT iterators.
  • There are two SELECT iterators, the inner one has a GROUP BY property that specifies which of the SELECT-clause expressions are also grouping expressions. Here the group by fields are the first 2 expressions in the SELECT list (u.address.state,u.address.city).
    "GROUP BY" : "Grouping by the first 2 expressions in the SELECT list"
  • The index idx_state_city_income is used here and in this example, it is a covering index as the query can be evaluated using only the entries of the index.
  • The index row variable is $$u_idx, which is the name of a variable ranging over the index entries produced by the TABLE iterator. Every time a new index entry is produced by the index scan, the $$u_idx variable will be bound to that entry.
  • This index row variable ($$u_idx) will be referenced by iterators implementing the other clauses of the SELECT expression.
  • In the SELECT expression, three fields (state, city and sum(income)) are fetched. These correspond to three field names and field expressions in the SELECT expression clause.
  • The results returned by the inner SELECT iterators from the various RNs are partial groups, because rows with the same state and city may exist at multiple RNs. So, regrouping and re-aggregation has to be performed at the driver. This is done by the outer SELECT iterator (above the RECEIVE iterator).
  • The result is also sorted by state and city. The order by fields at positions property specifies the field used for sorting. The value of this array depends on the position of the field that is sorted in the SELECT expression. In this example, state is the first field and city is the second field in the SELECT expression. So order by fields at positions has a value of 0,1.
    "order by fields at positions" : [ 0, 1 ]
  • In the outer SELECT expression, three fields are fetched: state, cityand sum(income). The FROM variable$from-1 will be referenced by iterators implementing the other clauses of the outer SELECT expression. This corresponds to three field names and field expressions in the outer SELECT expression clause. In this example, two of the field expressions fetch only the fields and there is one field expression which evaluates a function (sum(income)).
  • For the two fields to be directly fetched by the SELECT expression, the field expression is computed by a FIELD_STEP iterator. The FIELD_STEP iterator extracts and returns the value of a field (state as shown below) from the records produced by its input iterator. The same is done for the city field.
    {
       "field name" : "state",
       "field expression" :
       {
          "iterator kind" : "FIELD_STEP",
          "field name" : "state",
          "input iterator" :
          {
             "iterator kind" : "VAR_REF",
             "variable" : "$from-1"
          }
       }
    }
    The third field in the SELECT expression is a function to determine the sum of income. The FUNC_SUM iterator is used for this. It iterates over the value of the income field and determines the sum of all incomes from the result of its input iterator.
    {
       "field name" : "income",
       "field expression" :
       {
          "iterator kind" : "FUNC_SUM",
          "input iterator" :
          {
             "iterator kind" : "FIELD_STEP",
             "field name" : "income",
             "input iterator" :
             {
                "iterator kind" : "VAR_REF",
                "variable" : "$from-1"
             }
          }
       }
    }