Example 7: Group data with fields not part of the index
Fetch the age and sum of income of all users whose state is CA, grouping the
                        data by
                        age.
               SELECT age, sum(income) FROM Users u 
WHERE u.address.state = "CA" GROUP BY ageQuery execution
                        plan:
               {
  "iterator kind" : "GROUP",
  "input variable" : "$gb-2",
  "input iterator" :
  {
    "iterator kind" : "RECEIVE",
    "distribution kind" : "ALL_SHARDS",
    "input iterator" :
    {
      "iterator kind" : "GROUP",
      "input variable" : "$gb-1",
      "input iterator" :
      {
        "iterator kind" : "SELECT",
        "FROM" :
        {
          "iterator kind" : "TABLE",
          "target table" : "users",
          "row variable" : "$$u",
          "index used" : "idx_state_city_income",
          "covering index" : false,
          "index scans" : [
            {
              "equality conditions" : {"address.state":"CA"},
              "range conditions" : {}
            }
          ]
        },
        "FROM variable" : "$$u",
        "SELECT expressions" : [
          {
            "field name" : "age",
            "field expression" :
            {
              "iterator kind" : "FIELD_STEP",
              "field name" : "age",
              "input iterator" :
              {
                "iterator kind" : "VAR_REF",
                "variable" : "$$u"
              }
            }
          },
          {
            "field name" : "Column_2",
            "field expression" :
            {
              "iterator kind" : "FIELD_STEP",
              "field name" : "income",
              "input iterator" :
              {
                "iterator kind" : "VAR_REF",
                "variable" : "$$u"
              }
            }
          }
        ]
      },
      "grouping expressions" : [
        {
          "iterator kind" : "FIELD_STEP",
          "field name" : "age",
          "input iterator" :
          {
            "iterator kind" : "VAR_REF",
            "variable" : "$gb-1"
          }
        }
      ],
      "aggregate functions" : [
        {
          "iterator kind" : "FUNC_SUM",
          "input iterator" :
          {
            "iterator kind" : "FIELD_STEP",
            "field name" : "Column_2",
            "input iterator" :
            {
              "iterator kind" : "VAR_REF",
              "variable" : "$gb-1"
            }
          }
        }
      ]
    }
  },
  "grouping expressions" : [
    {
      "iterator kind" : "FIELD_STEP",
      "field name" : "age",
      "input iterator" :
      {
        "iterator kind" : "VAR_REF",
        "variable" : "$gb-2"
      }
    }
  ],
  "aggregate functions" : [
    {
      "iterator kind" : "FUNC_SUM",
      "input iterator" :
      {
        "iterator kind" : "FIELD_STEP",
        "field name" : "Column_2",
        "input iterator" :
        {
          "iterator kind" : "VAR_REF",
          "variable" : "$gb-2"
        }
      }
    }
  ]
}Explanation of the query execution plan :
            - In this example , you group all users in the state CA based on their age and determine the sum of the income of users belonging to each age group.
- As the GROUP BY field (agein this example) is not part of any index, you need a separate GROUP operator to do the grouping. This is indicated by the existence of the GROUP iterators in the execution plan. There are two GROUP iterators: one that operates at the driver (above the RECEIVE iterator) and another that operates at the RNs (below the RECEIVE iterator).
- The lower GROUP iterator has a SELECT iterator as input. The
                                        SELECT returns the age and income of each user in the state
                                                CA. The GROUP iterator will operate until the
                                        batch limit is reached. If the batch limit is defined as the
                                        max number N of results produced, the GROUP iterator will
                                        stop when up to N age groups have been created. If the batch
                                        limit is defined as the max number of bytes read, it will
                                        stop when this max is reached. The GROUP operator has an
                                        input variable. For the inner GROUP operator the input
                                        variable is $gb-1and for the outer GROUP operator it is$gb-2."iterator kind" : "GROUP","input variable" : "$gb-1",
- The index idx_state_city_incomeis used here and in this example, it is not a covering index as the query has theagefield, which is not part of the entries of the index.
- The index scan property contains the start and stop conditions
                                        that define the index scans to be performed. In this
                                        example, only one index scan will be performed. The
                                        conditions correspond to the predicate u.address.state =
                                        "CA". Specifically, the starting index entry must have the
                                        value CA on the address.statefield. All subsequent entries must have CA as the value of theiraddress.statefield, and the scan will stop as soon as an entry with a different state value is encountered.
- When the FROM iterator is a TABLE iterator, the FROM variable is the same as either the index row variable or the row variable of the TABLE iterator, depending on whether the used index is covering or not. In this example, the FROM variable is same as the row variable as the index is not covering.
- This row variable ($$u) will be referenced by iterators implementing the other clauses of the inner SELECT expression.
- The GROUP iterator creates an internal variable
                                                  ($gb-1) that iterates over the records produced by the SELECT expression.
- The result set produced by the lower GROUP iterator is partial: it may not contain all the age groups and for the age groups that it does contain, the income may be a partial sum (because all rows for a given age may not have been retrieved when query execution stops). The upper GROUP iterator receives the partial results from each RN and performs the final grouping and aggregation. It operates the same way as the lower GROUP iterators and will keep operating until there are no more partial results from the RNs. At that point, the full and final result set is cached at the upper GROUP iterator and is returned to the application.
- The upper GROUP iterator creates an internal variable
                                                  ($gb-2) that iterates over the records produced by the outer SELECT expression. The $gb-2 variable has the age and sum of income of all users of the CA state, grouped by age.
- In the SELECT expression, two fields are fetched:
                                                ageandsum(income). These correspond to two field names and field expressions in the SELECT expression clause.
- For the agefield, the field expression is computed by a FIELD_STEP iterator. The FIELD_STEP iterator extracts and returns the value of the age field from the records produced by its input iterator.{ "iterator kind" : "FIELD_STEP", "field name" : "age", "input iterator" : { "iterator kind" : "VAR_REF", "variable" : "$gb-2" } }
- The second field to be fetched is an aggregate function
                                                  sumof income values. TheFUNC_SUMiterator is used for this. It iterates over the value of theincomefield and determines the sum of all incomes from the result of its input iterator."aggregate functions" : [ { "iterator kind" : "FUNC_SUM", "input iterator" : { "iterator kind" : "FIELD_STEP", "field name" : "Column_2", "input iterator" : { "iterator kind" : "VAR_REF", "variable" : "$gb-2" } } } ]