Query 10: Group data with fields not part of any index

Fetch the source of passenger bags and the count of bags for all passengers and group the data by the source.
SELECT $flt_src as SOURCE, count(*) as COUNT FROM BaggageInfo $bag,
$bag.bagInfo.flightLegs[0].fltRouteSrc $flt_src GROUP BY $flt_src
Plan:
{
  "iterator kind" : "GROUP",
  "input variable" : "$gb-2",
  "input iterator" :
  {
    "iterator kind" : "RECEIVE",
    "distribution kind" : "ALL_PARTITIONS",
    "input iterator" :
    {
      "iterator kind" : "GROUP",
      "input variable" : "$gb-1",
      "input iterator" :
      {
        "iterator kind" : "SELECT",
        "FROM" :
        {
          "iterator kind" : "TABLE",
          "target table" : "BaggageInfo",
          "row variable" : "$bag",
          "index used" : "primary index",
          "covering index" : false,
          "index scans" : [
            {
              "equality conditions" : {},
              "range conditions" : {}
            }
          ]
        },
        "FROM variable" : "$bag",
        "FROM" :
        {
          "iterator kind" : "FIELD_STEP",
          "field name" : "fltRouteSrc",
          "input iterator" :
          {
            "iterator kind" : "ARRAY_SLICE",
            "low bound" : 0,
            "high bound" : 0,
            "input iterator" :
            {
              "iterator kind" : "FIELD_STEP",
              "field name" : "flightLegs",
              "input iterator" :
              {
                "iterator kind" : "FIELD_STEP",
                "field name" : "bagInfo",
                "input iterator" :
                {
                  "iterator kind" : "VAR_REF",
                  "variable" : "$bag"
                }
              }
            }
          }
        },
        "FROM variable" : "$flt_src",
        "SELECT expressions" : [
          {
            "field name" : "SOURCE",
            "field expression" :
            {
              "iterator kind" : "VAR_REF",
              "variable" : "$flt_src"
            }
          },
          {
            "field name" : "COUNT",
            "field expression" :
            {
              "iterator kind" : "CONST",
              "value" : 1
            }
          }
        ]
      },
      "grouping expressions" : [
        {
          "iterator kind" : "FIELD_STEP",
          "field name" : "SOURCE",
          "input iterator" :
          {
            "iterator kind" : "VAR_REF",
            "variable" : "$gb-1"
          }
        }
      ],
      "aggregate functions" : [
        {
          "iterator kind" : "FUNC_COUNT_STAR"
        }
      ]
    }
  },
  "grouping expressions" : [
    {
      "iterator kind" : "FIELD_STEP",
      "field name" : "SOURCE",
      "input iterator" :
      {
        "iterator kind" : "VAR_REF",
        "variable" : "$gb-2"
      }
    }
  ],
  "aggregate functions" : [
    {
      "iterator kind" : "FUNC_SUM",
      "input iterator" :
      {
        "iterator kind" : "FIELD_STEP",
        "field name" : "COUNT",
        "input iterator" :
        {
          "iterator kind" : "VAR_REF",
          "variable" : "$gb-2"
        }
      }
    }
  ]
}
Explanation:
  • In this query, you group passenger bags based on the flight source and determine the total number of bags belonging to one flight source.
  • As the GROUP BY field (bagInfo.flightLegs[0].fltRouteSrc in this example) is not part of any index, you need a separate GROUP operator to do the grouping. This is indicated by the existence of the GROUP iterators in the execution plan. There are two GROUP iterators: one that operates at the driver (above the RECEIVE iterator) and another that operates at the RNs (below the RECEIVE iterator).
  • The lower GROUP iterator has a SELECT iterator as input. The SELECT returns the fltRouteSrc and count of bags. The GROUP iterator will operate until the batch limit is reached. If the batch limit is defined as the max number N of results produced, the GROUP iterator will stop when up to N flight source groups have been created. If the batch limit is defined as the max number of bytes read, it will stop when this max is reached. The GROUP operator has an input variable. For the inner GROUP operator, the input variable is $gb-1 and for the outer GROUP operator it is $gb-2.
    "iterator kind" : "GROUP","input variable" : "$gb-1",
  • The primary key index is used here and in this example, it is not a covering index as the query has fields that are not part of the entries of the primary index.
  • When the FROM iterator is a TABLE iterator, the FROM variable is the same as either the index row variable or the row variable of the TABLE iterator, depending on whether the used index is covering or not. Every time a next() call on the FROM iterator returns true, the variable will be bound to the result produced by that iterator. In this example, the FROM variable is the row variable as the index is not covering.
  • This row variable ($bag) will be referenced by iterators implementing the other clauses of the inner SELECT expression.
  • The GROUP iterator creates an internal variable ($gb-1) that iterates over the records produced by the SELECT expression.
  • The result set produced by the lower GROUP iterator is partial: it may not contain all the fltRouteSrc groups and for the fltRouteSrc groups that it does contain, the count may be a partial sum (because all rows for a given fltRouteSrc may not have been retrieved when query execution stops). The upper GROUP iterator receives the partial results from each RN and performs the final grouping and aggregation. It operates the same way as the lower GROUP iterators and will keep operating until the are no more partial results from the RNs. At that point, the full and final result set is cached at the upper GROUP iterator and is returned to the application.
  • The upper GROUP iterator creates an internal variable ($gb-2) that iterates over the records produced by the outer SELECT. The $gb-2 variable has the fltRouteSrc and count of all bags grouped by fltRouteSrc.
  • In the SELECT expression, two fields are fetched: fltRouteSrc,count(*). These correspond to two field names and field expressions in the SELECT expression clause. For the first field, the field expression uses FIELD_STEP iterator. The second field is the aggregate function count. The iterator FUNC_SUM is used to iterate the result produced by its parent iterator and determine the total number of bags.