Overview of a query plan

Internally, a query execution plan is structured as a tree of plan iterators.

Each kind of iterator evaluates a different kind of expression that may appear in a query. In general the choice of index and the kind of associated index predicates can have a drastic effect on the query performance. As a result, you as a developer often want to see what index is used by a query and what predicates have been pushed down to it. Based on this information, you may want to force the use of a different index via index hints. This information is contained in the query execution plan. All Oracle NoSQL drivers provide APIs to display the execution plan of a query. All Oracle NoSQL graphical UIs including the IntelliJ, VSCode, and Eclipse plugins along with the Oracle Cloud Infrastructure Console include controls for displaying the query execution plan.

Iterators in a query execution plan

Some of the most common and important iterators used in queries are :

TABLE iterator

A TABLE iterator is responsible for:
  • Scanning the index used by the query (which may be the primary index).
  • Applying any filtering predicates pushed to the index.
  • Retrieving the rows pointed to by the qualifying index entries if necessary. If the index is covering, the result set of the TABLE iterator is a set of index entries, otherwise it is a set of table rows.

Note:

An index is called a covering index with respect to a query if the query can be evaluated using only the entries of that index, that is, without the need to retrieve the associated rows.
A TABLE iterator will always have the following properties:
  • target table: The name of the target table in the query.
  • index used: The name of the index used by the query. If the primary index were used, “primary index” would appear as the value of this property.
  • covering index: Whether the index is covering or not.
  • row variable: The name of a variable ranging over the table rows produced by the TABLE iterator. If the index is covering, no table rows are produced and this variable is not used.
  • index scans: Contains the start and stop conditions that define the index scans to be performed.
A TABLE iterator has 2 more optional properties:
  • index row variable: The name of a variable ranging over the index entries produced by the TABLE iterator. Every time a new index entry is produced by the index scan, the index variable will be bound to that entry.
  • index filtering predicate: A predicate evaluated on every index entry produced by the index scan. If the result of this evaluation is true, the index variable is bound to this entry and the entry or its associated table row is returned as the result of the next() call on the TABLE iterator. Otherwise, the entry is skipped, the next entry from the index scan is produced, the predicate is evaluated again on this entry and it continues until a qualifying entry is found.

SELECT iterator

It is responsible for executing the SELECT expression.

RECEIVE iterator

It is a special internal iterator that separates the query plan into 2 parts:
  • The RECEIVE iterator itself and all iterators that are above it in the iterator tree are executed at the driver.
  • All iterators below the RECEIVE iterator are executed at the replication nodes (RNs); these iterators form a subtree rooted at the unique child of the RECEIVE iterator.

In general, the RECEIVE iterator acts as a query coordinator. It sends its subplan to appropriate RNs for execution and collects the results. It may perform additional operations such as sorting and duplicate elimination, and propagates the results to its ancestor iterators (if any) for further processing.

Distribution kinds :

A distribution kind specifies how the query will be distributed for execution across the RNs participating in an Oracle NoSQL database (a store). The distribution kind is a property of the RECEIVE iterator.

Different choices of Distribution kinds are:
  • SINGLE_PARTITION: A SINGLE_PARTITION query specifies a complete shard key in its WHERE clause. As a result, its full result set is contained in a single partition, and the RECEIVE iterator will send its subplan to a single RN that stores that partition. A SINGLE_PARTITION query may use either the primary-key index or a secondary index.
  • ALL_PARTITIONS: Queries use the primary-key index here and they don’t specify a complete shard key. As a result, if the store has M partitions, the RECEIVE iterator will send M copies of its subplan to be executed over one of the M partitions each. See show topology to determine the number of partitions in your store.
  • ALL_SHARDS: Queries use a secondary index here and they don’t specify a complete shard key. As a result, if the store has N shards, the RECEIVE iterator will send N copies of its subplan to be executed over one of the N shards each.

Anatomy of a query execution plan:

Query execution takes place in batches. When a query subplan is sent to a partition or shard for execution, it will execute there until a batch limit is reached. For an on-premises NoSQL database, batch limit is the number of local results produced from the underlying partition/shard. The default is 100 results, but you can change it via a query-level option. For NoSQL Database Cloud Service, the batch limit is the number of read units consumed locally by the query. The default is 2000 read units (about 2MB of data), and it can only be decreased via a query-level option.

When the batch limit is reached, any local results that were produced are sent back to the RECEIVE iterator for further processing along with a boolean flag that says whether more local results may be available. If the flag is true, the reply includes resume information. If the RECEIVE iterator decides to resend the query to the same partition/shard, it will include this resume information in its request, so that the query execution will restart at the point where it stopped during the previous batch. This is because no query state is maintained at the RN after a batch finishes. The next batch for the same partition/shard may take place at the same RN as the previous batch or at a different RN that also stores the same partition/shard.