Query Coordinator Operation

The SQL compiler in the shard catalog identifies the relevant shards automatically, and coordinates the query processing across all of the participating shards. Database links are used for the communication between the coordinator and the shards.

As shown in the following figure, at a high level, the coordinator rewrites each incoming query, Q, into two queries, Coordinator Query (CQ) and Shard Query (SQ) where SQ, where SQ (Shard Query) is the part of Q that runs on each participating shard, and CQ (Coordinator Query) is the part of Q that runs on the coordinator shard.

A query, Q, is rewritten into CQ ( Shard_Iterator( SQ ) ), where the Shard_Iterator is the operator that connects to the shards and runs SQ. It can be run in parallel or serially.

Figure 9-3 Query Coordinator Operation



The following is an example of an aggregate query, Q1, rewritten into Q1’.

Q1 : SELECT COUNT(*) FROM customers

Q1’: SELECT SUM(sc) FROM (Shard_Iterator(SELECT COUNT(*) sc FROM s1 (i) ))

There are two main elements in this process.

  1. The relevant shards are identified.

  2. The query is rewritten into a distributive form and iterated across the relevant shards.

During the query compilation on the coordinator database, the query compiler analyzes the predicates on the sharding key, and extracts the predicates that can be used to identify the participating shards, that is, the shards that will contribute rows for the sharded tables referenced in the query. The rest of the shards are referred to as pruned shards.

In the case where only one participating shard was identified, the full query is routed to that shard for processing. This is called a single-shard query.

If there is more than one participating shard, the query is called a multi-shard query and it is rewritten. The rewriting process takes into account the expressions computed by the query as well as the query shape.