Query Processing for Multi-Shard Queries
A multi-shard query is a query that must scan data from more than one shard, and the processing on each shard is independent of any other shard.
A multi-shard query maps to more than one shard and the coordinator might need to do some processing before sending the result to the client. For example, the following query gets the number of orders placed by each customer.
SELECT count(*), c.custno FROM customers c, orders o WHERE c.custno = o.custno
GROUP BY c.custno;
The query is transformed to the following by the coordinator.
SELECT sum(count_col), custno FROM (SELECT count(*) count_col, c.custno
FROM customers c, orders o
WHERE c.custno = o.custno GROUP BY c.custno) GROUP BY custno;
The inline query block is mapped to every shard just as a remote mapped query block.
The coordinator performs further aggregation and GROUP BY on top of the
result set from all shards. The unit of processing on every shard is the inline query
block.
Multi-Shard Queries and Global Read Consistency
A multi-shard query must maintain global read consistency (CR) by issuing the query at the highest common SCN across all the shards. See Specifying Consistency Levels in a Multi-Shard Query for information about how to set consistency levels.
Passing Hints in Multi-Shard Queries
Any hint specified in the original query on the coordinator is propagated to the shards.
Tracing and Troubleshooting Slow Running Multi-Shard Queries
Set the trace event shard_sql on the coordinator to trace the
query rewrite and shard pruning. One of the common performance issues observed is when the
GROUP BY is not pushed to the shards because of certain limitations of the sharding. Check
if all of the possible operations are pushed to the shards and the coordinator has minimal
work to consolidate the results from shards.
Specifying Consistency Levels in a Multi-Shard Query
You can use the initialization parameter
MULTISHARD_QUERY_DATA_CONSISTENCY to set different consistency levels
when running multi-shard queries across shards.
MULTISHARD_QUERY_DATA_CONSISTENCY = { STRONG | SHARD_LOCAL |
DELAYED_STANDBY_ALLOWED | LOCAL_STANDBY_ONLY | LOCAL_STANDBY_PREFERRED }As shown in the syntax above, you can specify different consistency levels for multi-shard queries. For example, you might want some queries to avoid the cost of SCN synchronization across shards, and these shards could be globally distributed.
Another use case is when you use standbys for replication and slightly stale data is acceptable for multi-shard queries, as the results could be fetched from the primary and its standbys.
The default mode is STRONG, which performs SCN synchronization
across all shards. Other modes skip SCN synchronization.
SHARD_LOCAL mode does not perform SCN synchronization and
provides the most current data.
The DELAYED_STANDBY_ALLOWED mode also does not perform SCN
synchronization and allows fetching data from the standbys as well, depending on load
balancing and other factors, and could contain stale data.
LOCAL_STANDBY_ONLY and LOCAL_STANDBY_PREFERRED support
region affinity for cross shard queries. Read-only cross-shard queries are executed on
the shards close to the query coordinator. This is decided based on the region of the
coordinator (shard catalog) and shards.
On the shard catalog, a region can be set using GDSCTL command MODIFY
CATALOG with the -region parameter. If the
-db_unique_name parameter is supplied, the region of the standby
catalog with passed DB unique name is modified.
To change region of the shard, GDSCTL command ‘modify shard’ can be used if region is not already set during ‘add shard’.
LOCAL_STANDBY_ONLY mode does not perform SCN synchronization and
supports region affinity for cross-shard queries. It allows data to be fetched from
standby shards in the same region as the coordinator running the query. If there are no
standby shards available, an error is raised.
LOCAL_STANDBY_PREFERRED is similar to the
LOCAL_STANDBY_ONLY setting, except the database attempts to find a
shard from which to fetch data in the following order:
-
Standby shards in the same region; chunks distributed equally among shards
-
Primary shard in the same region
-
Standby shards in the buddy region; chunks distributed equally among shards
-
Primary shard in the buddy region
-
If no shards are available, raise an error.
This parameter can be set either at the system level or at the session level.
See Also:
Oracle AI Database
Reference for more information about
MULTISHARD_QUERY_DATA_CONSISTENCY usage.