7 Using the Sharding Advisor
Sharding Advisor simplifies the migration of your existing, non-sharded Oracle database to a sharded database, by analyzing your workload and database schema, and recommending the most effective Oracle Sharding configurations.
About Sharding Advisor
The Sharding Advisor is a client-side, command-line tool that you run against any non-sharded, production, 10g or later release, Oracle Database that you are considering migrating to an Oracle Sharding environment.
The Sharding Advisor analysis provides you with the information you need to design a schema that maximizes performance while reducing duplicated data in the new sharded database environment.
The following are benefits of using Sharding Advisor to aid you with schema design.
- Maximize query workload performance
- Minimize multi-shard operations requiring cross-shard joins
- Maximize parallelism for complex queries (spread query processing across all shards)
- Minimize the amount of duplicated data on each shard
The Sharding Advisor utility, GWSADV
, is installed with
Oracle Database as a standalone tool, and connects to your database using authenticated
OCI connections.
To get an understanding of your schema and other preferences, Sharding Advisor asks you questions as part of an interactive dialog.
Sharding Advisor then connects to the existing non-sharded database, also called the source, analyzes its schema and query workload, and produces a set of alternative designs for the sharded database, including recommendations for an effective sharding key, which tables to shard, and which tables to duplicate on all shards.
Sharding configurations are ranked in terms of query performance, with the ranking favoring configurations that maximize single and multi-shard queries that do not require cross-shard joins, while minimizing multi-shard queries that require cross-shard joins.
You choose the design that best fits your requirements. The designs are ranked by the advisor, so if you don't have specific preferences you can choose the highest ranked design by default.
Note:
There are restrictions to Sharding Advisor capabilities:The source database must be Oracle Database 10g or later release.
If you cannot run the Sharding Advisor against the live production database, you can run the Sharding Advisor on a different server that has the schema and workload imported from the production database.
Sharding Advisor discovers the table families based on primary
key-foreign key relationships. If the schema does not have any primary key-foreign
key constraints, sharding by PARENT
clause is recommended.
Currently, Sharding Advisor recommends only single-table family,
system-managed sharding (sharding by reference) configurations if the source
database has foreign key constraints; otherwise, Sharding Advisor recommends
sharding using the PARENT
clause.
Run Sharding Advisor
Run the Sharding Advisor command-line tool against your existing, non-sharded Oracle Database to obtain recommended Oracle Sharding configurations.
The user running Sharding Advisor requires the following privileges.
SQL> ALTER SYSTEM SET statistics_level=all;
SQL> grant create session to sharding_advisor_user;
SQL> grant alter session to sharding_advisor_user;
SQL> grant select on v_$sql_plan to sharding_advisor_user;
SQL> grant select on v_$sql_plan_statistics_all to sharding_advisor_user;
SQL> grant select on gv_$sql_plan to sharding_advisor_user;
SQL> grant select on gv_$sql_plan_statistics_all to sharding_advisor_user;
SQL> grant select on DBA_HIST_SQLSTAT to sharding_advisor_user;
SQL> grant select on dba_hist_sql_plan to sharding_advisor_user;
SQL> grant select on dba_hist_snapshot to sharding_advisor_user;
The Sharding Advisor command-line utility, GWSADV
, runs from
$ORACLE_HOME/bin
.
Run the Sharding Advisor from the command line, as shown here.
$ gwsadv -u username -p password -c –w sch=\(schema1,schema2\)
Note:
The parenthesis in this command is escaped on Linux systems.Where -u
and -p
are the user name and
password of the user that runs the Sharding Advisor.
Use the capture workload parameter, -c
, the first time
you run Sharding Advisor against an existing query workload, to capture the
predicate information from the source's GV$SQL_PLAN_STATISTICS_ALL
view. You don't need to use -c
in subsequent queries on the same
workload.
The required -w
flag indicates that Sharding Advisor
uses the query workload for sharding configuration generation and ranking.
In this case, the sch
parameter specifies a list of
schemas to run Sharding Advisor against. There are several other options you can use
with Sharding Advisor, detailed in Sharding Advisor Usage and Options.
Run Sharding Advisor on a Non-Production System
To minimize the impact on a live production system, you can run the Sharding Advisor on a copy of the database schema and workload, located on a different server than the production system.
To get the same results as if it were the live production system, the production database schema and workload can be exported using the Oracle Data Pump utilities and copied to a different server. Then you can run Sharding Advisor on the imported schema.
You only export the database schema and system tables. There is no need to export the actual data.
The following procedure uses the HR schema as an example.
Do the following steps on the source (production) database server.
Do the following steps on the target database server.
Review Sharding Advisor Output
Sharding Advisor discovers the table families for each potential sharding column that it extracts from the query workload, and ranks the table families based on query classification rules and a ranking algorithm.
To review the sharding configurations and related information that is owned by the user running Sharding Advisor, you can query the following output database tables, which are stored in the same schema as your source database.
SHARDINGADVISOR_CONFIGURATIONS
has one row for each table in a ranked sharded configuration, and provides details for each table, such as whether to shard or duplicate it, and if sharded, its level in a table family hierarchy, its parent table, root table sharding key, foreign key reference constraints, and the estimated size per shard.SHARDINGADVISOR_CONFIGDETAILS
has one row for each ranked sharding configuration, and provides details for each ranked sharding configuration, such as the number and collective size, per shard, of the sharded tables, and the number and collective size of the duplicated tables. It also provides the number of single shard and multi-shard queries to expect in production, as well as the number of multi-shard queries requiring cross-shard joins, based on your source database's current workload, and an estimated cost.SHARDINGADVISOR_QUERYTYPES
, for each query in the workload, lists the query type for each sharding configuration. Note that the same query can be of a different query type depending on the sharding configuration.
Because the Sharding Advisor output is contained in regular database tables, you can run many kinds of SQL queries against them to look at the output from different perspectives.
For example, to display the sharding configurations in ranking order, run
SELECT rank, tableName as tname, tabletype as type,
tablelevel as tlevel, parent, shardby as shardBy,
shardingorreferencecols as cols, unenforceableconstraints,
sizeoftable
FROM SHARDINGADVISOR_CONFIGURATIONS
ORDER BY rank, tlevel, tname, parent;
For details about the Sharding Advisor output tables and more example queries see Sharding Advisor Output Tables and Sharding Advisor Output Review SQL Examples
Choose a Sharding Advisor Recommended Configuration
There are some aspects of database sharding to take into consideration when deciding which configuration to choose for your sharded database.
Increasing the number of shards will result in higher availability and scalability of the sharded database.
Minimizing duplicated data can conflict with your desire to minimize multi-shard queries that require joins across multiple shards. Because joins in a sharded database are usually performed on related data, storing related data in the same shard can dramatically speed up processing of such joins.
The overall cost, in terms of query workload, of the recommended
sharding configurations is based on the number of each query type (single shard,
multi-shard, and multi-shard with cross-shard joins) in the workload, where
multi-shard queries with cross-shard joins have the highest cost, and single shard
queries have the lowest cost. The cost information is in the COST
column of the Sharding Advisor SHARDINGADVISOR_CONFIGDETAILS
output
table.
Sharding Advisor Usage and Options
Syntax
gwsadv
[-n nodeName[:portnum]]
[-s serviceName]
-u username
-p password
[-c]
[-awr_snap_begin timestamp]
[-awr_snap_end timestamp]
–w
[sch=(schema1, schema2, …)]
[-tab importantTabsFile]
[-pr numpreds:n]
[-t trace_file]
Options
Note that each option must be prefixed with a minus sign (-) except for the
sch
argument.
Option | Description | Required (Y/N) |
---|---|---|
-awr_snap_begin timestamp |
Specify the beginning timestamp, in the format 'YYYY-MM-DD HH24:MI:SS', to specify the AWR snapshots to capture the workload from. |
N |
-awr_snap_end timestamp |
Specify the end timestamp, in the format 'YYYY-MM-DD HH24:MI:SS', to specify the AWR snapshots to capture the workload from. |
N |
-c |
Capture a new or changed workload. Use Required on first run of Sharding Advisor on a new or changed workload. Not required on subsequent runs on the same workload. By default, the workload is captured from the
Alternatively, the workload can be captured from
Automatic Workload Repository (AWR) snapshots by using the
|
N |
-n nodeName[:portnum] |
Node name and port number, if connecting to a database on another host |
N |
-p password |
Oracle password |
Y |
-pr
numpreds:n |
Limits the number of predicates to be captured when using
|
N |
-s serviceName |
Service name, if connecting to a database on another host |
N |
sch |
The |
N |
-t
trace_file |
Enables tracing of all activities performed by sharding advisor. Specify an output file name. |
N |
-tab importantTabsFile |
Name of file that consists of table names, one per line, in the format schemaname.tablename, to restrict the number of tables that the Sharding Advisor needs to analyze. |
N |
-u
username |
Oracle user name |
Y |
-w |
Directs Sharding Advisor to use the query workload for sharding configuration generation and ranking. |
Y |
Usage Notes
The normal usage of the sharding advisor is to not specify the –pr
option. The query workload capture should be faster now even without the
–pr
option. If however, the you want to speed it up further,
the –pr
option can be used. If it is used, it has to be used in
conjunction with the –c
option. If unspecified, the number of
predicates to be captured is not limited.
For procedures describing how to run the Sharding Advisor with example commands see Run Sharding Advisor and Run Sharding Advisor on a Non-Production System.
Sharding Advisor Output Tables
To review the sharding configurations and related information, you can query the following output database tables, which are stored in the same schema as your source database.
SHARDINGADVISOR_CONFIGURATIONS Table
SHARDINGADVISOR_CONFIGURATIONS Schema
Column | Description |
---|---|
RANK |
The rank of the sharding configuration based on the ranking algorithm |
TABLENAME |
Name of the table in the sharding configuration |
TABLETYPE |
‘S’ (Sharded), ‘D’ (Duplicated), or ‘L’ (Local) |
TABLELEVEL |
Level of the table in the table family hierarchy, NULL for duplicated tables |
PARENT |
Parent of the table in the table family hierarchy, NULL for duplicated tables |
SHARDBY |
Sharding method. REFERENCE for
sharding by reference, or PARENT for sharding by
PARENT clause, for child tables.
|
SHARDINGORREFERENCECOLS |
Sharding key for the root table, partition by
REFERENCE or PARENT for the
child tables in a table family, and NULL for duplicated
tables
|
UNENFORCEABLECONSTRAINTS |
Foreign key constraints other than the reference columns, which cannot be enforced |
SIZEOFTABLE |
Size of the table per shard |
SHARDINGADVISOR_CONFIGDETAILS Table
SHARDINGADVISOR_CONFIGDETAILS Schema
Column | Description |
---|---|
RANK |
The rank of the sharding configuration based on the ranking algorithm |
CHOSENBYUSER |
‘Y’ if the sharding configuration is chosen by the user, NULL for other sharding configurations |
NUMSHARDEDTABLES |
Number of sharded tables in this sharding configuration |
SIZEOFSHARDEDTABLES |
Cumulative size of sharded tables (per shard) in this sharding configuration |
NUMDUPLICATEDTABLES |
Number of duplicated tables in this sharding configuration |
SIZEOFDUPLICATEDTABLES |
Cumulative size of duplicated tables (per shard) in this sharding configuration |
NUMSINGLESHARDQUERIES |
Number of single shard queries in the query workload for this sharding configuration |
NUMMULTISHARDQUERIES |
Number of multi-shard queries in the query workload for this sharding configuration |
NUMCROSSSHARDQUERIES |
Number of multi-shard queries that require an external join in the query workload for this sharding configuration |
COST |
Cost of the sharding configuration based on the costing algorithm |
SHARDINGADVISOR_QUERYTYPES Table
SHARDINGADVISOR_QUERYTYPES Schema
Column | Description |
---|---|
RANK |
The rank of the sharding configuration based on the ranking algorithm |
SQLID |
The query SQL ID |
QUERYTYPE |
The type of the query in this sharding
configuration: SINGLE SHARD QUERY , MULTI
SHARD QUERY , or CROSS SHARD
QUERY |
Sharding Advisor Output Review SQL Examples
Because the Sharding Advisor output is contained in regular database tables, you can run many kinds of SQL queries against them to look at the output from different perspectives.
Example 7-1 Display the sharding configurations in ranking order
SELECT rank, tableName as tname, tabletype as type,
tablelevel as tlevel, parent, shardby as shardBy,
shardingorreferencecols as cols, unenforceableconstraints,
sizeoftable
FROM SHARDINGADVISOR_CONFIGURATIONS
ORDER BY rank, tlevel, tname, parent;
Example 7-2 Display the table family of the top ranked sharding configuration
SELECT rank, tableName as tname, tabletype as type,
tablelevel as tlevel, parent, shardby as shardBy,
shardingorreferencecols as cols, unenforceableconstraints,
sizeoftable
FROM SHARDINGADVISOR_CONFIGURATIONS
WHERE rank = 1 AND tabletype = 'S'
ORDER BY tlevel, tname, parent;
Example 7-3 Display the table families in ranking order
SELECT rank, tableName as tname, tabletype as type,
tablelevel as tlevel, parent, shardby as shardBy,
shardingorreferencecols as cols, unenforceableconstraints,
sizeoftable
FROM SHARDINGADVISOR_CONFIGURATIONS
WHERE tabletype = 'S'
ORDER BY rank, tlevel, tname, parent;
Example 7-4 Display the duplicated tables of the top ranked sharding configuration
SELECT rank, tableName as tname, tabletype as type,
tablelevel as tlevel, parent, shardby as shardBy,
shardingorreferencecols as cols, unenforceableconstraints,
sizeoftable
FROM SHARDINGADVISOR_CONFIGURATIONS
WHERE rank = 1 AND tabletype = 'D'
ORDER BY tlevel, tname, parent;
Example 7-5 Display the number of sharding configurations with table_name as the root table
SELECT COUNT(*)
FROM SHARDINGADVISOR_CONFIGURATIONS
WHERE tablename = 'TABLE_NAME' AND tablelevel = 0;
Example 7-6 Display the table families of the sharding configurations with root table table_name
SELECT rank, tableName as tname, tabletype as type,
tablelevel as tlevel, parent, shardby as shardBy,
shardingorreferencecols as cols
FROM SHARDINGADVISOR_CONFIGURATIONS
WHERE tabletype = 'S'
AND rank IN
(SELECT rank
FROM SHARDINGADVISOR_CONFIGURATIONS
WHERE tablename = 'TABLE_NAME' and tablelevel = 0)
ORDER BY rank, tlevel, tname, parent;
Example 7-7 Display the details of the sharding configurations in ranking order
SELECT rank, chosenbyuser,
numshardedtables as stabs, sizeofshardedtables as sizestabs,
numduplicatedtables as dtabs,
sizeofduplicatedtables as sizedtabs,
numsingleshardqueries as numssq,
nummultishardqueries as nummsq,
numcrossshardqueries as numcsq, cost
FROM SHARDINGADVISOR_CONFIGDETAILS
ORDER BY rank;
Example 7-8 Display the details of your chosen sharding configuration
SELECT rank,
numshardedtables as stabs, sizeofshardedtables as sizestabs,
numduplicatedtables as dtabs,
sizeofduplicatedtables as sizedtabs,
numsingleshardqueries as numssq,
nummultishardqueries as nummsq,
numcrossshardqueries as numcsq, cost
FROM SHARDINGADVISOR_CONFIGDETAILS
WHERE CHOSENBYUSER = ‘Y’
ORDER BY RANK;
Sharding Advisor Security
Sharding Advisor is a client-side utility that connects to the non-sharded database using authenticated OCI connections.
- The Sharding Advisor requires the appropriate credentials (user name and
password) to connect to the non-sharded source database. Sharding Advisor can be run
as a different user than the user that owns the source database schema that the
Sharding Advisor analyzes. This user must have
SELECT
privileges on the tables in the non-sharded schema. - The user needs
SELECT
privileges on theGV$SQL_PLAN
andGV$SQL_PLAN_STATISTICS_ALL
views, and on theDBA_HIST_SQL_PLAN
,DBA_HIST_SQLSTAT
, andDBA_HIST_SNAPHSOT
tables. The user does not need any other special privileges. - Sharding Advisor is not vulnerable to privilege escalation and denial of service.
- Sharding Advisor does not store or expose any sensitive data such as passwords, database service names, or user names.
- Sharding Advisor does not expose sensitive details about the inner workings of the product.
- Sharding Advisor does not include any interfaces or APIs which are not externally documented.
- Sharding Advisor does not require any insecure protocols to be enabled.
- Sharding Advisor does not use any insecure modes of operation.
- Sharding Advisor does not store any data or other information in any files.
- All connections to the database are through authenticated OCI connections.
- There are no SETUID executables created.
- No new grants to PUBLIC are done.
- No new default schemas are created, but Sharding Advisor internal tables are created under the user that is used to run Sharding Advisor.