7 Using the Sharding Advisor

Sharding Advisor simplifies the migration of your existing, non-sharded Oracle database to a sharded database, by analyzing your workload and database schema, and recommending the most effective Oracle Sharding configurations.

About Sharding Advisor

The Sharding Advisor is a client-side, command-line tool that you run against any non-sharded, production, 10g or later release, Oracle Database that you are considering migrating to an Oracle Sharding environment.

The Sharding Advisor analysis provides you with the information you need to design a schema that maximizes performance while reducing duplicated data in the new sharded database environment.

The following are benefits of using Sharding Advisor to aid you with schema design.

  • Maximize query workload performance
  • Minimize multi-shard operations requiring cross-shard joins
  • Maximize parallelism for complex queries (spread query processing across all shards)
  • Minimize the amount of duplicated data on each shard

The Sharding Advisor utility, GWSADV, is installed with Oracle Database as a standalone tool, and connects to your database using authenticated OCI connections.

To get an understanding of your schema and other preferences, Sharding Advisor asks you questions as part of an interactive dialog.

Sharding Advisor then connects to the existing non-sharded database, also called the source, analyzes its schema and query workload, and produces a set of alternative designs for the sharded database, including recommendations for an effective sharding key, which tables to shard, and which tables to duplicate on all shards.

Sharding configurations are ranked in terms of query performance, with the ranking favoring configurations that maximize single and multi-shard queries that do not require cross-shard joins, while minimizing multi-shard queries that require cross-shard joins.

You choose the design that best fits your requirements. The designs are ranked by the advisor, so if you don't have specific preferences you can choose the highest ranked design by default.

Note:

There are restrictions to Sharding Advisor capabilities:

The source database must be Oracle Database 10g or later release.

If you cannot run the Sharding Advisor against the live production database, you can run the Sharding Advisor on a different server that has the schema and workload imported from the production database.

Sharding Advisor discovers the table families based on primary key-foreign key relationships. If the schema does not have any primary key-foreign key constraints, sharding by PARENT clause is recommended.

Currently, Sharding Advisor recommends only single-table family, system-managed sharding (sharding by reference) configurations if the source database has foreign key constraints; otherwise, Sharding Advisor recommends sharding using the PARENT clause.

Run Sharding Advisor

Run the Sharding Advisor command-line tool against your existing, non-sharded Oracle Database to obtain recommended Oracle Sharding configurations.

The user running Sharding Advisor requires the following privileges.

SQL> ALTER SYSTEM SET statistics_level=all;
SQL> grant create session to sharding_advisor_user;
SQL> grant alter session to sharding_advisor_user;
SQL> grant select on v_$sql_plan to sharding_advisor_user;
SQL> grant select on v_$sql_plan_statistics_all to sharding_advisor_user;
SQL> grant select on gv_$sql_plan to sharding_advisor_user;
SQL> grant select on gv_$sql_plan_statistics_all to sharding_advisor_user;
SQL> grant select on DBA_HIST_SQLSTAT to sharding_advisor_user;
SQL> grant select on dba_hist_sql_plan to sharding_advisor_user;
SQL> grant select on dba_hist_snapshot to sharding_advisor_user;

The Sharding Advisor command-line utility, GWSADV, runs from $ORACLE_HOME/bin.

Run the Sharding Advisor from the command line, as shown here.

$ gwsadv -u username -p password -c –w sch=\(schema1,schema2\)

Note:

The parenthesis in this command is escaped on Linux systems.

Where -u and -p are the user name and password of the user that runs the Sharding Advisor.

Use the capture workload parameter, -c, the first time you run Sharding Advisor against an existing query workload, to capture the predicate information from the source's GV$SQL_PLAN_STATISTICS_ALL view. You don't need to use -c in subsequent queries on the same workload.

The required -w flag indicates that Sharding Advisor uses the query workload for sharding configuration generation and ranking.

In this case, the sch parameter specifies a list of schemas to run Sharding Advisor against. There are several other options you can use with Sharding Advisor, detailed in Sharding Advisor Usage and Options.

Run Sharding Advisor on a Non-Production System

To minimize the impact on a live production system, you can run the Sharding Advisor on a copy of the database schema and workload, located on a different server than the production system.

To get the same results as if it were the live production system, the production database schema and workload can be exported using the Oracle Data Pump utilities and copied to a different server. Then you can run Sharding Advisor on the imported schema.

You only export the database schema and system tables. There is no need to export the actual data.

The following procedure uses the HR schema as an example.

Do the following steps on the source (production) database server.

  1. Export the schema using Data Pump Export.
    > expdp system/password SCHEMAS=HR DIRECTORY=HR_DIR CONTENT=METADATA_ONLY
     DUMPFILE=hr_metadata.dmp LOGFILE=hr_exp.lst
  2. Export the Automatic Work Repository (AWR) snapshot.
    SQL> @$ORACLE_HOME/rdbms/admin/awrextr.sql

Do the following steps on the target database server.

  1. Copy the dump files from the source to the target.

    For example, copy the dump files to /scratch/dump.

  2. Create a user that can run Sharding Advisor on the schema.
    SQL> CREATE USER hr IDENTIFIED BY password;
  3. Create (or replace) the dump file directory variable that Data Pump Import can reference.
    SQL> CREATE DIRECTORY HR_DIR AS '/scratch/dump'
    
    SQL> CREATE OR REPLACE DIRECTORY  HR_DIR AS '/scratch/dump'
  4. Import the schema.
    > impdp system/password DIRECTORY=HR_DIR DUMPFILE=hr.dmp LOGFILE=imp.lst SCHEMAS=HR
  5. Load the AWR data.
    SQL> @$ORACLE_HOME/rdbms/admin/awrload.sql
  6. Now you can run Sharding Advisor on the target, non-production, copy of the database with the user you created.
    > gwsadv –u hr –p password –c -awr_snap_begin begin_timestamp –awr_snap_end end_timestamp -w

Review Sharding Advisor Output

Sharding Advisor discovers the table families for each potential sharding column that it extracts from the query workload, and ranks the table families based on query classification rules and a ranking algorithm.

To review the sharding configurations and related information that is owned by the user running Sharding Advisor, you can query the following output database tables, which are stored in the same schema as your source database.

  • SHARDINGADVISOR_CONFIGURATIONS has one row for each table in a ranked sharded configuration, and provides details for each table, such as whether to shard or duplicate it, and if sharded, its level in a table family hierarchy, its parent table, root table sharding key, foreign key reference constraints, and the estimated size per shard.
  • SHARDINGADVISOR_CONFIGDETAILS has one row for each ranked sharding configuration, and provides details for each ranked sharding configuration, such as the number and collective size, per shard, of the sharded tables, and the number and collective size of the duplicated tables. It also provides the number of single shard and multi-shard queries to expect in production, as well as the number of multi-shard queries requiring cross-shard joins, based on your source database's current workload, and an estimated cost.
  • SHARDINGADVISOR_QUERYTYPES, for each query in the workload, lists the query type for each sharding configuration. Note that the same query can be of a different query type depending on the sharding configuration.

Because the Sharding Advisor output is contained in regular database tables, you can run many kinds of SQL queries against them to look at the output from different perspectives.

For example, to display the sharding configurations in ranking order, run

SELECT rank, tableName as tname, tabletype as type,
           tablelevel as tlevel, parent, shardby as shardBy,
           shardingorreferencecols as cols, unenforceableconstraints,
           sizeoftable  
FROM SHARDINGADVISOR_CONFIGURATIONS
ORDER BY rank, tlevel, tname, parent;

For details about the Sharding Advisor output tables and more example queries see Sharding Advisor Output Tables and Sharding Advisor Output Review SQL Examples

Choose a Sharding Advisor Recommended Configuration

There are some aspects of database sharding to take into consideration when deciding which configuration to choose for your sharded database.

Increasing the number of shards will result in higher availability and scalability of the sharded database.

Minimizing duplicated data can conflict with your desire to minimize multi-shard queries that require joins across multiple shards. Because joins in a sharded database are usually performed on related data, storing related data in the same shard can dramatically speed up processing of such joins.

The overall cost, in terms of query workload, of the recommended sharding configurations is based on the number of each query type (single shard, multi-shard, and multi-shard with cross-shard joins) in the workload, where multi-shard queries with cross-shard joins have the highest cost, and single shard queries have the lowest cost. The cost information is in the COST column of the Sharding Advisor SHARDINGADVISOR_CONFIGDETAILS output table.

Sharding Advisor Usage and Options

Sharding Advisor is a client command-line tool that connects to an existing non-sharded database and provides sharding configuration recommendations.

Syntax

gwsadv
 [-n nodeName[:portnum]]
 [-s serviceName]
  -u username
  -p password
 [-c]
 [-awr_snap_begin timestamp] 
 [-awr_snap_end timestamp]
  –w
 [sch=(schema1, schema2, …)]
 [-tab importantTabsFile]
 [-pr numpreds:n]
 [-t trace_file]

Options

Note that each option must be prefixed with a minus sign (-) except for the sch argument.

Option Description Required (Y/N)
-awr_snap_begin timestamp

Specify the beginning timestamp, in the format 'YYYY-MM-DD HH24:MI:SS', to specify the AWR snapshots to capture the workload from.

N
-awr_snap_end timestamp

Specify the end timestamp, in the format 'YYYY-MM-DD HH24:MI:SS', to specify the AWR snapshots to capture the workload from.

N
-c

Capture a new or changed workload.

Use -pr to limitthe number of predicates to be captured

Required on first run of Sharding Advisor on a new or changed workload.

Not required on subsequent runs on the same workload.

By default, the workload is captured from the V$SQL_PLAN_STATISTICS_ALL table.

Alternatively, the workload can be captured from Automatic Workload Repository (AWR) snapshots by using the -awr_snap_begin and -awr_snap_end options with the -c option to specify the beginning and ending time stamps of the AWR snapshots.

N
-n nodeName[:portnum]

Node name and port number, if connecting to a database on another host

N
-p password

Oracle password

Y
-pr numpreds:n

Limits the number of predicates to be captured when using -c to capture a new or changed workload.

N
-s serviceName

Service name, if connecting to a database on another host

N
sch

The sch option specifies the list of schemas to run Sharding Advisor against, if you want to run as a different user.

N
-t trace_file

Enables tracing of all activities performed by sharding advisor. Specify an output file name.

N
-tab importantTabsFile

Name of file that consists of table names, one per line, in the format schemaname.tablename, to restrict the number of tables that the Sharding Advisor needs to analyze.

N
-u username

Oracle user name

Y
-w

Directs Sharding Advisor to use the query workload for sharding configuration generation and ranking.

Y

Usage Notes

The normal usage of the sharding advisor is to not specify the –pr option. The query workload capture should be faster now even without the –pr option. If however, the you want to speed it up further, the –pr option can be used. If it is used, it has to be used in conjunction with the –c option. If unspecified, the number of predicates to be captured is not limited.

For procedures describing how to run the Sharding Advisor with example commands see Run Sharding Advisor and Run Sharding Advisor on a Non-Production System.

Sharding Advisor Output Tables

To review the sharding configurations and related information, you can query the following output database tables, which are stored in the same schema as your source database.

SHARDINGADVISOR_CONFIGURATIONS Table

Each row of the SHARDINGADVISOR_CONFIGURATIONS table represents a table in a ranked sharded configuration, and provides information about whether to shard or duplicate it, and if sharded, its level in a table family hierarchy, its parent table, root table sharding key, foreign key reference constraints, and table size per shard.

SHARDINGADVISOR_CONFIGURATIONS Schema

Column Description
RANK The rank of the sharding configuration based on the ranking algorithm
TABLENAME Name of the table in the sharding configuration
TABLETYPE ‘S’ (Sharded), ‘D’ (Duplicated), or ‘L’ (Local)
TABLELEVEL Level of the table in the table family hierarchy, NULL for duplicated tables
PARENT Parent of the table in the table family hierarchy, NULL for duplicated tables
SHARDBY Sharding method. REFERENCE for sharding by reference, or PARENT for sharding by PARENT clause, for child tables.
SHARDINGORREFERENCECOLS Sharding key for the root table, partition by REFERENCE or PARENT for the child tables in a table family, and NULL for duplicated tables
UNENFORCEABLECONSTRAINTS Foreign key constraints other than the reference columns, which cannot be enforced
SIZEOFTABLE Size of the table per shard

SHARDINGADVISOR_CONFIGDETAILS Table

Each row of the SHARDINGADVISOR_CONFIGDETAILS table represents a ranked sharding configuration, and provides the number and collective size, per shard, of each type of table, the number of each type of query, and based on your source database's current workload, an estimated cost.

SHARDINGADVISOR_CONFIGDETAILS Schema

Column Description
RANK The rank of the sharding configuration based on the ranking algorithm
CHOSENBYUSER ‘Y’ if the sharding configuration is chosen by the user, NULL for other sharding configurations
NUMSHARDEDTABLES Number of sharded tables in this sharding configuration
SIZEOFSHARDEDTABLES Cumulative size of sharded tables (per shard) in this sharding configuration
NUMDUPLICATEDTABLES Number of duplicated tables in this sharding configuration
SIZEOFDUPLICATEDTABLES Cumulative size of duplicated tables (per shard) in this sharding configuration
NUMSINGLESHARDQUERIES Number of single shard queries in the query workload for this sharding configuration
NUMMULTISHARDQUERIES Number of multi-shard queries in the query workload for this sharding configuration
NUMCROSSSHARDQUERIES Number of multi-shard queries that require an external join in the query workload for this sharding configuration
COST Cost of the sharding configuration based on the costing algorithm

SHARDINGADVISOR_QUERYTYPES Table

Each row of the SHARDINGADVISOR_QUERYTYPES table represents a query in the workload, and lists the query type and SQL ID. Note that the same query can be of a different query type depending on the sharding configuration.

SHARDINGADVISOR_QUERYTYPES Schema

Column Description
RANK The rank of the sharding configuration based on the ranking algorithm
SQLID The query SQL ID
QUERYTYPE The type of the query in this sharding configuration: SINGLE SHARD QUERY, MULTI SHARD QUERY, or CROSS SHARD QUERY

Sharding Advisor Output Review SQL Examples

Because the Sharding Advisor output is contained in regular database tables, you can run many kinds of SQL queries against them to look at the output from different perspectives.

Example 7-1 Display the sharding configurations in ranking order

SELECT rank, tableName as tname, tabletype as type,
           tablelevel as tlevel, parent, shardby as shardBy,
           shardingorreferencecols as cols, unenforceableconstraints,
           sizeoftable  
FROM SHARDINGADVISOR_CONFIGURATIONS
ORDER BY rank, tlevel, tname, parent;

Example 7-2 Display the table family of the top ranked sharding configuration

SELECT rank, tableName as tname, tabletype as type,
        tablelevel as tlevel, parent, shardby as shardBy,
        shardingorreferencecols as cols, unenforceableconstraints,
        sizeoftable
FROM SHARDINGADVISOR_CONFIGURATIONS 
WHERE rank = 1 AND tabletype = 'S' 
ORDER BY tlevel, tname, parent;

Example 7-3 Display the table families in ranking order

SELECT rank, tableName as tname, tabletype as type,
        tablelevel as tlevel, parent, shardby as shardBy,
        shardingorreferencecols as cols, unenforceableconstraints,
        sizeoftable 
FROM SHARDINGADVISOR_CONFIGURATIONS
WHERE tabletype = 'S'
ORDER BY rank, tlevel, tname, parent;

Example 7-4 Display the duplicated tables of the top ranked sharding configuration

SELECT rank, tableName as tname, tabletype as type,
           tablelevel as tlevel, parent, shardby as shardBy,
           shardingorreferencecols as cols, unenforceableconstraints,
           sizeoftable   
FROM SHARDINGADVISOR_CONFIGURATIONS   
WHERE rank = 1 AND tabletype = 'D'   
ORDER BY tlevel, tname, parent;

Example 7-5 Display the number of sharding configurations with table_name as the root table

SELECT COUNT(*)
FROM SHARDINGADVISOR_CONFIGURATIONS 
WHERE tablename = 'TABLE_NAME' AND tablelevel = 0;

Example 7-6 Display the table families of the sharding configurations with root table table_name

SELECT rank, tableName as tname, tabletype as type,
        tablelevel as tlevel, parent, shardby as shardBy, 
        shardingorreferencecols as cols 
FROM SHARDINGADVISOR_CONFIGURATIONS 
WHERE tabletype = 'S'
    AND rank IN
        (SELECT rank 
        FROM SHARDINGADVISOR_CONFIGURATIONS
        WHERE tablename = 'TABLE_NAME' and tablelevel = 0)
ORDER BY rank, tlevel, tname, parent;

Example 7-7 Display the details of the sharding configurations in ranking order

SELECT rank, chosenbyuser,
        numshardedtables as stabs, sizeofshardedtables as sizestabs,
        numduplicatedtables as dtabs,
        sizeofduplicatedtables as sizedtabs,
        numsingleshardqueries as numssq,
        nummultishardqueries as nummsq,
        numcrossshardqueries as numcsq, cost
FROM SHARDINGADVISOR_CONFIGDETAILS
ORDER BY rank;

Example 7-8 Display the details of your chosen sharding configuration

SELECT rank,
        numshardedtables as stabs, sizeofshardedtables as sizestabs,
        numduplicatedtables as dtabs,
        sizeofduplicatedtables as sizedtabs,
        numsingleshardqueries as numssq,
        nummultishardqueries as nummsq,
        numcrossshardqueries as numcsq, cost
FROM SHARDINGADVISOR_CONFIGDETAILS
WHERE CHOSENBYUSER = ‘Y’
ORDER BY RANK;

Sharding Advisor Security

Sharding Advisor is a client-side utility that connects to the non-sharded database using authenticated OCI connections.

  • The Sharding Advisor requires the appropriate credentials (user name and password) to connect to the non-sharded source database. Sharding Advisor can be run as a different user than the user that owns the source database schema that the Sharding Advisor analyzes. This user must have SELECT privileges on the tables in the non-sharded schema.
  • The user needs SELECT privileges on the GV$SQL_PLAN and GV$SQL_PLAN_STATISTICS_ALL views, and on the DBA_HIST_SQL_PLAN, DBA_HIST_SQLSTAT, and DBA_HIST_SNAPHSOT tables. The user does not need any other special privileges.
  • Sharding Advisor is not vulnerable to privilege escalation and denial of service.
  • Sharding Advisor does not store or expose any sensitive data such as passwords, database service names, or user names.
  • Sharding Advisor does not expose sensitive details about the inner workings of the product.
  • Sharding Advisor does not include any interfaces or APIs which are not externally documented.
  • Sharding Advisor does not require any insecure protocols to be enabled.
  • Sharding Advisor does not use any insecure modes of operation.
  • Sharding Advisor does not store any data or other information in any files.
  • All connections to the database are through authenticated OCI connections.
  • There are no SETUID executables created.
  • No new grants to PUBLIC are done.
  • No new default schemas are created, but Sharding Advisor internal tables are created under the user that is used to run Sharding Advisor.