To obtain the benefits of sharding, the schema of a sharded database should be designed in a way that maximizes the number of database requests processed on a single shard.
Partitions, Tablespaces, and Chunks
Distribution of partitions across shards is achieved by creating partitions in tablespaces that reside on different shards.
Each partition of a sharded table is stored in a separate tablespace, making the tablespace the unit of data distribution in an SDB.
As described in Sharded Table Family, to minimize the number of multi-shard joins, corresponding partitions of all tables in a table family are always stored in the same shard. This is guaranteed when tables in a table family are created in the same set of distributed tablespaces as shown in the syntax examples where tablespace set
ts1 is used for all tables.
However, it is possible to create different tables from a table family in different
tablespace sets, for example the Customers table in tablespace set
and Orders in tablespace set
ts2. In this case, it must be guaranteed
that the tablespace that stores partition 1 of Customers always resides in the same
shard as the tablespace that stores partition 1 of Orders.
To support this functionality, a set of corresponding partitions from all of the tables in a table family, called a chunk, is formed. A chunk contains a single partition from each table of a table family. This guarantees that related data from different sharded tables can be moved together. In other words, a chunk is the unit of data migration between shards. In system-managed and composite sharding, the number of chunks within each shard is specified when the sharded database is created. In user-defined sharding, the total number of chunks is equal to the number of partitions.
A chunk that contains corresponding partitions from the tables of Cutomers-Orders-LineItems schema is shown in the following figure.
Figure 2-2 Chunk as a Set of Partitions
Description of "Figure 2-2 Chunk as a Set of Partitions"
Each shard contains multiple chunks as shown in the following figure.
Figure 2-3 Contents of a Shard
Description of "Figure 2-3 Contents of a Shard"
In addition to sharded tables, a shard can also contain one or more duplicated tables. Duplicated tables cannot be stored in tablespaces that are used for sharded tables.
Oracle Sharding creates and manages tablespaces as a unit called a
System-managed and composite sharding use
TABLESPACE SET, while
user-defined sharding uses regular tablespaces.
A tablespace is a logical unit of data distribution in a sharded database. The distribution of partitions across shards is achieved by automatically creating partitions in tablespaces that reside on different shards. To minimize the number of multi-shard joins, the corresponding partitions of related tables are always stored in the same shard. Each partition of a sharded table is stored in a separate tablespace.
PARTITIONS AUTO clause specifies that the number of partitions
should be automatically determined. This type of hashing provides more flexibility and
efficiency in migrating data between shards, which is important for elastic
The number of tablespaces created per tablespace set is determined based on the number of chunks that were defined for the shardspace during deployment.
Only Oracle Managed Files are supported by tablespace sets.
Individual tablespaces cannot be dropped or altered independently of the entire tablespace set.
TABLESPACE SET cannot be used with the user-defined
A database table is split up across the shards, so that each shard contains the table with the same columns, but a different subset of rows. A table split up in this manner is called a sharded table.
The following figure shows how a set of large tables (referred to as a table family), shown in the one database on the left, can be horizontally partitioned across the three shards shown on the right, so that each shard contains a subset of the data, indicated with red, yellow, and blue rows.
Figure 2-4 Horizontal Partitioning of a Table Across Shards
Partitions are distributed across shards at the tablespace level, based on a sharding key. Examples of keys include customer ID, account number, and country ID. The following data types are supported for the sharding key.
Each partition of a sharded table resides in a separate tablespace, and each tablespace is associated with a specific shard. Depending on the sharding method, the association can be established automatically or defined by the administrator.
Even though the partitions of a sharded table reside in multiple shards, to the application, the table looks and behaves exactly the same as a partitioned table in a single database. SQL statements issued by an application never have to refer to shards or depend on the number of shards and their configuration.
The familiar SQL syntax for table partitioning specifies how rows should be
partitioned across shards. For example, the following SQL statement creates a sharded
table, horizontally partitioning the table across shards based on the sharding key
CREATE SHARDED TABLE customers ( cust_id NUMBER NOT NULL , name VARCHAR2(50) , address VARCHAR2(250) , region VARCHAR2(20) , class VARCHAR2(3) , signup DATE ,CONSTRAINT cust_pk PRIMARY KEY(cust_id) ) PARTITION BY CONSISTENT HASH (cust_id) PARTITIONS AUTO TABLESPACE SET ts1 ;
The sharded table is partitioned by consistent hash, a special type of hash partitioning commonly used in scalable distributed systems. This technique automatically spreads tablespaces across shards to provide an even distribution of data and workload.
Note:Global indexes on sharded tables are not supported, but local indexes are supported.
Sharded Table Family
A sharded table family is a set of tables that are sharded in the same way. Often there is a parent-child relationship between database tables with a referential constraint in a child table (foreign key) referring to the primary key of the parent table.
Multiple tables linked by such relationships typically form a tree-like structure where every child has a single parent. A set of such tables is referred to as a table family. A table in a table family that has no parent is called the root table. There can be only one root table in a table family.
How a Table Family Is Sharded
Sharding a table family is illustrated here with the Customers–Orders–LineItems schema.
Before sharding, the tables in the schema may look as shown in the examples below. The three tables have a parent-child relationship, with Customers as the root table.
Customers Table (Root) Before Sharding
CustNo Name Address Location Class --------- ---------- -------------- --------- ------ 123 Brown 100 Main St us3 Gold 456 Jones 300 Pine Ave us1 Silver 999 Smith 453 Cherry St us2 Bronze
Orders Table Before Sharding
OrderNo CustNo OrderDate --------- -------- ----------- 4001 123 14-FEB-2013 4002 456 09-MAR-2013 4003 456 05-APR-2013 4004 123 27-MAY-2013 4005 999 01-SEP-2013
LineItems Table Before Sharding
LineNo OrderNo CustNo StockNo Quantity ------ ------- ------ ------- -------- 40011 4001 123 05683022 1 40012 4001 123 45423609 4 40013 4001 123 68584904 1 40021 4002 456 05683022 1 40022 4002 456 45423509 3 40022 4003 456 80345330 16 40041 4004 123 45423509 1 40042 4004 123 68584904 2 40051 4005 999 80345330 12
The tables can be sharded by the customer number,
in the Customers table, which is the root. The shard containing data pertaining to
customer 123 is shown in the following example tables.
Customers Table Shard With Customer 123 Data
CustNo Name Address Location Class --------- ---------- -------------- ---------- ------ 123 Brown 100 Main St us3 Gold
Orders Table Shard With Customer 123 Data
OrderNo CustNo OrderDate --------- -------- ----------- 4001 123 14-FEB-2013 4004 123 27-MAY-2013
LineItems Table Shard With Customer 123 Data
LineNo OrderNo CustNo StockNo Quantity ------ ------- ------ ------- -------- 40011 4001 123 05683022 1 40012 4001 123 45423609 4 40013 4001 123 68584904 1 40041 4004 123 45423509 1 40042 4004 123 68584904 2
Designing Schemas With Multiple Table Families
A sharded database schema can have multiple table families, where all of the data from different table families reside in the same chunks, which contain partitions from different table families sharing the same hash key range.
Note:Multiple table families are supported in system-managed sharded databases only. Composite and user-defined sharded databases only support one table family.
To create a new table family, create a root sharded table and specify tablespace sets that are not used by existing tablespace families. Each table family is identified by its root table. Tables in the different table families should not be related to each other.
Each table family should have its own sharding key definition, while the same restriction on having the same sharding key columns in child tables still holds true within each table family. This means that all tables from different table families are sharded the same way with consistent hash into the same number of chunks, with each chunk containing data from all the table families.
Design your table families such that queries between different table-families are minimal and only carried out on the sharding coordinator, as many such joins will have an effect on performance
The following example shows you how to create multiple table families using
PARENT clause with a system-managed sharding methodology
PARTITION BY CONSISTENT HASH).
CREATE SHARDED TABLE Customers <=== Table Family #1 ( CustId NUMBER NOT NULL , Name VARCHAR2(50) , Address VARCHAR2(250) , region VARCHAR2(20) , class VARCHAR2(3) , signup DATE ) PARTITION BY CONSISTENT HASH (CustId) PARTITIONS AUTO TABLESPACE SET ts1 ; CREATE SHARDED TABLE Orders ( OrderNo NUMBER , CustId NUMBER , OrderDate DATE ) PARENT Customers PARTITION BY CONSISTENT HASH (CustId) PARTITIONS AUTO TABLESPACE SET ts1 ; CREATE SHARDED TABLE LineItems ( LineNo NUMBER , OrderNo NUMBER , CustId NUMBER , StockNo NUMBER , Quantity NUMBER ) ) PARENT Customers PARTITION BY CONSISTENT HASH (CustId) PARTITIONS AUTO TABLESPACE SET ts1 ; CREATE SHARDED TABLE Products <=== Table Family #2 ( ProdId NUMBER NOT NULL, CONSTRAINT pk_products PRIMARY KEY (ProdId) ) PARTITION BY CONSISTENT HASH (ProdId) PARTITIONS AUTO TABLESPACE SET ts_2 ;
ORA-3850 is thrown if you attempt to use a tablespace set for a table family, but that tablespace set is already in use by an existing table family.
Joins across table families may not be efficient, and if you have many such joins, or if they are performance-critical, you should use duplicated tables instead of multiple table families.
Associating Global Services With Multiple Table Families
Each table family should be associated with a different global service. Applications from different table families each have their own connection pool and service, and use their own sharding key for routing to the correct shard.
When you create the first root table (that is, the first table family) all
of the existing global services are automatically associated with it. You can use the
MODIFY SERVICE command to change the services associated with a table
family after more table families are created, as shown in this example.
GDSCTL> MODIFY SERVICE –GDSPOOL shdpool –TABLE_FAMILY sales.customer -SERVICE sales
In Oracle Sharding a table with the same contents in each shard is called a duplicated table.
For many applications, the number of database requests handled by a single shard can be maximized by duplicating read-only or read-mostly tables across all shards. This strategy is a good choice for relatively small tables that are not updated frequently, and that are often accessed together with sharded tables. Duplicated tables tend to be updated less frequently than sharded tables and are not expected to be very large.
A sharded database includes both sharded tables that are horizontally partitioned across shards, and duplicated tables that are replicated to all shards. Duplicated tables contain reference information, for example, a Stock Items table that is common to each shard. The combination of sharded and duplicated tables enables all transactions associated with a sharding key to be processed by a single shard. This technique enables linear scalability and fault isolation.
As an example of the need for a duplicated table, consider the table family that is described in Sharded Table Family. The database schema might also include a
Products table which contains data that is shared by all the customers in the shards that were created for this table family, and it cannot be sharded by the customer number. To prevent multi-shard queries during order processing, the entire table must be duplicated on all shards.
The difference between sharded tables (Customers, Orders, and LineItems) and a duplicated table (Products) is shown in the following figure.
Figure 2-5 Sharded Tables and a Duplicated Table in a Sharded Database
Description of "Figure 2-5 Sharded Tables and a Duplicated Table in a Sharded Database"
Non-Table Objects Created on All Shards
In addition to duplicated tables, other schema objects, such as users, roles, views, indexes, synonyms, functions, procedures, and packages, and non-schema database objects, such as tablespaces, tablespace sets, directories, and contexts, can be created on all shards.
Unlike tables, which require an extra keyword in the
DUPLICATED—other objects are created on all shards using existing syntax. The only requirement is that the
SHARD DDL session property must be enabled.
Note that automatic creation on all shards of the following objects is not supported in this release. These objects can be created by connecting to individual shards.
Materialized zone map
Materialized views and view logs are supported starting in Oracle Database 18c, with the following restrictions:
Materialized views created on sharded tables remain empty on the catalog database, while the corresponding materialized views on shards contain data from each of the individual shards.
REFRESH COMPLETE ON DEMAND USING TRUSTED CONSTRAINTSoption is supported for materialized views on sharded tables.