Chunk Management

You can manage chunks in your deployment with Oracle Enterprise Manager Cloud Control and GDSCTL.

Resharding and Hot Spot Elimination

The process of redistributing data between shards, triggered by a change in the number of shards, is called resharding. Automatic resharding is a feature of the system-managed sharding method that provides elastic scalability of a sharded database.

Sometimes data in a sharded database needs to be migrated from one shard to another. Data migration across shards is required in the following cases:

  • When one or multiple shards are added to or removed from a sharded database

  • When there is skew in the data or workload distribution across shards

The unit of data migration between shards is the chunk. Migrating data in chunks guaranties that related data from different sharded tables are moved together.

When a shard is added to or removed from a sharded database, multiple chunks are migrated to maintain a balanced distribution of chunks and workload across shards.

Depending on the sharding method, resharding happens automatically (system-managed) or is directed by the user (composite). The following figure shows the stages of automatic resharding when a shard is added to a sharded database with three shards.

Figure 10-7 Resharding a Sharded Database

Description of Figure 10-7 follows
Description of "Figure 10-7 Resharding a Sharded Database"

A particular chunk can also be moved from one shard to another, when data or workload skew occurs, without any change in the number of shards. In this case, chunk migration can be initiated by the database administrator to eliminate the hot spot.

RMAN Incremental Backup, Transportable Tablespace, and Oracle Notification Service technologies are used to minimize impact of chunk migration on application availability. A chunk is kept online during chunk migration. There is a short period of time (a few seconds) when data stored in the chunk is available for read-only access only.

FAN-enabled clients receive a notification when a chunk is about to become read-only in the source shard, and again when the chunk is fully available in the destination shard on completion of chunk migration. When clients receive the chunk read-only event, they can either repeat connection attempts until the chunk migration is completed, or access the read-only chunk in the source chunk. In the latter case, an attempt to write to the chunk will result in a run-time error.

Note:

Running multi-shard queries while a sharded database is resharding can result in errors, so it is recommended that you do not deploy new shards during multi-shard workloads.

Moving Chunks

Sometimes it becomes necessary to move a chunk from one shard to another. To maintain scalability of the sharded environment, it is important to attempt to maintain an equal distribution of the load and activity across all shards.

As the environment matures in a composite sharded database, some shards may become more active and have more data than other shards. In order to keep a balance within the environment you must move chunks from more active servers to less active servers. There are other reasons for moving chunks:

  • When a shard becomes more active than other shards, you can move a chunk to a less active shard to help redistribute the load evenly across the environment.

  • When using range, list, or composite sharding, and you are adding a shard to a shardgroup.

  • When using range, list, or composite sharding, and you a removing a shard from a shardgroup.

  • After splitting a chunk it is often advisable to move one of the resulting chunks to a new shard.

When moving shards to maintain scalability, the ideal targets of the chunks are shards that are less active, or have a smaller portion of data. Oracle Enterprise Manager and AWR reports can help you identify the distribution of activity across the shards, and help identify shards that are good candidates for chunk movement.

Note:

Any time a chunk is moved from one shard to another, you should make a full backup of the databases involved in the operation (both the source of the chunk move, and the target of the chunk move.)

You can manage chunks using GDSCTL or Oracle Enterprise Manager Cloud Control:

Updating an In-Process Chunk Move Operation

While a MOVE CHUNK operation is in process, you can use the GDSCTL ALTER MOVE command to suspend, resume, or cancel any or all chunks scheduled to be moved (where the move is not yet started) in the operation.

There are three variations on this command: -SUSPEND is used to postpone chunk migration operation, -RESUME is used to restart the move process, and -CANCEL cancels chunk migration.

In addition, the -CHUNK and -SHARD options are used to filter the list of scheduled chunk moves. You can use the CONFIG CHUNKS -SHOW_RESHARD command to get a list of scheduled chunk moves.

Suspending Chunk Moves

ALTER MOVE -SUSPEND postpones chunk migration for a specified scope until you wish resume or cancel the operation. The shards on which to suspend operation must be specified, and you can list source and target shards. You can also specify a list of specific chunks to suspend.

If any chunk in the defined scope is already being moved (any state other than "scheduled"), that chunk will not be suspended.

For example, the following command suspends all scheduled chunk moves to or from shard1.

GDSCTL> alter move -suspend -shard shard1

Restarting Chunk Moves

ALTER MOVE -RESUME resets any "move failed" flags on specified shards, and restarts any stalled or suspended chunk moves.

You can optionally provide a list of source and target shards that will have their "move failed" flags reset before the moves restart. If no shards are specified, the suspended moves are restarted once any moves in process are complete.

For example, the following command restarts chunk moves on any suspended or "failed" chunk moves scheduled to or from shard1.

GDSCTL> alter move -resume -shard shard1

Canceling Chunk Moves

ALTER MOVE -CANCEL removes specified chunks from the move chunk schedule.

The -CHUNK option specifies that all listed chunks will be removed from the schedule, and -SHARD specifies that all chunk moves to/from this database will be removed from the schedule. If no chunks or shards are specified, then all chunk moves not already in process are canceled.

If any chunk in the defined scope is currently being moved (any state other than "scheduled"), that chunk move will not be canceled.

Chunks that are canceled cannot be resumed/restarted. You must issue a new MOVE CHUNK command to move these chunks.

For example, the following command removes chunks 1, 2, and 3 from the chunk move schedule, if they are not already being moved.

GDSCTL> alter move -cancel -chunk 1,2,3

Splitting Chunks

Splitting a chunk in a sharded database is required when chunks become too big, or only part of a chunk must be migrated to another shard.

Oracle Globally Distributed Database supports the online split of a chunk. Theoretically it is possible to have a single chunk for each shard and split it every time data migration is required. However, even though a chunk split does not affect data availability, the split is a time-consuming and CPU-intensive operation because it scans all of the rows of the partition being split, and then inserts them one by one into the new partitions. For composite sharding, it is time consuming and may require downtime to redefine new values for the shard key or super shard key.

Therefore, it is recommended that you pre-create multiple chunks on each shard and split them either when the number of chunks is not big enough for balanced redistribution of data during re-sharding, or a particular chunk has become a hot spot.

Even with system-managed sharding, a single chunk may grow larger than other chunks or may become more active. In this case, splitting that chunk and allowing automatic re-sharding to move one of the resulting chunks to another shard maintains a more equal balanced distribution of data and activity across the environment.

Oracle Enterprise Manager heat maps show which chunks are more active than other chunks. Using this feature will help identify which chunks could be split, and one of the resulting chunks could then be moved to another shard to help rebalance the environment.

You can manage chunks using GDSCTL or Oracle Enterprise Manager Cloud Control:

Managing Chunks with Oracle Enterprise Manager Cloud Control

You can manage sharded database chunks using Oracle Enterprise Manager Cloud Control.

The following topics describe chunk management using Oracle Enterprise Manager Cloud Control:

Moving Chunks with Oracle Enterprise Manager Cloud Control

You can move chunks from one shard to another in your deployment using Oracle Enterprise Manager Cloud Control.

  1. From a shardspace management page, open the Shardspace menu, located in the top left corner of the Sharded Database target page, and choose Manage Shardgroups.
  2. Select a shardgroup in the list and click Move Chunks.
  3. In the Move Chunks dialog, select the source and destination shards between which to move the chunks.
  4. Select the chunks that you want to move by choosing one of the options.
    • Enter ID List: enter a comma separates list of chunk ID numbers

    • Select IDs From Table: click the chunk IDs in the table

  5. Indicate when the chunk move should occur.
    • Immediately: the chunk move is provisioned upon confirmation

    • Later: schedule the timing of the chunk move using the calendar tool in the adjacent field

  6. Click OK.
  7. Click the link in the Information box at the top of the page to view the provisioning status of the chunk move.

Splitting Chunks with Oracle Enterprise Manager Cloud Control

You can split chunks in your deployment using Oracle Enterprise Manager Cloud Control.

  1. Open the Sharded Database menu, located in the top left corner of the Sharded Database target page, and choose Shardspaces.
  2. If prompted, enter the shard catalog credentials, select the shard director to manage under Shard Director Credentials, select the shard director host credentials, and log in.
  3. Select a shardspace in the list and click Split Chunks.
  4. Select the chunks that you want to split by choosing one of the options.
    • Enter ID List: enter a comma separate list of chunk ID numbers

    • Select IDs From Table: click the chunk IDs in the table

  5. Indicate when the chunk split should occur.
    • Immediately: the chunk split is provisioned upon confirmation

    • Later: schedule the timing of the chunk split using the calendar tool in the adjacent field

  6. Click OK.
  7. Click the link in the Information box at the top of the page to view the provisioning status of the chunk split.
When the chunk is split successfully the number of chunks is updated in the Shardspaces list. You might need to refresh the page to see the updates.