Deploy Linearly Scalable, Fault-Tolerant, Sharded Oracle Databases on Oracle Cloud Infrastructure

When you need a database that supports extreme scale-out with complete data isolation for your enterprise-scale OLTP applications, deploy sharded instances of Oracle Database in a fault-tolerant topology on Oracle Cloud Infrastructure.

Sharding is a data-tier architecture in which data is partitioned horizontally in independent databases, called shards, each running on a separate server with its own CPU, memory, and disk resources. This shared-nothing approach eliminates single points of failure at the infrastructure layer. A pool of shards is presented to the application tier as a single logical database, called a sharded database. Applications that have a well-defined data distribution strategy and access data primarily by using a sharding key can take advantage of a sharded data tier.

Architecture

The following architectures show fault-tolerant, sharded Oracle Database 19c topologies in Oracle Cloud Infrastructure, deployed by using the Oracle Database Sharding stack provided in Oracle Cloud Marketplace.

The architectures have redundant resources in every layer (shard director, catalog, and shards), to ensure maximum availability of the sharded database.

Deployment across multiple availability domains

This architecture shows a sharded database distributed across two availability domains in an Oracle Cloud Infrastructure region.Description of shards-single-region.png follows
Description of the illustration shards-single-region.png

Deployment in a single availability domain

This architecture shows a sharded database distributed across the fault domains within a single availability domain in an Oracle Cloud Infrastructure region.Description of shards-single-ad.png follows
Description of the illustration shards-single-ad.png

The architectures include the following components:

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

    All the resources in this architecture are deployed in a compartment that you specify and within a single Oracle Cloud Infrastructure region.

  • Availability domains

    Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region.

  • Fault domains

    A fault domain is a grouping of hardware and infrastructure within an availability domain. Each availability domain has three fault domains with independent power and hardware. When you distribute resources across multiple fault domains, your applications can tolerate physical server failure, system maintenance, and power failures inside a fault domain.

    The resources in this architecture are distributed evenly across the fault domains in each availability domain.

  • Virtual cloud network (VCN) and subnet

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

    The compute and database resources in this architecture are attached to a single regional public subnet. You can specify the VCN and subnet to be used. If you'd like to create a new VCN and subnet, then you can specify the CIDR address block for the VCN and subnet.

  • Internet gateway

    The internet gateway allows traffic between the public subnets in a VCN and the public internet.

  • Security list

    For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.

    This architecture includes a security list to control traffic to and from the shard directors.

  • Route table

    Virtual route tables contain rules to route traffic from subnets to destinations outside a VCN, typically through gateways.

    This architecture includes a route table to direct traffic from the subnet to the internet gateway.

  • Shard directors

    A shard director (also called global service manager) is a network listener that enables high performance connection routing to the appropriate database shards based on sharding keys.

    You can specify the number of shard directors required. The shard directors are deployed on individual compute instances, which are distributed across all the availability domains in the region and across the fault domains within each availability domain. You can choose the shape of the compute instances to be used for the shard directors.

  • Primary and standby shard catalogs

    A shard catalog is a special-purpose Oracle Database instance that supports automated shard deployment, centralized management of a sharded database, and multishard queries.

    This architecture contains a primary-standby pair of catalog databases, each a single-node Oracle Cloud Infrastructure VM DB system. You can choose the database shape and available storage capacity for the catalog databases.

  • Database shards

    Each database shard is a single-node Oracle Cloud Infrastructure VM DB system. You can specify the number of primary shards, the database shape to be used, and the available storage capacity. The shards are distributed evenly across all the availability domains in the region and across the fault domains within each availability domain.

    While deploying the architecture, you can choose to provision a standby shard for each primary shard.
    • If your region contains multiple availability domains, then each shard of a primary-standby pair is placed in a separate availability domain.
    • If the region contains only one availability domain, then the primary and standby shards are isolated in separate fault domains.

Recommendations

Your requirements might differ from the architecture described here. Use the following recommendations as a starting point.

  • Size and number of shards
    While deploying the stack, you can specify the database shape to be used and the number of shards.
    • Choose an appropriate shape based on your workload requirements. The shape you specify is used for all the shards.

      After deployment, you can change the shape of the individual shards to adapt to changes in the workload. The shard that you change the shape for is stopped and then restarted using the new shape.

    • In general, a large number of small shards provides better overall fault tolerance than a small number of large shards. To scale out the topology or to improve fault tolerance, you can add shards at any time without affecting the availability of the existing shards. When required, you can scale in the sharded database; the last created shard is removed first after moving the data to the other shards.
  • Shards availability

    To ensure high availability of the shards, provision standby shards and use Oracle Data Guard for primary-to-standby synchronization and for failover.

  • Shard catalog availability

    For high availability of the shard catalog, provision a standby catalog and use Oracle Data Guard for synchronization and failover. Note that the availability of the shard catalog has no impact on the availability of the sharded database. An outage of the shard catalog affects only the ability to perform maintenance operations or multishard queries while failing over to the standby catalog. OLTP transactions continue to be routed to the shards.

  • Shard director availability

    For high availability of the shard director layer, deploy multiple shard directors. You can deploy up to five shard directors in a region. Oracle recommends that you deploy at least two shard directors, isolated in separate availability domains or fault domains.

  • Storage

    The current version of the Oracle Cloud Marketplace stack provisions the database shards and the catalog database with Logical Volume Manager (LVM)-based storage. You can specify the available storage capacity, starting from 256 GB up to 40 TB. Choose a storage capacity that's appropriate to your workload. A block volume of the specified size is attached to each database shard.

    You can scale up the storage at any time without affecting the availability of the database. Note that the total storage used is higher than available storage. See the Oracle Cloud Infrastructure Database documentation for more information.

  • Network design

    When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.

    Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.

Considerations

  • Application design

    Any application that has a well-defined data distribution strategy and accesses data primarily by using a sharding key (such as customer ID, account number, and so on) is suited for sharded databases.

  • Scalability

    After deploying the sharded database, you can increase or decrease the number of shards by editing the stack and applying the changes. Depending on the replication factor that you specify while deploying the stack, the standby shards are also scaled.

    You can scale the number of shard directors as well.

  • Security

    While deploying the sharded database, specify an SSH public key to enable secure SSH connections to the database servers.

  • Network isolation

    To ensure network isolation, deploy the sharded database in a private subnet. For administrative access to the database servers, you can provision a bastion host in a public subnet.

Deploy

A Terraform-based stack to deploy this reference architecture is available in Oracle Cloud Marketplace.

  1. Go to Oracle Cloud Marketplace.
  2. Click Get App.
  3. Follow the on-screen prompts.

Explore More

Using Oracle Sharding (Oracle Database 19c documentation)