Deploy a Highly Available CockroachDB Cluster

CockroachDB is a distributed SQL database built on a transactional and consistent key-value store.

Architecture

This reference architecture shows a typical three-node deployment of CockroachDB on Oracle Cloud Infrastructure Compute instances. A public load balancer is used to distribute the workloads across these three nodes.

The following diagram illustrates this reference architecture.

[Diagram: cockroachdb-oci.png]

The architecture has the following components:

  • Regions

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Availability domains

    Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region.

  • Fault domains

    A fault domain is a grouping of hardware and infrastructure within an availability domain. Each availability domain has three fault domains with independent power and hardware. When you distribute resources across multiple fault domains, your applications can tolerate physical server failure, system maintenance, and power failures inside a fault domain.

  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

    This architecture uses one public subnet to host the public load balancer and the three Compute instances running CockroachDB.

  • Load balancer

    The Oracle Cloud Infrastructure Load Balancing service provides automated traffic distribution from one entry point to multiple servers reachable from your VCN. The load balancer in this architecture has two listeners, one on TCP port 8080 and one on TCP port 26257, and two backend sets, one for each listener.

  • Security lists

    For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.

    This architecture uses ingress rules for TCP ports 8080 and 26257 so that clients can reach the HTTP listener (port 8080) and the CockroachDB listener (port 26257).
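
The relationship between the listeners, the backend sets, and the ingress rules can be sketched in a few lines of Python. This is an illustrative model only, not the Terraform code of the reference architecture: the node IP addresses, the backend set names, and the helper functions are hypothetical. Only the two port numbers come from the text above.

```python
from dataclasses import dataclass

# Ports named in this architecture: 8080 (HTTP) and 26257 (CockroachDB).
LISTENER_PORTS = (8080, 26257)

# Hypothetical private IPs for the three CockroachDB Compute instances.
NODE_IPS = ("10.0.0.11", "10.0.0.12", "10.0.0.13")


@dataclass(frozen=True)
class BackendSet:
    name: str
    port: int
    backends: tuple  # "ip:port" strings, one per node


def build_backend_sets(node_ips, ports):
    """Create one backend set per listener port, each containing every node."""
    return {
        port: BackendSet(
            name=f"backendset-{port}",  # hypothetical naming scheme
            port=port,
            backends=tuple(f"{ip}:{port}" for ip in node_ips),
        )
        for port in ports
    }


def missing_ingress_ports(listener_ports, ingress_ports):
    """Return the listener ports that no ingress rule allows."""
    return sorted(set(listener_ports) - set(ingress_ports))


backend_sets = build_backend_sets(NODE_IPS, LISTENER_PORTS)
for bs in backend_sets.values():
    print(bs.name, "->", ", ".join(bs.backends))

# Both listener ports must appear in the security list's ingress rules.
print("blocked listeners:", missing_ingress_ports(LISTENER_PORTS, [8080, 26257]))
```

A check like `missing_ingress_ports` is a cheap way to notice that a listener was added without a matching ingress rule, in which case clients could not reach it.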

Recommendations

Your requirements might differ from the architecture described here. Use the following recommendations as a starting point.

  • Compute shapes

    This architecture uses the Oracle Linux 7.9 OS image with the VM.Standard.E3.Flex Compute shape. For your application, you can choose a different shape if you need more memory, cores, or network bandwidth.

  • VCN
    • When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.

    • Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.

    • After you create a VCN, you can change, add, and remove its CIDR blocks.

    • When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.

    • Use a regional subnet.
  • Load balancer bandwidth

    While creating the load balancer, you can either select a predefined shape that provides a fixed bandwidth, or specify a custom (flexible) shape where you set a bandwidth range and let the service scale the bandwidth automatically based on traffic patterns. With either approach, you can change the shape at any time after creating the load balancer.
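
The two CIDR recommendations above (stay within the standard private address space, and avoid overlap with any network you plan to connect privately) can be checked before you create the VCN. The following is a minimal sketch using Python's standard `ipaddress` module. It assumes that "standard private IP address space" means the RFC 1918 IPv4 ranges, and the example CIDR blocks are hypothetical.

```python
import ipaddress

# RFC 1918 private IPv4 ranges (assumed meaning of "standard private space").
RFC1918 = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(cidr):
    """True if the CIDR block lies entirely inside one RFC 1918 range."""
    net = ipaddress.ip_network(cidr)
    return net.version == 4 and any(net.subnet_of(block) for block in RFC1918)


def find_overlaps(candidate, existing):
    """Return the existing CIDR blocks that overlap the candidate block."""
    cand = ipaddress.ip_network(candidate)
    return [c for c in existing if cand.overlaps(ipaddress.ip_network(c))]


# Hypothetical networks already in use (on-premises, other VCNs, other clouds).
in_use = ["10.0.0.0/16", "192.168.10.0/24"]

for candidate in ("10.0.128.0/20", "10.1.0.0/16", "172.32.0.0/16"):
    print(
        candidate,
        "private:", is_rfc1918(candidate),
        "overlaps:", find_overlaps(candidate, in_use),
    )
```

In this example only `10.1.0.0/16` passes both checks: `10.0.128.0/20` falls inside an existing block, and `172.32.0.0/16` lies just outside the `172.16.0.0/12` private range.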

Considerations

  • Scalability

    A cluster needs at least three nodes to keep a quorum (a majority of replicas) when one node fails, so this architecture deploys three nodes. You might need more nodes to meet your application’s performance or high availability requirements.

    You can horizontally scale your database cluster by adding more Compute nodes and including them in the load balancer backend sets.

    You can vertically scale your database cluster by changing the VM shape of each Compute node. Using a shape with a higher core count also increases the memory and network bandwidth allocated to the Compute instance.

  • Availability

    Fault domains provide the best resilience within a single availability domain. In regions that have multiple availability domains, you can instead deploy Compute instances that perform the same tasks across those availability domains. This design removes a single point of failure by introducing redundancy. After the architecture is deployed, use the public IP address of the load balancer to connect to CockroachDB using the built-in SQL client.

  • Cost

    Select the virtual machine (VM) shape based on the cores, memory, and network bandwidth that you need for your database. You can start with a one-core shape, and if you need more performance, memory, or network bandwidth for the database node, you can change the VM shape later.
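
The quorum and fault-domain points above come down to simple arithmetic, sketched below. The sketch assumes a majority quorum with one replica of the data on each node, which is a simplification of how CockroachDB places replicas. The node names are hypothetical, and the fault domain labels follow the `FAULT-DOMAIN-n` pattern that Oracle Cloud Infrastructure uses.

```python
def quorum_size(replicas):
    """Smallest majority of a group of replicas."""
    return replicas // 2 + 1


def tolerated_failures(replicas):
    """How many replicas can fail while a majority remains available."""
    return replicas - quorum_size(replicas)


def spread_across_fault_domains(
    node_names,
    fault_domains=("FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3"),
):
    """Assign nodes to fault domains round-robin so they are evenly spread."""
    return {
        name: fault_domains[i % len(fault_domains)]
        for i, name in enumerate(node_names)
    }


for n in (1, 2, 3, 5):
    print(f"{n} nodes: quorum={quorum_size(n)}, tolerates={tolerated_failures(n)}")

# Hypothetical node names for the three-node deployment.
print(spread_across_fault_domains(["crdb-1", "crdb-2", "crdb-3"]))
```

Under these assumptions, one or two nodes tolerate no failures, three nodes tolerate one, and five tolerate two. Placing each of the three nodes in a different fault domain means that a hardware failure confined to one fault domain removes at most one node.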

Deploy

The Terraform code for this reference architecture is available as a sample stack in Oracle Cloud Infrastructure Resource Manager. You can also download the code from GitHub, and customize it to suit your specific requirements.

  • Deploy using the sample stack in Oracle Cloud Infrastructure Resource Manager:
    1. Click Deploy to Oracle Cloud.

      If you aren't already signed in, enter the tenancy and user credentials.

    2. Select the region where you want to deploy the stack.
    3. Follow the on-screen prompts and instructions to create the stack.
    4. After creating the stack, click Terraform Actions, and select Plan.
    5. Wait for the job to be completed, and review the plan.

      To make any changes, return to the Stack Details page, click Edit Stack, and make the required changes. Then, run the Plan action again.

    6. If no further changes are necessary, return to the Stack Details page, click Terraform Actions, and select Apply.
  • Deploy using the Terraform code in GitHub:
    1. Go to GitHub.
    2. Clone or download the repository to your local computer.
    3. Follow the instructions in the README document.
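
If you deploy from a local clone, the Plan and Apply actions described for Resource Manager correspond to the standard Terraform CLI commands `terraform init`, `terraform plan`, and `terraform apply`. The sketch below only assembles and prints that sequence. It is an assumption about a typical workflow, not a substitute for the README, which may require additional steps such as setting variables. The function names and the `tfplan` file name are hypothetical.

```python
import shlex
import subprocess


def terraform_commands(plan_file="tfplan"):
    """Return the init, plan, and apply commands in the order they are run."""
    return [
        ["terraform", "init"],
        ["terraform", "plan", f"-out={plan_file}"],  # save the plan for review
        ["terraform", "apply", plan_file],           # apply exactly that plan
    ]


def run(commands, workdir=".", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=workdir, check=True)


# Dry run by default: prints the commands without invoking Terraform.
run(terraform_commands())
```

Saving the plan to a file and applying that file keeps the applied changes identical to the ones you reviewed, which mirrors the review step between Plan and Apply in Resource Manager.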

Change Log

This log lists significant changes: