Accelerate and Scale the Storage of Virtual Machine Images in a KVM Environment

This reference architecture shows a scalable, high-performance data storage and data movement platform designed for optimum data access latency and bandwidth, especially during workload spikes.

Architecture

The architecture uses a single availability domain with 3 fault domains. All components of the solution are highly available with no single point of failure.

On each node, data is stored in two locations: RAM and block volume. RAM is mounted as a local file system by using the Linux kernel's block RAM disk driver (brd). The RAM-based locations and the block volume-based locations are unified into two file systems, one per storage type. Each file system is shared across the nodes by using GlusterFS, a parallel, scalable file system.
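The RAM-backed location described above can be sketched as a short list of commands. The following Python helper only builds and prints those commands; it does not run them. The device name `/dev/ram0`, the XFS file system, the mount point, and the 64 GiB size are assumptions made for illustration and are not taken from this architecture.

```python
import shlex


def ram_disk_commands(size_gib: int, mount_point: str = "/mnt/ramdisk") -> list[list[str]]:
    """Return the commands that would create and mount one brd RAM disk.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    size_kib = size_gib * 1024 * 1024  # the brd rd_size parameter is expressed in KiB
    device = "/dev/ram0"               # assumption: first device exposed by brd
    return [
        ["modprobe", "brd", "rd_nr=1", f"rd_size={size_kib}"],  # load brd with one RAM disk
        ["mkfs.xfs", device],                                    # assumption: XFS as the local file system
        ["mkdir", "-p", mount_point],
        ["mount", device, mount_point],
    ]


if __name__ == "__main__":
    # Hypothetical size; choose a value that leaves enough RAM for the VMs themselves.
    for cmd in ram_disk_commands(size_gib=64):
        print(shlex.join(cmd))
```

The commands require root privileges, and the contents of a brd device are lost on reboot, which is why the architecture pairs the RAM location with block volumes.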

GlusterFS is applied to RAM and attached block volumes to create two distributed volumes. This two-tier file storage allows for high-performance data movement in two directions:
  • Within a node: VMs boot with near-zero image-access latency, regardless of how many VMs start at the same time. As a result, the architecture removes the bottleneck of reading VM images from shared storage and prevents service delays during workload spikes.
  • Across nodes: VMs move seamlessly across nodes, thanks to the distributed volumes. As a result, any VM image can be accessed by any node at any time, which allows administrators to quickly rebalance VM images across the bare metal (BM) nodes as the pool of nodes grows (or shrinks in case of hardware failure on a node). You can optionally tier the two shared storage types, RAM-based and block volume-based storage, into a single shared storage instance, where the RAM tier is used for frequently accessed (hot) storage, and block volume is used for longer-term (cold) storage.
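The optional hot and cold tiering mentioned above can be illustrated with a simple placement policy. The following is a minimal sketch, assuming a greedy rule that fills the RAM tier with the most frequently read images first. The `Image` type, the image names, and the access counts are hypothetical, and this is not how GlusterFS itself decides placement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    name: str
    size_gib: float
    reads_per_day: int  # hypothetical access metric


def place_images(images: list[Image], ram_capacity_gib: float) -> tuple[list[str], list[str]]:
    """Greedy placement: the most frequently read images go to the RAM tier first.

    Returns (hot, cold): image names for the RAM tier and for the block volume tier.
    """
    hot: list[str] = []
    cold: list[str] = []
    free = ram_capacity_gib
    for img in sorted(images, key=lambda i: i.reads_per_day, reverse=True):
        if img.size_gib <= free:
            hot.append(img.name)
            free -= img.size_gib
        else:
            cold.append(img.name)
    return hot, cold


if __name__ == "__main__":
    catalog = [
        Image("ol8-base", 10, 500),
        Image("win2019", 40, 20),
        Image("ol9-db", 25, 120),
    ]
    hot, cold = place_images(catalog, ram_capacity_gib=40)
    print("RAM tier:", hot)
    print("Block volume tier:", cold)
```

A greedy rule is easy to reason about but is not optimal for every mix of image sizes; it is shown only to make the hot and cold split concrete.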

The following diagram illustrates this reference architecture.



[Architecture diagram: vm-kvm-oci-oracle]

The architecture has the following components:

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Availability domain

    Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region.

  • Fault domain

    A fault domain is a grouping of hardware and infrastructure within an availability domain. Each availability domain has three fault domains with independent power and hardware. When you distribute resources across multiple fault domains, your applications can tolerate physical server failure, system maintenance, and power failures inside a fault domain.

  • Virtual cloud network (VCN) and subnet

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • Site-to-Site VPN

    Site-to-Site VPN provides IPSec VPN connectivity between your on-premises network and VCNs in Oracle Cloud Infrastructure. The IPSec protocol suite encrypts IP traffic before the packets are transferred from the source to the destination and decrypts the traffic when it arrives.

  • FastConnect

    Oracle Cloud Infrastructure FastConnect provides an easy way to create a dedicated, private connection between your data center and Oracle Cloud Infrastructure. FastConnect provides higher-bandwidth options and a more reliable networking experience when compared with internet-based connections.

  • Dynamic routing gateway (DRG)

    The DRG is a virtual router that provides a path for private network traffic between VCNs in the same region, and between a VCN and a network outside the region, such as a VCN in another Oracle Cloud Infrastructure region, an on-premises network, or a network in another cloud provider.

  • Network address translation (NAT) gateway

    A NAT gateway enables private resources in a VCN to access hosts on the internet, without exposing those resources to incoming internet connections.

  • Internet gateway

    The internet gateway allows traffic between the public subnets in a VCN and the public internet.

  • Security list

    For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.

  • Route table

    Virtual route tables contain rules to route traffic from subnets to destinations outside a VCN, typically through gateways.

  • Bastion host

    The bastion host is a compute instance that serves as a secure, controlled entry point to the topology from outside the cloud. The bastion host is typically provisioned in a demilitarized zone (DMZ). It enables you to protect sensitive resources by placing them in private networks that can't be accessed directly from outside the cloud. The topology has a single, known entry point that you can monitor and audit regularly. So, you can avoid exposing the more sensitive components of the topology without compromising access to them.

  • Bare metal

    Oracle’s bare metal servers provide isolation, visibility, and control by using dedicated compute instances. The servers support applications that require high core counts, large amounts of memory, and high bandwidth. They can scale up to 160 cores (the largest in the industry), 2 TB of RAM, and up to 1 PB of block storage. Customers can build cloud environments on Oracle’s bare metal servers with significant performance improvements over other public clouds and on-premises data centers.

  • Block volume

    With block storage volumes, you can create, attach, connect, and move storage volumes, and change volume performance to meet your storage, performance, and application requirements. After you attach and connect a volume to an instance, you can use the volume like a regular hard drive. You can also disconnect a volume and attach it to another instance without losing data.

Recommendations

Use the following recommendations as a starting point. Your requirements might differ from the architecture described here.

  • VCN

    When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.

    Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.

    After you create a VCN, you can change, add, and remove its CIDR blocks.

    When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.

    Use regional subnets.

  • Cloud Guard

    Use Oracle Cloud Guard to monitor and maintain the security of your resources in Oracle Cloud Infrastructure proactively. Cloud Guard uses detector recipes that you can define to examine your resources for security weaknesses and to monitor operators and users for risky activities. When any misconfiguration or insecure activity is detected, Cloud Guard recommends corrective actions and assists with taking those actions, based on responder recipes that you can define.

    Clone and customize the default recipes provided by Oracle to create custom detector and responder recipes. These recipes enable you to specify what type of security violations generate a warning and what actions are allowed to be performed on them. For example, you might want to detect Object Storage buckets that have visibility set to public.

    Apply Cloud Guard at the tenancy level to cover the broadest scope and to reduce the administrative burden of maintaining multiple configurations.

    You can also use the Managed List feature to apply certain configurations to detectors.

  • Security Zones

    For resources that require maximum security, Oracle recommends that you use security zones. A security zone is a compartment associated with an Oracle-defined recipe of security policies that are based on best practices. For example, the resources in a security zone must not be accessible from the public internet and they must be encrypted using customer-managed keys. When you create and update resources in a security zone, Oracle Cloud Infrastructure validates the operations against the policies in the security-zone recipe, and denies operations that violate any of the policies.

  • Network security groups (NSGs)

    You can use NSGs to define a set of ingress and egress rules that apply to specific VNICs. We recommend using NSGs rather than security lists, because NSGs enable you to separate the VCN's subnet architecture from the security requirements of your application.

  • Hypervisor Nodes

    Deploy HPC bare metal shapes to get full performance. This architecture uses the BM.Standard.E4 shape.
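The CIDR guidance in the VCN recommendation above can be checked before you create any network. The following sketch uses Python's standard `ipaddress` module and assumes IPv4 only. It interprets "standard private IP address space" as the RFC 1918 ranges, which is an assumption, and the network names and CIDR values are hypothetical.

```python
import ipaddress
from itertools import combinations

# Assumption: "standard private IP address space" means the RFC 1918 blocks.
RFC1918_BLOCKS = [
    ipaddress.IPv4Network("10.0.0.0/8"),
    ipaddress.IPv4Network("172.16.0.0/12"),
    ipaddress.IPv4Network("192.168.0.0/16"),
]


def check_cidrs(named_cidrs: dict[str, str]) -> list[str]:
    """Return human-readable problems: non-private blocks and overlapping pairs."""
    nets = {name: ipaddress.IPv4Network(cidr) for name, cidr in named_cidrs.items()}
    problems = []
    for name, net in nets.items():
        if not any(net.subnet_of(block) for block in RFC1918_BLOCKS):
            problems.append(f"{name} ({net}) is outside the private address space")
    for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} ({net_a}) overlaps {name_b} ({net_b})")
    return problems


if __name__ == "__main__":
    # Hypothetical networks that will be connected privately.
    plan = {"vcn": "10.0.0.0/16", "onprem": "10.0.128.0/20", "other-cloud": "192.168.1.0/24"}
    for problem in check_cidrs(plan):
        print(problem)
```

An empty result means that every block is private and no two blocks overlap, which matches the two conditions stated in the VCN recommendation.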

Considerations

Consider the following points when deploying this reference architecture.

  • Performance

    To get the best performance, choose the correct Compute shape with appropriate bandwidth.

  • Availability

    Consider using a high-availability option, based on your deployment requirements and region. Options include using multiple availability domains in a region and fault domains.

  • Monitoring and Alerts

    Set up monitoring and alerts on CPU and memory usage for your nodes, so that you can scale the shape up or down as needed.

  • Cost

    A bare metal GPU instance provides the necessary CPU power at a higher cost. Evaluate your requirements to choose the appropriate Compute shape.

    You can delete the cluster when there are no jobs running.

  • Cluster File Systems

    There are multiple scenarios:

    • Optimized, DenseIO shapes that come with the HPC shape.
    • Multi-attach block volumes deliver up to 2,680 MB/s IO throughput or 700k IOPS.
    • You can also install your own parallel file system on top of either NVMe SSD storage or block storage, depending on your performance requirements. OCI provides scratch and permanent NFS-based (NFS-HA, FSS) or parallel file system (weka.io, Spectrum Scale, BeeGFS, BeeOND, Lustre, GlusterFS, Quobyte) solutions.
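To relate the 2,680 MB/s block volume figure above to VM boot storms, the following back-of-the-envelope sketch estimates the minimum time needed to read VM images at that rate. It assumes decimal units (1 GB = 1,000 MB), that every boot reads its whole image, and that nothing is cached. These are simplifying assumptions for illustration, not measurements of this architecture.

```python
# Upper bound quoted above for multi-attach block volumes, in MB/s.
BLOCK_VOLUME_MAX_MBPS = 2680.0


def min_read_seconds(image_size_gb: float, concurrent_boots: int = 1,
                     throughput_mbps: float = BLOCK_VOLUME_MAX_MBPS) -> float:
    """Lower bound on the time to read `concurrent_boots` full images.

    Assumes the boots share the throughput and each one reads the entire image.
    """
    if image_size_gb < 0 or concurrent_boots < 1 or throughput_mbps <= 0:
        raise ValueError("invalid input")
    total_mb = image_size_gb * 1000 * concurrent_boots
    return total_mb / throughput_mbps


if __name__ == "__main__":
    # Hypothetical image size and boot counts.
    for boots in (1, 10, 100):
        seconds = min_read_seconds(image_size_gb=20, concurrent_boots=boots)
        print(f"{boots:>3} boots: at least {seconds:.1f} s of image reads")
```

Because the estimate grows linearly with the number of concurrent boots, it shows why serving hot images from a RAM-based tier helps during workload spikes.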

Deploy

The GlusterFS app and Terraform code for Linux KVM are available in Oracle Cloud Marketplace.

A Terraform stack to create Oracle Linux KVM instances is available in Oracle Cloud Marketplace.

  1. Go to Oracle Cloud Marketplace.
  2. Click Get App.
  3. Follow the on-screen prompts.

The GlusterFS app is available in Oracle Cloud Marketplace. You must deploy GlusterFS manually on each instance.

  1. Go to Oracle Cloud Marketplace.
  2. Click Get App.
  3. Follow the on-screen prompts.
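After GlusterFS is installed on each instance, the two distributed volumes described in the Architecture section still have to be created. The following sketch builds the corresponding `gluster` and `mount` commands without running them. The host names, volume names, brick directories, and client mount point are hypothetical, and the sketch assumes plain distributed volumes (no replica or disperse options), created from the first node in the list.

```python
import shlex


def gluster_volume_commands(volume: str, nodes: list[str], brick_dir: str) -> list[list[str]]:
    """Commands, intended to run on nodes[0], that create and start one distributed volume."""
    if not nodes:
        raise ValueError("at least one node is required")
    probes = [["gluster", "peer", "probe", node] for node in nodes[1:]]  # join the other nodes
    bricks = [f"{node}:{brick_dir}" for node in nodes]                   # one brick per node
    return probes + [
        ["gluster", "volume", "create", volume, *bricks],
        ["gluster", "volume", "start", volume],
    ]


def client_mount_command(volume: str, server: str, mount_point: str) -> list[str]:
    """Command that mounts a started volume through the GlusterFS native client."""
    return ["mount", "-t", "glusterfs", f"{server}:/{volume}", mount_point]


if __name__ == "__main__":
    nodes = ["kvm1", "kvm2", "kvm3"]  # hypothetical host names
    plan = (
        gluster_volume_commands("vm-ram", nodes, "/mnt/ramdisk/brick")
        + gluster_volume_commands("vm-block", nodes, "/mnt/block/brick")[len(nodes) - 1:]  # peers already probed
        + [client_mount_command("vm-ram", "kvm1", "/var/lib/libvirt/images")]
    )
    for cmd in plan:
        print(shlex.join(cmd))
```

Review the printed commands against the GlusterFS documentation for your installed version before running them as root.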

Acknowledgments

Authors: Yuri Rassokhin, Deepak Soni