Relation Therapeutics: HPC Biotechnology Analytics Platform on Oracle Cloud

To better understand what causes diseases, discover novel ways to treat them, and reduce the number of unsuccessful drug development programs, Relation Therapeutics (RelationRx) uses a graph-based recommendation engine to map the relationships between human genetics, single-cell profiles, and functional genomics.

By running its biotechnology analytics platform in a high-performance computing cluster on Oracle Cloud Infrastructure (OCI), RelationRx applies data science and machine learning methods to quickly determine the causal relationships that drive diseases.

Founded in 2019, the London-based startup is currently working with the Bill & Melinda Gates Foundation to identify therapeutic candidates for immune complications arising from COVID-19. The company is also working with the Mila AI Research Institute and G3 Therapeutics, focusing on deep molecular profiling, DNA methylation, RNA sequencing, proteomics, metabolomics, and lipidomics.

Since moving its platform to OCI, RelationRx has built a data mesh architecture that makes data available to both engineers and data scientists. As a result, RelationRx data scientists have been able to share the compute and infrastructure built by the engineering team while keeping ownership of their data, with access controlled through Oracle Cloud Infrastructure Identity and Access Management (IAM) policies and groups.

The unique aspects of the Relation Therapeutics architecture are:

  • Bare metal and high-performance computing (HPC) resources
  • NVMe-based storage that holds up to tens of terabytes of data, so that data access latency doesn't slow down the servers
  • Environments built from a blueprint, so that new setups are created consistently
  • Data management based on data mesh design principles

RelationRx adopted OCI because it meets all of the company's technical requirements. Three other factors also drove the decision: the Oracle team's exceptional understanding of startups, its support with the appropriate people and resources, and a level of attention to RelationRx's needs that is not available elsewhere.

Architecture

The core of the architecture is Relation Therapeutics' application of high-performance computing (HPC) and bare metal servers to power their data science and machine learning processes.

To make the most of these capabilities, Relation Therapeutics ingests data and manages compute resources across two regions, London and Frankfurt. Data ingestion and data science run in London, and machine learning (ML) runs in Frankfurt. Data sets are gathered from labs, vendors, and public sources. Relation Therapeutics runs the incoming data through its extract, transform, and load (ETL) pipeline, which cleanses and standardizes the data and, when necessary, anonymizes it. Data science services help identify any data issues that require further cleansing. Analytics capabilities help develop the requirements for ML processing. The prepared data is then linked to a knowledge graph and stored in the company's data lake. From there, the data runs through the company's ML pipeline, where it is analyzed and used to make inferences or to run additional experiments.
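The ETL steps described above (cleanse, standardize, and, when necessary, anonymize) can be sketched as a chain of small functions. This is a minimal illustration under stated assumptions and does not represent RelationRx's actual pipeline. The record fields (sample_id, gene, expression, donor_id) and the salted-hash approach are hypothetical. Salted hashing is strictly pseudonymization rather than full anonymization.

```python
import hashlib

# Hypothetical record layout: the source does not describe RelationRx's schemas.
REQUIRED_FIELDS = ("sample_id", "gene", "expression")


def cleanse(records):
    """Drop records that are missing any required field."""
    return [r for r in records if all(r.get(k) not in (None, "") for k in REQUIRED_FIELDS)]


def standardize(records):
    """Trim and upper-case gene symbols; coerce expression values to float."""
    out = []
    for r in records:
        r = dict(r)  # copy so the caller's records are not mutated
        r["gene"] = r["gene"].strip().upper()
        r["expression"] = float(r["expression"])
        out.append(r)
    return out


def pseudonymize(records, salt):
    """Replace donor identifiers with a truncated salted SHA-256 digest.

    Salted hashing is pseudonymization; full anonymization usually needs more.
    """
    out = []
    for r in records:
        r = dict(r)
        if "donor_id" in r:
            digest = hashlib.sha256((salt + str(r["donor_id"])).encode("utf-8")).hexdigest()
            r["donor_id"] = digest[:16]
        out.append(r)
    return out


def run_etl(records, salt, needs_anonymization=True):
    """Cleanse, then standardize, then (optionally) pseudonymize."""
    prepared = standardize(cleanse(records))
    return pseudonymize(prepared, salt) if needs_anonymization else prepared


raw = [
    {"sample_id": "s1", "gene": " il6 ", "expression": "2.5", "donor_id": "D-001"},
    {"sample_id": "s2", "gene": "TNF", "expression": None, "donor_id": "D-002"},
]
clean = run_etl(raw, salt="example-salt")
print(len(clean), clean[0]["gene"], clean[0]["expression"])  # 1 IL6 2.5
```

Keeping each stage as a separate function makes it easy to rerun cleansing alone when the data science checks flag an issue.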

In total, the London region comprises four key private subnets:

  1. Data science systems, including one bare metal server
  2. ETL file system, containing three VMs with autoscaling, and an instance pool
  3. Services cluster, which includes containers, Oracle Cloud Infrastructure Container Engine for Kubernetes (OKE), persistent volume, and a domain name server (DNS)
  4. Development and test cluster, which provides data scientists with one virtual machine and one bare metal compute server
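Because the four London subnets share one VCN, their address ranges must not overlap. The following minimal sketch uses Python's ipaddress module to carve them out of a single VCN block. The 10.0.0.0/16 range and the /24 subnet size are assumptions, because the source does not give any CIDRs. In this sketch, "private" refers only to the RFC 1918 address range. Whether an OCI subnet is private is a separate subnet setting (no public IPs).

```python
import ipaddress

# Hypothetical VCN address range for the London region; the source gives no CIDRs.
LONDON_VCN = ipaddress.ip_network("10.0.0.0/16")

# The four private subnets listed above.
SUBNET_NAMES = ["data-science", "etl-file-system", "services-cluster", "dev-test-cluster"]


def plan_subnets(vcn, names, new_prefix=24):
    """Assign consecutive, non-overlapping blocks of size /new_prefix to each name."""
    blocks = list(vcn.subnets(new_prefix=new_prefix))
    if len(names) > len(blocks):
        raise ValueError("VCN CIDR is too small for the requested subnets")
    return dict(zip(names, blocks))


def has_overlap(plan):
    """True if any two planned subnets overlap (subnets in a VCN must not)."""
    nets = list(plan.values())
    return any(a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:])


plan = plan_subnets(LONDON_VCN, SUBNET_NAMES)
for name, net in plan.items():
    print(f"{name:18} {net}")
```

With these assumptions, the four subnets receive 10.0.0.0/24 through 10.0.3.0/24. This leaves most of the /16 free for later subnets.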

The prepared data is held in a data lake overlaid with a data mesh architecture. With a data mesh, the teams that "own" the data manage it themselves instead of relying on a dedicated data engineering team. These design concepts help provide agility and flexibility in how data is delivered and used on OCI.
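One way to express data mesh ownership with OCI IAM is to map each data product to a compartment. The owning team's group then manages that compartment's data, and consumer groups only read it. The sketch below is a hypothetical illustration of that idea. The product, compartment, and group names and the choice of the object-family resource type are assumptions. Only the "Allow group ... to <verb> <resource-type> in compartment ..." statement form and the four verbs (inspect, read, use, manage) follow OCI's documented policy syntax.

```python
from dataclasses import dataclass, field

# The four IAM verbs, from least to most permissive.
VALID_VERBS = ("inspect", "read", "use", "manage")


@dataclass
class DataProduct:
    """A data mesh 'data product' owned by one team (all names are hypothetical)."""
    name: str
    compartment: str        # compartment assumed to hold this product's buckets
    owner_group: str        # team that owns and maintains the data
    consumer_groups: list = field(default_factory=list)


def statement(group, verb, resource_type, compartment):
    """Render one policy statement in OCI's 'Allow group ... to ...' form."""
    if verb not in VALID_VERBS:
        raise ValueError(f"unknown IAM verb: {verb}")
    return f"Allow group {group} to {verb} {resource_type} in compartment {compartment}"


def policies_for(product):
    """Owners manage their data; consumers get read-only access to it."""
    stmts = [statement(product.owner_group, "manage", "object-family", product.compartment)]
    for group in product.consumer_groups:
        stmts.append(statement(group, "read", "object-family", product.compartment))
    return stmts


single_cell = DataProduct("single-cell-profiles", "SingleCell", "SingleCellTeam", ["MLEngineers"])
for line in policies_for(single_cell):
    print(line)
```

Generating statements from a declared owner keeps access rules next to the data product definition, which matches the data mesh idea that the owning team controls access.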

In the Frankfurt region, Relation Therapeutics ML users access a virtual cloud network (VCN) on OCI through a virtual private network (VPN) that connects their offices to OCI. After users are authenticated through Oracle Cloud Infrastructure Identity and Access Management, they can work with the services provided in OCI. The environments are built from a common template (or motif) that provides the core resources needed for research. The core template is defined so that it can autoscale and so that it resides in its own private subnet, which provides control and security for the services. The core service clusters contain virtual machines, high-performance storage, a domain name system (DNS) server, and OKE with containers to perform machine learning and analytical processes. Users can supplement the template with additional technical and data resources as needed, such as databases, by using a separate services subnet.
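The template approach can be illustrated with a small sketch. Every environment starts from the same core definition, and user-requested extras go into a separate services subnet. This Python model is only an illustration. A real deployment would more likely express the template in an infrastructure-as-code tool. The subnet names, resource names, and autoscaling bounds are hypothetical.

```python
import copy

# Hypothetical core template ("motif"); the resource list mirrors the text above,
# but the names and autoscaling bounds are illustrative only.
CORE_TEMPLATE = {
    "subnet": "core-private",
    "autoscaling": {"min_nodes": 1, "max_nodes": 8},
    "resources": ["virtual-machines", "high-performance-storage", "dns-server", "oke-cluster"],
}


def build_environment(name, extra_resources=()):
    """Create an environment from the shared core template.

    Extras requested by users (for example, a database) go into a separate
    services subnet, so the core template itself stays identical everywhere.
    """
    env = copy.deepcopy(CORE_TEMPLATE)  # deep copy: never mutate the shared template
    env["name"] = name
    env["services_subnet"] = {"subnet": "services-private", "resources": list(extra_resources)}
    return env


team_a = build_environment("ml-research-a")
team_b = build_environment("ml-research-b", extra_resources=["database"])
print(team_a["resources"] == team_b["resources"])  # True: same core resources
print(team_b["services_subnet"]["resources"])      # ['database']
```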

These resources are managed through a bastion server that occupies its own subnet. The bastion is used to access and manage the high-performance computing clusters. The bastion node supports the following:

  1. Compute node scheduling and dynamic bursting control
  2. File transfer into and out of the HPC environments using NFS file servers
  3. Cluster administration management
  4. User access control

To support development and experimentation with new algorithms and other ML workloads, users have access to test and staging environments that contain both virtual machines and bare metal GPU servers. These environments are supplemented with continuous integration and continuous delivery (CI/CD) capabilities. The non-production environments also have their own subnets. They are sized to work with a small subset of a production data set, which can run to tens of terabytes. These environments include two bare metal servers containing eight NVIDIA A100 GPUs.
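The source does not say how the small non-production subset is chosen. A common approach is hash-based sampling, in which each record is kept or dropped based only on a hash of its ID. This makes the subset reproducible across runs and environments. The sketch below assumes string sample IDs and uses a hypothetical 1% fraction.

```python
import hashlib


def in_subset(sample_id, fraction):
    """Keep a sample if its ID hashes into the lowest `fraction` of the 64-bit range.

    The decision depends only on the ID, so every run selects the same subset.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")   # integer in [0, 2**64)
    return bucket < int(fraction * 2**64)        # integer compare avoids float rounding


def subset(sample_ids, fraction):
    return [s for s in sample_ids if in_subset(s, fraction)]


ids = [f"sample-{i}" for i in range(10_000)]
small = subset(ids, 0.01)   # roughly 1% of the IDs
print(len(small), small == subset(ids, 0.01))
```

Because selection is per ID rather than random per run, CI/CD test jobs see the same data every time. A record that is in the 1% subset is also in any larger subset.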

To manage the workloads, Slurm, the open source workload manager for HPC clusters, runs on the bastion server. It launches the appropriate number of compute instances for each HPC workload based on the user's job requirements. When a job finishes, Slurm automatically terminates the compute instances if no other jobs in the queue are waiting for identical resources. This dynamic bursting lets researchers use the required compute nodes immediately while paying only for the resources they use. Depending on user requirements, the bastion node can use any of the broad range of virtual machine shapes that OCI offers, starting from the low-cost VM.Standard.E3.Flex shape.
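The bursting decision described above can be sketched as two small functions. One counts how many nodes to launch for queued jobs. The other picks idle nodes that no queued job needs. This is a simplified model, not Slurm's implementation. Matching on a single "shape" string as the meaning of "identical resources" is an assumption, and the shape labels are hypothetical. In an actual Slurm deployment, cloud bursting is typically wired up through power-saving settings in slurm.conf such as ResumeProgram, SuspendProgram, and SuspendTime. The source does not describe how RelationRx configured them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    shape: str   # hypothetical resource signature, e.g. "gpu-a100" or "cpu-standard"
    nodes: int


@dataclass
class Node:
    name: str
    shape: str
    busy: bool


def nodes_to_launch(pending, nodes):
    """Per shape: nodes requested by queued jobs minus idle nodes already running."""
    demand = Counter()
    for job in pending:
        demand[job.shape] += job.nodes
    idle = Counter(n.shape for n in nodes if not n.busy)
    return {shape: need - idle[shape] for shape, need in demand.items() if need > idle[shape]}


def nodes_to_terminate(nodes, pending):
    """Idle nodes that no queued job is waiting for can be shut down."""
    wanted = {job.shape for job in pending}
    return [n.name for n in nodes if not n.busy and n.shape not in wanted]


cluster = [
    Node("gpu-1", "gpu-a100", busy=False),
    Node("cpu-1", "cpu-standard", busy=False),
    Node("cpu-2", "cpu-standard", busy=True),
]
queue = [Job(101, "gpu-a100", nodes=2)]
print(nodes_to_launch(queue, cluster))     # {'gpu-a100': 1}
print(nodes_to_terminate(cluster, queue))  # ['cpu-1']
```

The idle GPU node is kept because a queued job needs that shape, while the idle CPU node is released. This is the "pay only for what is used" behavior the text describes.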

The data processing performed by Relation Therapeutics follows a natural sequence:

[Figure: Relation Therapeutics data processing sequence]

The process is shown in the following architecture diagram. The production flow runs through the subnets across the lower half of the diagram, and the supporting processes sit in the subnets across the upper half.

[Figure: Relation Therapeutics architecture diagram, available for download as relation-therapeutics-oci-oracle.zip]

The architecture has the following components:

  • Tenancy

    A tenancy is a secure and isolated partition that Oracle sets up within Oracle Cloud when you sign up for Oracle Cloud Infrastructure. You can create, organize, and administer your resources in Oracle Cloud within your tenancy. A tenancy is synonymous with a company or organization. Usually, a company will have a single tenancy and reflect its organizational structure within that tenancy. A single tenancy is usually associated with a single subscription, and a single subscription usually only has one tenancy.

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Identity and access management (IAM)

    Oracle Cloud Infrastructure Identity and Access Management (IAM) is the access control plane for Oracle Cloud Infrastructure (OCI) and Oracle Cloud Applications. The IAM API and the user interface enable you to manage identity domains and the resources within the identity domain. Each OCI IAM identity domain represents a standalone identity and access management solution or a different user population.

  • Policy

    An Oracle Cloud Infrastructure Identity and Access Management policy specifies who can access which resources, and how. Access is granted at the group and compartment level, which means you can write a policy that gives a group a specific type of access within a specific compartment, or to the tenancy.

  • Logging

    Logging is a highly scalable and fully managed service that provides access to the following types of logs from your resources in the cloud:

    • Audit logs: Logs related to events emitted by the Audit service.
    • Service logs: Logs emitted by individual services such as API Gateway, Events, Functions, Load Balancing, Object Storage, and VCN flow logs.
    • Custom logs: Logs that contain diagnostic information from custom applications, other cloud providers, or an on-premises environment.

  • Registry

    Oracle Cloud Infrastructure Registry is an Oracle-managed registry that enables you to simplify your development-to-production workflow. Registry makes it easy for you to store, share, and manage development artifacts, like Docker images. The highly available and scalable architecture of Oracle Cloud Infrastructure ensures that you can deploy and manage your applications reliably.

  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • Security list

    For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.

  • Dynamic routing gateway (DRG)

    The DRG is a virtual router that provides a path for private network traffic between VCNs in the same region, between a VCN and a network outside the region, such as a VCN in another Oracle Cloud Infrastructure region, an on-premises network, or a network in another cloud provider.

  • Service gateway

    The service gateway provides access from a VCN to other services, such as Oracle Cloud Infrastructure Object Storage. The traffic from the VCN to the Oracle service travels over the Oracle network fabric and never traverses the internet.

  • Network address translation (NAT) gateway

    A NAT gateway enables private resources in a VCN to access hosts on the internet, without exposing those resources to incoming internet connections.

  • Container Engine for Kubernetes

    Oracle Cloud Infrastructure Container Engine for Kubernetes is a fully managed, scalable, and highly available service that you can use to deploy your containerized applications to the cloud. You specify the compute resources that your applications require, and Container Engine for Kubernetes provisions them on Oracle Cloud Infrastructure in an existing tenancy. Container Engine for Kubernetes uses Kubernetes to automate the deployment, scaling, and management of containerized applications across clusters of hosts.

  • Compute

    The Oracle Cloud Infrastructure Compute service enables you to provision and manage compute hosts in the cloud. You can launch compute instances with shapes that meet your resource requirements for CPU, memory, network bandwidth, and storage. After creating a compute instance, you can access it securely, restart it, attach and detach volumes, and terminate it when you no longer need it.

  • Bare metal

    Oracle’s bare metal servers provide isolation, visibility, and control by using dedicated compute instances. The servers support applications that require high core counts, large amounts of memory, and high bandwidth. They can scale up to 160 cores (the largest in the industry), 2 TB of RAM, and up to 1 PB of block storage. Customers can build cloud environments on Oracle’s bare metal servers with significant performance improvements over other public clouds and on-premises data centers.

  • Remote peering

    Remote peering allows the VCNs' resources to communicate using private IP addresses without routing the traffic over the internet or through your on-premises network. Remote peering eliminates the need for an internet gateway and public IP addresses for the instances that need to communicate with another VCN in a different region.

  • Object storage

    Object storage provides quick access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can securely store and then retrieve data directly from the internet or from within the cloud platform. You can scale storage seamlessly without experiencing any degradation in performance or service reliability. Use standard storage for "hot" storage that you need to access quickly, immediately, and frequently. Use archive storage for "cold" storage that you retain for long periods of time and rarely access.
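The remote peering component connects the London and Frankfurt VCNs. OCI requires that peered VCNs have non-overlapping CIDR blocks. The following minimal sketch checks that constraint before a peering is set up. The example address plans are hypothetical, because the source does not list the actual CIDRs.

```python
import ipaddress


def conflicting_blocks(vcn_a_cidrs, vcn_b_cidrs):
    """Return every (a, b) pair of CIDR blocks that overlap across the two VCNs."""
    a_nets = [ipaddress.ip_network(c) for c in vcn_a_cidrs]
    b_nets = [ipaddress.ip_network(c) for c in vcn_b_cidrs]
    return [(str(a), str(b)) for a in a_nets for b in b_nets if a.overlaps(b)]


def can_peer(vcn_a_cidrs, vcn_b_cidrs):
    """Peered VCNs must not have overlapping CIDR blocks."""
    return not conflicting_blocks(vcn_a_cidrs, vcn_b_cidrs)


# Hypothetical address plans for the London and Frankfurt VCNs.
london = ["10.0.0.0/16"]
frankfurt = ["10.1.0.0/16"]
print(can_peer(london, frankfurt))                    # True
print(conflicting_blocks(london, ["10.0.128.0/17"]))  # [('10.0.0.0/16', '10.0.128.0/17')]
```

Because a VCN can have multiple CIDR blocks, the check compares every block of one VCN against every block of the other. It reports the exact conflicting pairs so that they can be re-addressed.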

Acknowledgments

  • Author: Sasha Banks-Louie
  • Contributors: Robert Lies, Phil Wilkins