About the Architecture

The architecture is a multicloud solution spanning Google Cloud and OCI: GKE orchestrates the overall training and inferencing process while offloading the computationally intensive work to OCI AI Infrastructure on demand. Data is transferred between the two clouds, and results are returned to GKE for further processing.

The following diagram illustrates the reference architecture:


Description of the illustration gke-oci.png


Architecture Components

This architecture contains these components:

  • GKE Cluster (Google Kubernetes Engine)

    The GKE cluster orchestrates containerized model training jobs and schedules them onto worker nodes in the Kubernetes cluster.

  • Model Training Job Definition

    Model Training Job Definition specifies the training script, data location (Cloud Storage), model parameters, and desired number of worker nodes.

  • Containerized Training Script

    A containerized training script runs on worker nodes, performing the actual model training by offloading computation to the model running on OCI AI Infrastructure.

  • Kubernetes Operator (Optional)

    The Kubernetes Operator is an optional component that automates deployment and management of training jobs on GKE.

  • Cloud Storage

    Cloud Storage stores the training data and model artifacts.

  • Cloud Monitoring (Optional)

    Cloud Monitoring is an optional component that monitors job status, resource utilization, and training metrics.

  • Model Results

    Model results are sent back to GKE for evaluation, storage, or deployment.

  • Availability domain

    Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain shouldn't affect the other availability domains in the region.

  • FastConnect

    Oracle Cloud Infrastructure FastConnect provides an easy way to create a dedicated, private connection between your data center and Oracle Cloud Infrastructure. FastConnect provides higher-bandwidth options and a more reliable networking experience when compared with internet-based connections.

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Virtual cloud network (VCN) and subnet

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • Compute

    The Oracle Cloud Infrastructure Compute service enables you to provision and manage compute hosts in the cloud. You can launch compute instances with shapes that meet your resource requirements for CPU, memory, network bandwidth, and storage. After creating a compute instance, you can access it securely, restart it, attach and detach volumes, and terminate it when you no longer need it.

  • Container Engine for Kubernetes

    Oracle Cloud Infrastructure Container Engine for Kubernetes (OKE) is a fully managed, scalable, and highly available service that you can use to deploy your containerized applications to the cloud. You specify the compute resources that your applications require, and Container Engine for Kubernetes provisions them on Oracle Cloud Infrastructure in an existing tenancy. Container Engine for Kubernetes uses Kubernetes to automate the deployment, scaling, and management of containerized applications across clusters of hosts.

  • Oracle Interconnect for Google Cloud

    Oracle Interconnect for Google Cloud is a dedicated, private interconnection service combining OCI FastConnect partner connections and Google Cloud Partner Interconnects that helps multicloud customers innovate across two clouds and apply existing and familiar tools to support workloads.
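The model training job definition described above can be sketched as a Kubernetes Job manifest. The following Python snippet builds one as a plain dictionary; the image name, bucket path, endpoint value, and environment variable names are illustrative placeholders, not values prescribed by this architecture.

```python
def build_training_job(name: str, image: str, data_uri: str, workers: int) -> dict:
    """Build a Kubernetes Job manifest (as a plain dict) covering the fields the
    job definition calls out: training script (container image), data location
    in Cloud Storage, and desired number of worker nodes."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "parallelism": workers,   # desired number of worker pods
            "completions": workers,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "trainer",
                        "image": image,
                        "env": [
                            # Cloud Storage location of the training data
                            {"name": "DATA_URI", "value": data_uri},
                            # Endpoint of the GPU backend on OCI AI
                            # Infrastructure (hypothetical variable name)
                            {"name": "OCI_GPU_ENDPOINT",
                             "value": "https://example-oci-endpoint"},
                        ],
                    }],
                }
            },
        },
    }

job = build_training_job("llm-train", "gcr.io/example/trainer:latest",
                         "gs://example-bucket/train-data", workers=4)
```

In practice you would apply this manifest with kubectl or a Kubernetes client library, or let the optional Kubernetes Operator generate and manage it.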

Communication Flow

In this architecture, data flows as follows:

  1. Customers submit a model training job definition through GKE.
  2. The job definition specifies the containerized training script, data location, and desired worker nodes.
  3. Worker nodes pull the training script and data from Cloud Storage. The training script leverages the GPUs running on OCI AI Infrastructure to train the model.
  4. Training results are uploaded to Cloud Storage or sent back to GKE for further processing.
  5. Optional Cloud Monitoring collects metrics from the training job for performance analysis.
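The steps above can be sketched as a minimal orchestration loop. All cloud calls are stubbed out here; the function names (fetch_from_gcs, train_on_oci, upload_to_gcs) are illustrative placeholders, not real client APIs.

```python
def fetch_from_gcs(uri: str) -> bytes:
    # In practice: download training data from Cloud Storage.
    return b"training-data"

def train_on_oci(data: bytes, params: dict) -> dict:
    # In practice: offload the computation to GPUs on OCI AI Infrastructure
    # over Oracle Interconnect for Google Cloud.
    return {"model": "weights", "params": params}

def upload_to_gcs(uri: str, artifact: dict) -> str:
    # In practice: store the trained model artifacts back in Cloud Storage.
    return f"{uri}/model.bin"

def run_training_job(data_uri: str, out_uri: str, params: dict) -> str:
    data = fetch_from_gcs(data_uri)        # step 3: pull data from Cloud Storage
    result = train_on_oci(data, params)    # step 3: train on OCI GPUs
    return upload_to_gcs(out_uri, result)  # step 4: upload results

artifact_uri = run_training_job("gs://bucket/data", "gs://bucket/out",
                                {"epochs": 3})
```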

Additional Inference Use Cases

In addition to the use case described above, this architecture also supports two inference use cases:

  • Real-time inference with low latency requirements.
  • Batch inference for large datasets.

Real-Time Inference with Low Latency Requirements

In this use case, customers require immediate responses from the LLM model for applications like chatbots, virtual assistants, or real-time translation. The solution implements the following data flow:

  1. User input is sent to the GKE environment running on GCP.
  2. GKE orchestrates the request to the OCI AI Infrastructure.
  3. OCI AI Infrastructure processes the input using the deployed LLM model.
  4. Inference results are returned to GKE.
  5. GKE formats and sends the response to the user.
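The real-time path above can be sketched as a simple request handler running in GKE. The OCI call is stubbed, and the payload shape is an assumption, not a documented API.

```python
def call_oci_llm(prompt: str) -> dict:
    # In practice: an HTTPS call over Oracle Interconnect for Google Cloud
    # to the LLM model deployed on OCI AI Infrastructure (steps 2-4).
    return {"completion": f"echo: {prompt}"}

def handle_user_request(user_input: str) -> dict:
    result = call_oci_llm(user_input)
    # Step 5: GKE formats the response before returning it to the user.
    return {
        "reply": result["completion"],
        "backend": "oci-ai-infrastructure",
    }

response = handle_user_request("translate 'bonjour'")
```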

The benefits of this solution are threefold:
  • It provides low-latency inference due to Oracle Interconnect for Google Cloud, which places the inference infrastructure in close network proximity to the application.
  • It has sufficient scalability to handle varying inference loads through OCI AI Infrastructure's elastic capabilities.
  • It offers potential cost savings by optimizing inference hardware and software.

Batch Inference for Large Datasets

In this case, customers need to process large volumes of data through the LLM model in a batch mode, such as sentiment analysis on a massive dataset or generating summaries for a large corpus of documents. You can address this case by implementing this data flow:

  1. Data is prepared and stored in a Google Cloud storage bucket.
  2. A batch job is initiated in GKE, triggered by Cloud Scheduler or Cloud Functions.
  3. GKE orchestrates the transfer of data to the OCI AI Infrastructure.
  4. OCI AI Infrastructure processes the data in batches using the LLM model.
  5. Inference results are stored in a Google Cloud storage bucket.
  6. Post-processing, if required, is performed in GKE.
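The batch flow above can be sketched as a loop that processes records in fixed-size batches against the LLM and collects the results for a bucket write. The batch size and the infer_batch stub are illustrative assumptions.

```python
from typing import Iterable, List

def infer_batch(batch: List[str]) -> List[str]:
    # In practice: one call per batch to the LLM model running on
    # OCI AI Infrastructure (step 4).
    return [f"summary({doc})" for doc in batch]

def run_batch_job(records: Iterable[str], batch_size: int = 2) -> List[str]:
    results, batch = [], []
    for rec in records:
        batch.append(rec)
        if len(batch) == batch_size:       # process a full batch
            results.extend(infer_batch(batch))
            batch = []
    if batch:                              # flush the final partial batch
        results.extend(infer_batch(batch))
    return results                         # step 5: ready to store in a bucket

outputs = run_batch_job(["doc1", "doc2", "doc3"])
```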

The benefits of this solution are threefold:
  • It provides cost-effective processing of large datasets by leveraging OCI AI Infrastructure's compute power.
  • It provides improved performance compared to running inference on GKE alone.
  • It has the ability to handle diverse data formats and sizes.