Example Workflow

This example workflow shows how a vehicle engineering team uses a high-performance computing platform to reduce design costs, increase efficiency, and increase overall value.

Set up the Infrastructure

Engineers have encountered high queuing times in their on-premises environment and must run several variations of a simulation that requires 72 cores. The Design Engineer must run the simulations and report the results to management within a couple of hours; however, the queue wait time for the on-premises HPC environment is 5 days. The Design Engineer reaches out to the Infrastructure Engineer for support to rapidly launch the infrastructure needed to run the simulations.

The Infrastructure Engineer quickly launches a 2-node HPC cluster on Oracle Cloud Infrastructure (OCI) bare metal systems. The Infrastructure Engineer chooses the BM.Optimized3.36 shape, which is designed for high-performance computing workloads that require high-frequency processor cores with RDMA. With this shape, the Infrastructure Engineer can rapidly provision the cluster through Resource Manager using Oracle's prebuilt cluster networking solution, and can automate this step with tools such as open source Slurm, Altair PBS Professional, or the Oracle Cloud SDK/CLI.
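
As a rough sketch, a single bare metal node of this shape could be launched with the OCI Python SDK as shown below. The OCIDs, image, and subnet are placeholders; in practice, the Resource Manager stack provisions both nodes and the RDMA cluster network in one step.

import oci

config = oci.config.from_file()                  # reads ~/.oci/config
compute = oci.core.ComputeClient(config)

details = oci.core.models.LaunchInstanceDetails(
    availability_domain="AD-1",                            # placeholder
    compartment_id="ocid1.compartment.oc1..example",       # placeholder
    shape="BM.Optimized3.36",                              # HPC bare metal shape with RDMA
    display_name="hpc-node-1",
    source_details=oci.core.models.InstanceSourceViaImageDetails(
        image_id="ocid1.image.oc1..example"                # cluster networking image (placeholder)
    ),
    create_vnic_details=oci.core.models.CreateVnicDetails(
        subnet_id="ocid1.subnet.oc1..example"              # placeholder
    ),
    metadata={"ssh_authorized_keys": open("id_rsa.pub").read()},
)

instance = compute.launch_instance(details).data
print("Launched instance:", instance.id)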

The Infrastructure Engineer connects to the newly provisioned cluster and ensures that all of the required simulation software, visualization nodes, hostfiles, MPI libraries, file systems (such as NFS), batch scheduler (such as Slurm Workload Manager), and Ansible tools are set up on the cluster. Additionally, the Infrastructure Engineer runs a quick latency test to confirm that RDMA is set up properly (latency should be between 1 and 3 microseconds) before handing the cluster off to the Design Engineer.
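
A simple ping-pong test is enough for this check. The following is a minimal sketch using mpi4py, assuming mpi4py is installed on both nodes and the script is launched with two ranks (one per node) over the RDMA-enabled MPI.

# Run with, for example: mpirun -np 2 -hostfile hostfile python3 pingpong.py
import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = bytearray(8)          # small message to measure latency, not bandwidth
iters = 10000

comm.Barrier()
start = time.perf_counter()
for _ in range(iters):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=0)
    else:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=0)
elapsed = time.perf_counter() - start

if rank == 0:
    # One-way latency is half of the average round-trip time
    latency_us = (elapsed / iters / 2) * 1e6
    print(f"Average one-way latency: {latency_us:.2f} microseconds")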

Run the Models

The Design Engineer accesses the cluster and uses an Ansible script to rapidly deploy the standard motorbike model across the 2-node cluster. This example uses OpenFOAM compiled with Intel MPI.
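
For illustration, the deployment could be driven from the bastion with a short Python wrapper around ansible-playbook; the playbook and inventory file names here are hypothetical.

import subprocess

# Push the OpenFOAM motorbike case to both nodes with Ansible.
result = subprocess.run(
    [
        "ansible-playbook",
        "-i", "hosts.ini",               # inventory listing the 2 cluster nodes (hypothetical)
        "deploy_motorbike_case.yml",     # hypothetical playbook that copies the case files
    ],
    check=True,
)
print("Ansible deployment finished with return code", result.returncode)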

To run the simulation, the Design Engineer moves to the bastion node and submits jobs through the Slurm Workload Manager. The engineer can schedule the first job and run it across the 72-core cluster while the others wait in the queue. Because the cluster has only 2 nodes, it is scaled out to 8 nodes so that all 4 jobs can run concurrently. As each job completes, the corresponding nodes terminate automatically to save on costs. The engineer can retrieve results for each job ID and receive a notification when each job completes.
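
A sketch of one such submission follows. The module names, case path, and solver invocation are assumptions for this example, not a prescribed setup; each BM.Optimized3.36 node contributes 36 cores, so a 2-node job uses 72 tasks.

import subprocess
from pathlib import Path

job_script = """#!/bin/bash
#SBATCH --job-name=motorbike-cfd
#SBATCH --nodes=2
#SBATCH --ntasks=72            # 2 x BM.Optimized3.36 = 72 cores
#SBATCH --output=motorbike_%j.log

module load openfoam intelmpi   # environment module names are site-specific
cd $HOME/motorBike               # assumed case location
mpirun -np 72 simpleFoam -parallel
"""

Path("motorbike.sbatch").write_text(job_script)
out = subprocess.run(["sbatch", "motorbike.sbatch"], capture_output=True, text=True, check=True)
print(out.stdout.strip())        # for example: "Submitted batch job 1234"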

The Design Engineer can take one of the simulation outputs and visualize it in ParaView on a graphics processing unit virtual machine (GPU VM). For example, the model might show airflow, pressure, turbulence, or another parameter.
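
As a rough sketch, the output could be loaded and rendered with ParaView's pvpython scripting interface on the GPU VM; the .foam file name and the pressure field name 'p' are assumptions for illustration.

# Run with ParaView's pvpython interpreter
from paraview.simple import OpenDataFile, Show, ColorBy, Render, SaveScreenshot

case = OpenDataFile("motorBike.foam")   # an empty .foam file marks the OpenFOAM case root
display = Show(case)
ColorBy(display, ("POINTS", "p"))       # color the surface by the pressure field
Render()
SaveScreenshot("pressure-field.png")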

Illustration: run-summary.png

The Design Engineer can run a quick script to save the model outputs to Oracle Cloud Infrastructure Object Storage for later use. The engineer can automate the entire simulation process, including the upload to Object Storage.
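
A minimal upload sketch with the OCI Python SDK might look like the following; the bucket name, object name, and archive path are placeholders.

import oci

config = oci.config.from_file()                      # reads ~/.oci/config
client = oci.object_storage.ObjectStorageClient(config)
namespace = client.get_namespace().data

with open("motorbike_results.tar.gz", "rb") as f:    # placeholder results archive
    client.put_object(
        namespace_name=namespace,
        bucket_name="hpc-simulation-results",        # placeholder bucket
        object_name="motorbike/run-001/results.tar.gz",
        put_object_body=f,
    )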

If needed, the engineer can use Oracle Cloud Infrastructure FastConnect to pull the data back on-premises without incurring any egress fees.

Display the Data

In this example, the Technical Operations Manager is interested in how long the simulations took in the cloud and how much they cost. The HPC usage data is captured in a database, which is used for cost analysis.
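
As an illustrative sketch only (the actual database and schema are not specified in this example), per-job usage could be captured in a small table like this, using SQLite.

import sqlite3

conn = sqlite3.connect("hpc_usage.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS job_usage (
           job_id TEXT PRIMARY KEY,
           nodes INTEGER,
           hours REAL,
           cost_usd REAL
       )"""
)
# 2 nodes x 36 cores x $0.075 per core-hour x 0.5 hours (example values)
conn.execute(
    "INSERT OR REPLACE INTO job_usage VALUES (?, ?, ?, ?)",
    ("1234", 2, 0.5, 2 * 36 * 0.075 * 0.5),
)
conn.commit()
conn.close()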

The following example compares the simulation time on Oracle Cloud Infrastructure (OCI) HPC with the simulation times from an on-premises system, and shows the overall time saved by running HPC on OCI rather than on-premises, along with the associated costs. In this example, an 8-node OCI HPC cluster was used for a total of 2 hours. At $0.075 per core per hour, this amounts to $2.70 per instance per hour, or $21.60 per hour for the 8-node cluster, for a total of $43.20 over the 2 hours.
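
The arithmetic can be checked with a few lines of Python, using the prices stated above.

cores_per_node = 36
price_per_core_hour = 0.075          # USD
nodes = 8
hours = 2

cost_per_instance_hour = cores_per_node * price_per_core_hour   # $2.70
cluster_cost_per_hour = cost_per_instance_hour * nodes          # $21.60
total_cost = cluster_cost_per_hour * hours                      # $43.20
print(cost_per_instance_hour, cluster_cost_per_hour, total_cost)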

In a real-world application, the cost and time savings from running in the cloud are usually more significant than in this particular example. Whether bursting to the cloud or moving fully to the cloud, on-demand capacity allows for more rapid iterations and improvements to the existing model, paving the way for faster product design, better performance, and a shorter time to release.

Illustration: manager-dashboard.png