Configure the HPC Cluster Stack from Oracle Cloud Marketplace

The HPC Cluster stack uses Terraform to deploy Oracle Cloud Infrastructure resources. The stack creates GPU nodes, storage, standard networking, high-performance cluster networking, and a bastion/head node for accessing and managing the cluster.

Deploy the GPU Cluster

Your Oracle Cloud account must be in a group with permission to deploy and manage these resources. See HPC Cluster Usage Instructions for more details on policy requirements.

You can deploy the stack to an existing compartment, but it may be cleaner if you create a compartment specifically for the cluster.

Note:

While there is no cost to use the Marketplace stack to provision an environment, you will be charged for the resources provisioned when the stack is launched.
  1. Create a compartment in your tenancy and region and verify that the required policies are in place.
    1. Log into the OCI Console as the Oracle Cloud user for the tenancy and region in which you want to work.
    2. Create a compartment for the cluster resources.
    3. Ensure that OCI policies are in place to allow you to build the cluster.
      This may require assistance from your security or network administrator. The following is an example policy: "allow group myGroup to manage all-resources in compartment myCompartment".
  2. Use the HPC Cluster stack to deploy the GPU cluster.
    1. Navigate to Marketplace, then click All Applications.
    2. In the search for listings box, enter HPC Cluster.

      Note:

      If the HPC Cluster stack is not available in OCI Marketplace in your tenancy, then you can clone the stack from GitHub (git clone https://github.com/oracle-quickstart/oci-hpc.git) and import it into OCI Resource Manager, as sketched after these steps. This provides the same functionality, but requires that you have access to a suitable custom OS image for the GPU nodes.
    3. Click HPC Cluster.
    4. Select a version.
      We used the default v2.10.4.1.
    5. Select a compartment in which to build the cluster.
    6. Click Launch Stack.
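
If the HPC Cluster stack is not available in your tenancy's Marketplace, the Note in step 2 describes cloning it from GitHub. A minimal sketch of packaging the clone for import into OCI Resource Manager follows; the zip file name and console navigation shown are assumptions and may differ in your console release.

  # Clone the quickstart repository and package it as a zip archive
  # that OCI Resource Manager can import as a stack.
  git clone https://github.com/oracle-quickstart/oci-hpc.git
  cd oci-hpc
  zip -r ../oci-hpc-stack.zip .
  # In the OCI Console, create the stack from the archive:
  # Developer Services > Resource Manager > Stacks > Create stack,
  # then choose "My configuration" and upload oci-hpc-stack.zip.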

Configure the Cluster

When the stack is launched, complete the Stack Information and Configuration Variables pages to begin configuring the cluster.

  1. Complete the Stack Information page:
    1. Enter a name for your cluster.
    2. (Optional) Add a brief description.
    3. Click Next.

    The Configuration Variables page is displayed.

  2. Configure the cluster.
    The Configuration Variables page provides many opportunities to customize the cluster to your needs. We don’t cover each option in detail; rather, we provide guidance wherever non-default settings are needed to build a GPU cluster that supports NVIDIA cluster software.
    1. In Public SSH key, add a public key that will allow you to log in to the cluster (a key-generation sketch appears at the end of this step).
    2. Select the Use custom cluster name check box, then enter a base host name.
      This is used as the prefix for bastion and login node host names.
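    If you do not already have an SSH key pair, a minimal sketch of creating one follows; the key file name is an arbitrary example, and any existing key pair works as well.
      # Generate a key pair locally, then paste the contents of the
      # .pub file into the Public SSH key field.
      ssh-keygen -t ed25519 -f ~/.ssh/oci_hpc_key
      cat ~/.ssh/oci_hpc_key.pub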
  3. Use Headnode options to customize the bastion.
    This instance serves as the main login node and the Slurm controller, and may also be used for builds and other activities in support of the cluster. Here you can adjust the CPU, memory, and boot volume capacity to suit your requirements.
  4. Use Compute node options to select the type and quantity of worker nodes in the cluster. You can deploy these nodes with an OS image from the Marketplace, or provide a custom image with your preferred OS build.
    • Availability Domain: If you're working in a region with multiple availability domains (ADs), then select the AD with the best availability of GPU resources.
    • Select Use cluster network.
    • Shape of the Compute Nodes: Select the bare metal GPU shape you are using in this cluster. For example, BM.GPU4.8.
    • Initial cluster size: Enter the number of bare metal nodes to be provisioned.
    • To build with a preconfigured OS image from the Marketplace, select Use marketplace image. Then, under Image version, select one of the GPU images to get an OS preconfigured with drivers for GPUs and RDMA networking.
    • Use marketplace image: If you are building the cluster with a custom image, then deselect this check box and select Use unsupported image, and then under Image OCID provide the OCID of the custom image that you have already uploaded to the tenancy.
    • Use compute agent: This option might be required for Marketplace images.
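    If you are supplying a custom image, one way to look up its OCID is with the OCI CLI. This is a sketch only; the compartment OCID and display name below are placeholders.
      # List matching custom images in the compartment and note the OCID
      # to paste into the Image OCID field.
      oci compute image list \
        --compartment-id ocid1.compartment.oc1..example \
        --display-name "my-gpu-image" \
        --output table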
  5. (Optional) Additional Login Node provides a second host for cluster users to interact with the cluster. Shape, OCPUs, and memory can be customized to meet your requirements.
  6. Advanced storage options offers several ways to preconfigure shared storage that will be available across the cluster. Click Show advanced storage options to expand the selections.
    • The bastion home directory is NFS shared across all cluster nodes. This is part of the Headnode’s boot volume, which you can customize in the Headnode options.
    • For more shared storage select Additional block volume for shared space and enter the capacity. This volume is attached to the bastion and shared across the cluster as /nfs/cluster.
    • Shared NFS scratch space from NVME or Block volume shares the NVMe capacity of the first compute node across the cluster as /nfs/scratch. This provides higher-performance storage than the headnode volumes, but may offer less capacity and availability.
    • Mount Localdisk will create a filesystem from NVMe on each compute node and mount it locally on that node.
    • One Logical Volume uses LVM to create one larger volume from multiple NVMe devices.
    • Redundancy increases reliability (but halves the usable capacity) of NVMe storage by mirroring devices.
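    After the cluster is built, you can confirm from any node which of these shared file systems were provisioned; a minimal check, assuming the mount points named above:
      # Only the shares you selected will be present.
      df -h /nfs/cluster /nfs/scratch
      # The bastion home directory is also exported over NFS to the nodes.
      mount | grep nfs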
  7. Use Network options to select the VCN.
    • Use Existing VCN: Deselected by default, so a new VCN is provisioned. Select this check box to provision the cluster within an existing VCN and subnets; this can make it easier to integrate the cluster with other tenancy resources.
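    If you plan to reuse an existing VCN, you can list candidates with the OCI CLI before launching the stack; this is a sketch with a placeholder compartment OCID.
      # List VCNs in the target compartment to find the one to reuse.
      oci network vcn list \
        --compartment-id ocid1.compartment.oc1..example \
        --output table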
  8. Use Software to select the software to install.
    • Select Install SLURM to provide SLURM job scheduling and management.
    • (Optional) Select Install HPC Cluster Monitoring Tools to provide more insight into cluster activity and utilization.
    • Select Install NVIDIA Enroot for containerized GPU workloads. This prepares the environment to run workloads in NVIDIA PyTorch, NVIDIA NeMo Platform, and other containers (a usage sketch follows this list).

      Note:

      It is important that you select this option.
    • Select Install NVIDIA Pyxis plugin for Slurm.
    • Select Enable PAM to limit login access to compute nodes.
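    Once the cluster is built with Enroot and the Pyxis plugin, containerized GPU jobs can be submitted through Slurm. The sketch below is an illustration, not part of the stack configuration; the NGC image tag is an example and may need to be updated.
      # Run nvidia-smi inside an NGC PyTorch container on one compute node.
      # Pyxis adds the --container-image option to srun.
      srun --nodes=1 \
        --container-image=nvcr.io#nvidia/pytorch:24.01-py3 \
        nvidia-smi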
  9. Debug: Ensure that Configure system is selected (this is the default).
  10. Click Next to review the configuration settings.

    In review, Cluster Configuration shows all of your non-default selections. You can return to the previous page to make changes and revisit all settings.

  11. Select Run apply and then click Create to launch the stack to your tenancy and build the cluster.
    The stack is created in OCI Resource Manager, and launched with your specifications to begin provisioning immediately.

    This process takes several minutes. Provisioning the nodes takes only a few minutes per compute node, but installing additional software on the nodes adds to the build time. You can monitor build progress in the OCI Console: go to Resource Manager, then Jobs, to review the most recent job log for activity and possible errors. When the Resource Manager job status reports Succeeded, the deployment is complete.

    If the final status does not succeed, then review the Job log for details. For issues related to compute instances and cluster networks, more information may be available under Cluster network work requests. To navigate to the page, go to Compute, then Cluster networks, then Cluster network details, then Cluster network work requests. Select the most recent Work request to view details.
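
    You can also check job status from the OCI CLI, if it is configured. This is a sketch with a placeholder compartment OCID; the Job log in the console remains the primary place to read detailed output.
      # List recent Resource Manager jobs and their states; open the job
      # in the console to read the full log.
      oci resource-manager job list --compartment-id ocid1.compartment.oc1..example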

  12. For successful builds, the IP address of the Bastion is reported at the end of the Job log. If you configured an Additional login node, then the login node IP address is also shown. For example,
    Apply complete! Resources: 23 added, 0 changed, 0 destroyed.
    
    Outputs:
    RDMA_NIC_metrics_url = "https://objectstorage.us-gov-phoenix-1.oraclecloud.com
    backup = "No Slurm Backup Defined"
    bastion = "139.87.214.247"
    login = "No Login Node Defined"
    private_ips = "172.16.6.4 172.16.7.109"
  13. Make a note of the Bastion public IP address, as it is needed to log into the cluster.
    You can also locate the IP addresses in the OCI Console under Compute, then Instances.
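    With the bastion IP address, you can log in over SSH using the private key that matches the public key you supplied in the stack configuration. The sketch below assumes the opc user of the Oracle Linux based images and the example key path used earlier; adjust both for your environment.
      # Log in to the bastion with the key pair supplied to the stack.
      ssh -i ~/.ssh/oci_hpc_key opc@139.87.214.247
      # If SLURM was installed, sinfo shows the cluster's compute nodes.
      sinfo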