Important Considerations for Setting Up an Oracle Cloud Instance

By Rajib Ghosh - Senior Solutions Architect, Oracle for Research

Understanding the basics of setting up an Oracle Cloud tenancy and instance can help researchers and technology specialists get up and running quickly. Taking a moment to understand some additional details, however, can help you optimize your use of Oracle Cloud. We encourage you to consider the following points when setting up an Oracle Cloud instance.

A compute instance can be a virtual machine (VM) or a physical bare metal (BM) machine. Here, we explain the key considerations for setting up a compute instance.

Compute images

A compute image is a template of a virtual hard drive that determines the operating system and other software for an instance. Oracle Cloud provides the following types of images:

  1. Platform images: These are pre-built Linux or Windows operating system images ready to be deployed in Oracle Cloud. Oracle tests platform images against various hardware shapes and optimizes them to perform on Oracle Cloud. Each image comes with multiple release builds to choose from, available as an advanced option when you create a compute instance. For Arm-based shapes, a platform image is available as an Oracle Linux 8 image.

    • TIP: Choose pre-built platform images when starting your project. If your software configuration is not closely tied to a Linux distribution such as CentOS or Ubuntu, we recommend Oracle Linux, which has maintenance, security and compatibility advantages over other distributions on Oracle Cloud, such as automated patching. You can quickly convert your current Linux distribution to Oracle Linux by following this link. Also note that Windows images cannot be exported out of an Oracle Cloud tenancy because of Microsoft licensing considerations, so you may need to back up your installed software and data and export them separately.
  2. Oracle images: These are pre-built images created by Oracle with software tools pre-installed. They are tested for software version compatibility with the OS version, include the latest patches, and are tested against relevant sample data. They are designed to jump-start research projects and provide a common image framework for researchers within and across universities. Some of the popular HPC (high performance computing) and data science images are listed below:

    1. AI (All-in-One) GPU Image for Data Science
    2. Genome analysis toolkit
    3. Julia AI/HPC GPU Image
    4. NVIDIA images and NVIDIA GPU image
    • TIP: Use the following steps to decide whether to use an Oracle-provided image or build your own from scratch.

    a. Compare the tools and versions provided by the Oracle image with the tools and versions you require. Oracle images usually provide the latest compatible software versions with patches applied, and they are tested and benchmarked against relevant shapes with representative sample data. Check whether this is an advantage and works for your scenario.

    b. If the tools and versions required are very specific and have a smaller footprint than the Oracle image, it is better to start from the closest operating system image.

    c. Consult Oracle for Research or the Oracle Cloud technical team to find the solution that works best for your scenario.

  3. Cloud Marketplace images: These are images developed by third-party vendors that partner with Oracle Cloud. They can be provisioned directly from the Marketplace to your Oracle Cloud tenancy without any download. Some Marketplace images of interest to researchers are listed below:

    1. Oracle HPC Cluster and Oracle HPC File system
    2. NVIDIA GPU Cloud machine image
    3. Oracle Linux 7 Cluster Networking Image
    4. Molecular dynamics images (NAMD and GROMACS runbooks)
    5. Oracle Marketplace Slurm image (HPC + Slurm combo)
    6. Oracle Cloud Slurm image
    7. BeeGFS on demand
  4. GitHub images: Images with associated code and documentation are also provided on the OCI-HPC GitHub. OCI images are also available as containers in the opencontainers GitHub repository. You can clone or fork these repositories for additional customization, which allows a more collaborative, community-driven development approach.
  5. Custom images: These are images you create on campus or in another cloud. They contain your own tools, versions, configurations and data. Once uploaded, they can be shared across tenancies within Oracle Cloud and exported for external use. You can also create them from a running Oracle Cloud instance. Custom images provide a point-in-time snapshot of an instance and can be versioned.
    • TIP: Use custom images to build the same instance in another availability domain. To move an image out of Oracle Cloud, export it to Object Storage and download it from there. You can also move attached block volumes between OCI tenancies and regions.
  6. Boot volumes: These are a persistent way to keep your software installations and configurations in a volume for use with another instance later. A boot volume cannot be shared by multiple instances concurrently and must be used within the same availability domain in your tenancy. However, boot volumes can be cloned to build another instance, and they can also be extended in size.
  7. Image OCIDs: An OCID (Oracle Cloud Identifier) is the unique ID assigned to an image in Oracle Cloud. The same image can have different OCIDs in different regions, e.g. Ashburn and Frankfurt. You can also share the OCIDs of custom images you build with other researchers. For more information, refer to Oracle Cloud provided images or Oracle custom images.

    • TIP: Use image OCIDs to manage and share images in environments with a large number of cloud images and many researchers working simultaneously.
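Steps (a) and (b) of the Oracle images tip amount to comparing two tool inventories. The Python sketch below shows one way to do that comparison. The tool names and version numbers are hypothetical placeholders, not the contents of any real Oracle image, and the sketch assumes purely numeric dotted versions such as `3.9.7`.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a purely numeric dotted version such as '2.4.1' into (2, 4, 1)."""
    return tuple(int(part) for part in text.split("."))


def compare_toolsets(required: dict[str, str], provided: dict[str, str]) -> dict[str, list[str]]:
    """Classify each required tool as satisfied, outdated or missing in an image.

    `required` maps tool name -> minimum version you need.
    `provided` maps tool name -> version installed in the candidate image.
    """
    report: dict[str, list[str]] = {"satisfied": [], "outdated": [], "missing": []}
    for tool, min_version in sorted(required.items()):
        if tool not in provided:
            report["missing"].append(tool)
        elif parse_version(provided[tool]) >= parse_version(min_version):
            # Tuples compare element by element, so (3, 9, 7) >= (3, 8) holds.
            report["satisfied"].append(tool)
        else:
            report["outdated"].append(tool)
    return report


if __name__ == "__main__":
    # Hypothetical project requirements and hypothetical image contents.
    needs = {"python": "3.8", "cuda": "11.0", "gatk": "4.1"}
    image = {"python": "3.9.7", "cuda": "10.2"}
    # prints {'satisfied': ['python'], 'outdated': ['cuda'], 'missing': ['gatk']}
    print(compare_toolsets(needs, image))
```

If most required tools land in `satisfied`, the Oracle image is a good starting point (step a); if many are `missing` or `outdated`, step (b) suggests starting from the closest operating system image instead.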
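Because one image can have a different OCID in each region, scripts that share images between researchers often need to pick the OCID for the region they run in. The sketch below splits an OCID according to the documented layout `ocid1.<resource type>.<realm>.[region][.future use].<unique ID>` and selects the image OCID whose region segment matches. The OCIDs in the example are made up, and the region segment values shown (`iad`, `eu-frankfurt-1`) are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Ocid:
    resource_type: str  # e.g. "image", "instance", "tenancy"
    realm: str          # e.g. "oc1" for the commercial realm
    region: str         # empty string for resources that are not regional
    unique_id: str


def parse_ocid(ocid: str) -> Ocid:
    """Split an OCID into its documented fields."""
    parts = ocid.split(".")
    # Minimum layout: ocid1, resource type, realm, region (possibly empty), unique ID.
    if len(parts) < 5 or parts[0] != "ocid1":
        raise ValueError(f"not an OCID: {ocid!r}")
    # Any optional "future use" segments sit between the region and the unique ID.
    return Ocid(resource_type=parts[1], realm=parts[2], region=parts[3], unique_id=parts[-1])


def image_ocid_for_region(ocids: list[str], region: str) -> str:
    """Return the image OCID whose region segment matches `region`."""
    for ocid in ocids:
        parsed = parse_ocid(ocid)
        if parsed.resource_type == "image" and parsed.region == region:
            return ocid
    raise KeyError(f"no image OCID for region {region!r}")


if __name__ == "__main__":
    # Made-up OCIDs for one shared custom image copied to two regions.
    shared = [
        "ocid1.image.oc1.iad.aaaaexampleashburn",
        "ocid1.image.oc1.eu-frankfurt-1.aaaaexamplefrankfurt",
    ]
    print(image_ocid_for_region(shared, "eu-frankfurt-1"))
```

Keeping such a list of per-region OCIDs in a shared file is one lightweight way to apply the tip above when many researchers launch the same custom image.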

Instance shapes

Instance shapes are hardware specifications (e.g. CPU, memory and storage) used to launch an instance of a specific image. Shapes are broadly categorized as virtual machine (VM) or physical bare metal (BM) and are available with processors from multiple vendors. They give you the flexibility to scale your application from low-cost to high-performance hardware in the cloud. Oracle Cloud provides both flexible shapes (AMD Rome) and fixed shapes (Intel Skylake).

The following table describes the available shapes, their specifications and their usage.

Network tier and security lists

Network components such as subnets and security lists isolate your VM/BM instances from direct external access. The Oracle Cloud network is restrictive by default and only allows SSH (port 22) access for external logins to the VMs. This is sufficient for most research use cases if you do your work by logging in to the VMs. However, if you want to stand up an application that needs other ports, you must open them by adding an ingress rule to the security list of the VM's subnet.

The diagram below shows the addition of an ingress rule for port 3389 to enable RDP access to Windows VMs in the subnet. Note that the source is set to the CIDR 0.0.0.0/0, which means the port is open to anyone on the internet who tries to connect over RDP. Consider asking your local network administrator for your campus network's CIDR range and using it as the source instead, so that your cloud VM accepts connections only from your local network.

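The effect of the source CIDR in an ingress rule can be checked with Python's standard `ipaddress` module. The sketch below models a security list as a list of simplified rules (one protocol, one destination port, no port ranges, stateful behaviour assumed) and asks whether a given client address would be admitted. `192.0.2.0/24` stands in for a campus network and `198.51.100.7` for an arbitrary internet host; both come from the documentation-only address ranges.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    """Simplified ingress rule: one protocol, one destination port, one source CIDR."""
    source_cidr: str
    protocol: str
    port: int


def is_allowed(source_ip: str, protocol: str, port: int, rules: list[IngressRule]) -> bool:
    """True if any rule admits traffic from source_ip to the given protocol/port."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.port == port
        and addr in ipaddress.ip_network(rule.source_cidr)
        for rule in rules
    )


# RDP open to the whole internet, as in the 0.0.0.0/0 example.
OPEN_RDP = [IngressRule("0.0.0.0/0", "tcp", 22), IngressRule("0.0.0.0/0", "tcp", 3389)]
# RDP restricted to a hypothetical campus range.
CAMPUS_RDP = [IngressRule("0.0.0.0/0", "tcp", 22), IngressRule("192.0.2.0/24", "tcp", 3389)]

if __name__ == "__main__":
    for name, rules in (("open", OPEN_RDP), ("campus-only", CAMPUS_RDP)):
        # An arbitrary internet host gets in only under the open rule.
        print(name, is_allowed("198.51.100.7", "tcp", 3389, rules))
```

Real OCI security rules also support port ranges, ICMP and stateless mode, which this sketch leaves out.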

You can also secure your VM yourself, without help from your network administrator. This is a bit more involved: the recommended practice is to create all your compute VMs and BMs in a private subnet and access them from an Always Free Windows or Linux VM in a public subnet in Oracle Cloud. You connect over SSH to the Always Free VM in the public subnet and use it as a gateway to reach the compute instances in the private subnet. Refer to the OCI VCN introductory page or consult the Oracle for Research (OFR) technical team for details. A schematic diagram of the architecture and a screenshot of the updated security list rule are shown below. This rule allows RDP (remote desktop) access only from the Oracle Cloud free tier VM rather than from anywhere on the internet.

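On the client side, the gateway pattern usually comes down to two OpenSSH invocations: `-J` (ProxyJump) to reach a Linux instance in the private subnet through the public VM, and `-L` (local port forwarding) to carry RDP traffic to a Windows instance. The Python sketch below only builds those command lines. The user name `opc` (the default user on Oracle Linux platform images), the IP addresses and local port 13389 are assumptions to replace with your own. Because the forwarded RDP connection leaves from the gateway VM, a private-subnet rule whose source is that VM's private IP with a `/32` mask is enough to admit it.

```python
from __future__ import annotations

import shlex


def bastion_source_cidr(bastion_private_ip: str) -> str:
    """Source CIDR for a private-subnet ingress rule that admits only the gateway VM."""
    return f"{bastion_private_ip}/32"


def ssh_via_bastion(user: str, bastion_public_ip: str, target_private_ip: str) -> list[str]:
    """ssh to a private instance, hopping through the gateway VM with ProxyJump (-J)."""
    return ["ssh", "-J", f"{user}@{bastion_public_ip}", f"{user}@{target_private_ip}"]


def rdp_tunnel(user: str, bastion_public_ip: str, target_private_ip: str,
               local_port: int = 13389) -> list[str]:
    """Forward localhost:local_port to the target's RDP port 3389 through the gateway VM.

    -N keeps the session open without running a remote command.
    """
    return ["ssh", "-N", "-L", f"{local_port}:{target_private_ip}:3389",
            f"{user}@{bastion_public_ip}"]


if __name__ == "__main__":
    gateway, private_vm = "203.0.113.10", "10.0.1.25"  # hypothetical addresses
    print(shlex.join(ssh_via_bastion("opc", gateway, private_vm)))
    print(shlex.join(rdp_tunnel("opc", gateway, private_vm)))
    print(bastion_source_cidr("10.0.0.5"))  # hypothetical gateway private IP
```

With the tunnel running, point your RDP client at `localhost:13389`. A local port other than 3389 is used because a Windows laptop may already be listening on 3389.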

Managing usage and costs

Usage control, automation and credits

To get the most out of Oracle Cloud tenancy credits, compute resources must be fully utilized. Because most HPC and AI/ML computations are batch-oriented, it is extremely important that instances run at or near full capacity and are terminated when not in use. For details on how stopped instances are billed, refer to Resource billing for stopped instances. The following practices help you control credit usage effectively:

  1. Start with low-cost and scale to high-end shapes:

    Standard and AMD VM shapes provide a lower hourly cost and are recommended for the software installation, image building and testing phases of your project. Instances on standard shapes are not billed while stopped, but high-end shapes such as DenseIO, GPU or HPC shapes must be terminated to stop billing. Benchmark your workload on a VM first and move gradually to the more expensive BM shapes.

  2. Start with low data volume and scale up:

    Start with a smaller data volume to get a sense of CPU and memory utilization, then scale up the data to find the optimal CPU and RAM threshold for that shape. Do the same for I/O-intensive loads to compare storage types (local SSD, block volume or a combination) and see how they perform as data grows. Testing workloads this way gives you a sense of the performance gains you can achieve as you scale up your workloads and shapes.

  3. Utilizing GPU/HPC shapes:

    GPU and HPC shapes are expensive and should be used only during computational cycles. BM GPU and HPC shapes are billed by the hour, so you need to estimate your workload cycles in advance to use them effectively. Measuring CPU and memory utilization at the operating system level, as well as CPU/GPU utilization, and benchmarking them against your data can help. Tools such as free, vmstat, iostat, fio, mdtest, IO500 and nvidia-smi can be used for this purpose.

  4. Instance creation and termination automation:

    The rule of thumb is to create an instance just before your computation and terminate it after your run. You can do this manually from the OCI Console, but that is inefficient, so automation is often the better choice. Oracle recommends the OCI command line interface (CLI) or Terraform scripts, combined with Linux shell or Windows PowerShell scripts, for automation and control. Although OCI also provides an API, the OCI CLI offers a quicker and more efficient way to develop and manage automation.

    The following links can help you on your journey to automation:

    a. OCI CLI documentation

    b. OCI CLI GitHub site

    c. OCI CLI command reference

    d. OCI Terraform provider examples

    e. Cluster in the cloud

  5. Estimating cluster size and instance scaling:

    Estimating the size and number of nodes in a cluster is necessary for optimal use of OCI resources. Although the estimation process varies with project needs, a general practice is to estimate the project's total CPU/GPU hours, I/O throughput and network bandwidth, and benchmark them against Oracle Cloud shapes to determine the number of nodes your workload requires. Once that is done, you may be able to use the OCI instance pool feature to scale nodes automatically based on your workload.

  6. Credit control with cost analysis, budgeting and alerts:

    Cost analysis provides a summarized and a drill-down view of resource usage and costs for your tenancy, with a variety of visualization charts. You can customize these views by adding filters that matter to you. Oracle Cloud also generates detailed usage reports automatically, and you may be able to integrate them with your on-campus cost reporting system.

    You can also set soft limits on your tenancy spend (budgets) and configure alerts that notify you when you exceed a budget.
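The distinction in practice 1 between stopping and terminating can be made concrete with a little arithmetic. The Python sketch below compares the compute cost of a work period when idle time is spent stopped versus terminated. The hourly rates are hypothetical placeholders, not Oracle prices, and the model ignores storage: boot and block volumes are billed separately and keep accruing charges while they exist.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    name: str
    hourly_rate: float         # hypothetical placeholder, not an Oracle price
    billed_when_stopped: bool  # True for shapes such as DenseIO, GPU or HPC


def period_cost(shape: Shape, busy_hours: float, idle_hours: float, idle_action: str) -> float:
    """Compute cost for a period of busy hours plus idle hours.

    idle_action is "stop" or "terminate"; storage charges are not modelled.
    """
    if idle_action not in ("stop", "terminate"):
        raise ValueError("idle_action must be 'stop' or 'terminate'")
    cost = shape.hourly_rate * busy_hours
    if idle_action == "stop" and shape.billed_when_stopped:
        # Stopping does not pause billing for these shapes.
        cost += shape.hourly_rate * idle_hours
    return cost


STANDARD_VM = Shape("standard-vm", hourly_rate=0.05, billed_when_stopped=False)
GPU_BM = Shape("gpu-bm", hourly_rate=3.00, billed_when_stopped=True)

if __name__ == "__main__":
    for shape in (STANDARD_VM, GPU_BM):
        for action in ("stop", "terminate"):
            print(shape.name, action, round(period_cost(shape, 10, 100, action), 2))
```

With these made-up rates, ten busy hours followed by a hundred idle hours cost the same either way on the standard VM, while stopping instead of terminating the GPU shape multiplies its cost elevenfold.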
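For practice 2, one simple way to read a scaling experiment is to convert each run into throughput and look for the data size beyond which throughput stops improving meaningfully, which suggests the shape's CPU, memory or I/O is saturated. The Python sketch below does this for measurements you record yourself; the sample numbers and the 5% gain threshold are hypothetical.

```python
from __future__ import annotations


def throughputs(samples: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Convert (data_size_gb, runtime_s) pairs into (data_size_gb, gb_per_s), sorted by size."""
    return [(size, size / seconds) for size, seconds in sorted(samples)]


def saturation_point(samples: list[tuple[float, float]], min_gain: float = 0.05) -> float | None:
    """Return the first measured size beyond which throughput grows by less than min_gain.

    Returns None if throughput kept improving across all measured sizes.
    """
    table = throughputs(samples)
    for (size, rate), (_, next_rate) in zip(table, table[1:]):
        if next_rate < rate * (1 + min_gain):
            return size
    return None


if __name__ == "__main__":
    # Hypothetical runs on one shape: (data size in GB, wall-clock seconds).
    runs = [(1, 10.0), (2, 12.0), (4, 20.0), (8, 39.0)]
    for size, rate in throughputs(runs):
        print(f"{size:>4} GB  {rate:.3f} GB/s")
    print("throughput levels off after", saturation_point(runs), "GB")
```

If `saturation_point` returns `None`, the shape is still scaling with data and a larger test is worth running before moving to a bigger shape.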
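For practice 3, `nvidia-smi` can report per-GPU utilization in machine-readable form with `--query-gpu=... --format=csv,noheader,nounits`. The Python sketch below parses that output and flags GPUs whose utilization is below a chosen floor. The 50% floor is an arbitrary example, and a single sample is only a snapshot, so in practice you would collect several samples over a representative run.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass

# Fields requested from nvidia-smi, in the order they appear in each CSV row.
NVIDIA_SMI_CMD = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


@dataclass(frozen=True)
class GpuSample:
    index: int
    utilization_pct: float
    memory_used_mib: float
    memory_total_mib: float

    @property
    def memory_fraction(self) -> float:
        return self.memory_used_mib / self.memory_total_mib


def parse_gpu_csv(text: str) -> list[GpuSample]:
    """Parse rows like '0, 87, 30512, 40536' (units stripped by nounits)."""
    samples = []
    for line in text.strip().splitlines():
        index, util, used, total = (field.strip() for field in line.split(","))
        samples.append(GpuSample(int(index), float(util), float(used), float(total)))
    return samples


def query_gpus() -> list[GpuSample]:
    """Run nvidia-smi on a GPU instance and parse its output."""
    out = subprocess.run(NVIDIA_SMI_CMD, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)


def underused(samples: list[GpuSample], util_floor_pct: float = 50.0) -> list[int]:
    """Indices of GPUs whose utilization is below util_floor_pct."""
    return [s.index for s in samples if s.utilization_pct < util_floor_pct]


if __name__ == "__main__":
    # Hypothetical output captured from a two-GPU instance.
    example = "0, 92, 30000, 40000\n1, 12, 2000, 40000\n"
    gpus = parse_gpu_csv(example)
    print(underused(gpus), gpus[0].memory_fraction)
```

On a real GPU instance, call `query_gpus()` periodically during a benchmark run instead of parsing the example string.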
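The rule of thumb in practice 4 (create the instance just before the run, terminate it right after) fits naturally into a Python context manager that wraps the OCI CLI, so that termination happens even if the job fails. The sketch below assumes the OCI CLI is installed and configured. It uses `oci compute instance launch` and `oci compute instance terminate` with `--wait-for-state`, `--query data.id --raw-output`, `--shape-config` and `--force`; confirm these options against the OCI CLI command reference for your CLI version. All OCIDs and the availability domain in the example are placeholders, and the helper names are hypothetical.

```python
from __future__ import annotations

import json
import shlex
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator


def launch_cmd(availability_domain: str, compartment_id: str, image_id: str,
               subnet_id: str, shape: str, display_name: str = "batch-job",
               shape_config: dict | None = None) -> list[str]:
    """Build an OCI CLI launch command that should print only the new instance OCID."""
    cmd = ["oci", "compute", "instance", "launch",
           "--availability-domain", availability_domain,
           "--compartment-id", compartment_id,
           "--image-id", image_id,
           "--subnet-id", subnet_id,
           "--shape", shape,
           "--display-name", display_name,
           "--wait-for-state", "RUNNING",
           "--query", "data.id", "--raw-output"]
    if shape_config:  # flexible shapes, e.g. {"ocpus": 4, "memoryInGBs": 64}
        cmd += ["--shape-config", json.dumps(shape_config)]
    return cmd


def terminate_cmd(instance_id: str) -> list[str]:
    """Build an OCI CLI terminate command; --force skips the confirmation prompt."""
    return ["oci", "compute", "instance", "terminate",
            "--instance-id", instance_id, "--force",
            "--wait-for-state", "TERMINATED"]


def cli_runner(cmd: list[str]) -> str:
    """Run a command and return its trimmed stdout, raising if it fails."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.strip()


@contextmanager
def ephemeral_instance(launch_args: dict,
                       runner: Callable[[list[str]], str] = cli_runner) -> Iterator[str]:
    """Launch an instance, yield its OCID, and always terminate it afterwards."""
    instance_id = runner(launch_cmd(**launch_args))
    try:
        yield instance_id
    finally:
        runner(terminate_cmd(instance_id))


if __name__ == "__main__":
    def dry_run(cmd: list[str]) -> str:
        print("would run:", shlex.join(cmd))
        return "ocid1.instance.oc1..placeholder"

    args = {
        "availability_domain": "<availability-domain>",
        "compartment_id": "<compartment-ocid>",
        "image_id": "<image-ocid>",
        "subnet_id": "<subnet-ocid>",
        "shape": "VM.Standard2.1",
    }
    with ephemeral_instance(args, runner=dry_run) as instance_id:
        print("run the computation on", instance_id)
```

Passing the runner as a parameter keeps the dry run and the real CLI call interchangeable. If the launch command itself fails (for example, the wait times out), no OCID is returned, so check the Console for a partially created instance.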
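The estimation in practice 5 can be expressed as a bottleneck calculation: for each resource (core-hours, I/O volume and so on), divide the project's total demand by what one node can deliver within the deadline, round up, and take the largest result. The Python sketch below uses only two dimensions. The per-node core count, I/O rate, 72-hour deadline and 80% efficiency factor are hypothetical values that you would replace with your own benchmark results.

```python
from __future__ import annotations

import math


def node_capacity(cores: int, io_gb_per_s: float, hours: float,
                  efficiency: float) -> dict[str, float]:
    """What one node can deliver within `hours`, derated by `efficiency` (0 < efficiency <= 1)."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return {
        "core_hours": cores * hours * efficiency,
        "io_tb": io_gb_per_s * 3600 * hours * efficiency / 1000,  # GB/s over hours -> TB
    }


def nodes_needed(demand: dict[str, float], capacity: dict[str, float]) -> int:
    """Smallest node count that covers every resource dimension (the bottleneck wins)."""
    return max(math.ceil(demand[key] / capacity[key]) for key in demand)


if __name__ == "__main__":
    # Hypothetical benchmark-derived numbers for one shape and a 72-hour deadline.
    per_node = node_capacity(cores=64, io_gb_per_s=1.0, hours=72, efficiency=0.8)
    print(nodes_needed({"core_hours": 50_000, "io_tb": 120}, per_node))    # compute-bound: 14
    print(nodes_needed({"core_hours": 50_000, "io_tb": 5_000}, per_node))  # I/O-bound: 25
```

The resulting node count is a starting point for sizing an instance pool; rerun the calculation whenever benchmarks on a different shape change the per-node capacity.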
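OCI budgets can raise alerts when actual or forecast spend crosses a threshold. The Python sketch below reproduces that idea locally, for example to sanity-check exported cost data for practice 6; it is not the OCI Budgets API, and the straight-line forecast, the 50/80/100% thresholds and the amounts in the example are all assumptions.

```python
from __future__ import annotations


def linear_forecast(spend_to_date: float, days_elapsed: int, days_in_period: int) -> float:
    """Project period spend by assuming the current daily rate continues."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return spend_to_date / days_elapsed * days_in_period


def triggered_alerts(budget: float, spend_to_date: float, days_elapsed: int,
                     days_in_period: int,
                     thresholds_pct: tuple[int, ...] = (50, 80, 100)) -> list[tuple[str, int]]:
    """Return ("actual", pct) or ("forecast", pct) for each threshold that is crossed."""
    forecast = linear_forecast(spend_to_date, days_elapsed, days_in_period)
    alerts = []
    for pct in thresholds_pct:
        limit = budget * pct / 100
        if spend_to_date >= limit:
            alerts.append(("actual", pct))
        elif forecast >= limit:
            alerts.append(("forecast", pct))
    return alerts


if __name__ == "__main__":
    # Hypothetical: 1000-credit monthly budget, 600 spent by day 15 of 30.
    # prints [('actual', 50), ('forecast', 80), ('forecast', 100)]
    print(triggered_alerts(1000, 600, 15, 30))
```

A forecast alert on the 100% threshold is the useful early warning here: it fires halfway through the month, while there is still time to terminate idle instances.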