Note:
- This tutorial requires access to Oracle Cloud. To sign up for a free account, see Get started with Oracle Cloud Infrastructure Free Tier.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Manage OCI Kubernetes Engine with Different Capacity Types and Resolve Common Issues on Preemptible Nodes
Introduction
As we continue through each cycle of digital transformation, businesses continue to innovate and iterate, pushing the boundaries of what is possible for infrastructure, applications and management at scale. One of the more recent and widely adopted technologies used to support infrastructure and application management is Kubernetes. Before we dive further into Kubernetes, we must first look under the hood at the concept of containers.
Containers are packages of software that bundle application-specific code with the runtime and program libraries needed to run an application reliably. Containers are configured to run within a user space inside the Operating System (OS) of the underlying server. This architecture allows for the decoupling of core services and dependencies and for lightweight deployment of an application. Benefits of application containerization include improvements in workload isolation, resource efficiency, scalability and fault tolerance.
To harness the power and efficiency of containers at scale, we need tooling to manage our container-based deployments. Kubernetes, also known as K8s, is an open-source container orchestration tool that automates container deployment by creating a cluster of servers on which containers can be run, scaled and delivered to your users. The Kubernetes cluster architecture includes a master node (control plane) and multiple worker nodes. Each worker node hosts one or more pods (collections of containers) that deliver your application.
Objectives
- Work with Oracle Cloud Infrastructure Kubernetes Engine (OCI Kubernetes Engine or OKE) on different capacity types and resolve common issues on preemptible nodes.
Prerequisites
- Administrator access to an OCI tenancy and a running OKE cluster.
How is Oracle Positioned with Container Technology?
OKE is a fully managed, scalable, and highly available Kubernetes service that helps customers to deploy containerized applications to the cloud. OKE gives OCI customers the ability to optimize compute resource utilization to meet unique workload requirements and quickly adapt as workload requirements change. OKE delivers a seamless customer experience giving customers unparalleled price-performance, resource efficiency, portability and reliability. OKE provides several key integrations with various container lifecycle management products including container registries, CI/CD frameworks, networking solutions, storage options, and top-notch security features.
In OKE, you can specify the cluster type as either basic or enhanced. Basic clusters support all the core functionality provided by OKE. Enhanced clusters support all available features, including virtual nodes, self-managed nodes, cluster add-on management, and more granular Oracle Cloud Infrastructure Identity and Access Management (OCI IAM) configurations.
OKE on Different Capacity Types
-
On-Demand Capacity: On-demand capacity is the standard available capacity of a given shape type. This is the default capacity type, and you effectively pay for what you use. While this is the standard choice, depending on the shape type and timing, it can be difficult to fulfill requests for large multi-instance workloads.
-
Reserved Capacity: Capacity reservations can be used to provision OKE managed nodes. They are generally used to ensure enough available capacity for business-critical workloads during production-impacting events, including planned maintenance, user demand growth, and disaster recovery. Capacity reservations incur a cost of 85% of the SKU list price while the reserved resources are not being actively used. To use a capacity reservation, first create the capacity reservation object and specify the region/availability domain along with the shape type and size. After the capacity reservation has been created, you can select it as the capacity type when deploying nodes in your node pool.
-
Preemptible Capacity: OKE managed nodes can be provisioned with preemptible compute shapes. Preemptible shapes offer cost savings (a 50% discount from the SKU list price); however, Oracle maintains the right to reclaim these compute resources when they are needed for higher-priority demand. Preemptible compute can be a good option if you have stateless, fault-tolerant workloads that can withstand interruption. For visibility, you can elect to be notified when a preemptible instance is terminated. Within the OKE cluster, the node pool attempts to launch new instances to replace the reclaimed ones and return to its expected state.
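The pricing rules above can be turned into a quick comparison. The following is a minimal sketch, not an official calculator: LIST_PRICE and HOURS are hypothetical values chosen for illustration, and period_cost is a helper name introduced here. Only the 85% and 50% figures come from the descriptions above.

```shell
#!/usr/bin/env bash
# Hypothetical inputs: LIST_PRICE is a made-up hourly rate, not an Oracle price.
LIST_PRICE="${LIST_PRICE:-0.10}"   # currency units per instance-hour (assumption)
HOURS="${HOURS:-100}"              # instance-hours in the period (assumption)

# period_cost <multiplier>: list price x hours x multiplier, two decimals.
period_cost() {
  awk -v p="$LIST_PRICE" -v h="$HOURS" -v m="$1" 'BEGIN { printf "%.2f", p * h * m }'
}

ON_DEMAND_COST="$(period_cost 1.00)"         # on-demand: full list price
IDLE_RESERVATION_COST="$(period_cost 0.85)"  # unused reservation: 85% of list price
PREEMPTIBLE_COST="$(period_cost 0.50)"       # preemptible: 50% discount

echo "on-demand:        $ON_DEMAND_COST"
echo "idle reservation: $IDLE_RESERVATION_COST"
echo "preemptible:      $PREEMPTIBLE_COST"
```

Set LIST_PRICE and HOURS in the environment before running the script to model your own shape and usage period.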
OKE Deployment with Preemptible and OnDemand Node Pools
Known Issues with Preemptible
There are some known issues when using preemptible capacity with OKE.
-
Issue: Some customers get the following error while creating node pools with E3/E4 preemptible shapes: NonRetryable BmcException: Error returned by LaunchInstance operation in Compute service. (400, InvalidParameter, false) Cannot launch a preemptible instance with a capacity reservation id.
- Solution: There is a known OKE bug with an empty <capacityReservationId> and preemptible worker nodes. If you create the node pool through the Oracle Cloud Infrastructure Command Line Interface (OCI CLI) or Terraform without passing the <capacityReservationId> parameter in the placement config section, the preemptible worker nodes are created successfully. However, if you create the node pool through the OCI Console, an empty <capacityReservationId> is passed by default, which causes an error when you try to use preemptible shapes. The workaround is to explicitly set <capacityReservationId> to null in the placement config.
-
Issue: Consider a cluster with two node pools (one using preemptible capacity, the other using on-demand capacity) and a priority-based expander for the cluster autoscaler. The preemptible node pool is configured with the highest priority, and the on-demand node pool is meant to be used if the preemptible node pool becomes unhealthy because of an OutOfHostCapacity error. In some instances, the cluster autoscaler does not fall back to the on-demand node pool when the unhealthy preemptible node pool has 0 nodes.
- Solution: This setup works, but only when both node pools have a minimum of one node each.
-
Issue: When selecting preemptible capacity in the OKE provisioning wizard through the console, the available AMD shapes are limited to E3 and E4. E5 is not shown there; however, E5 preemptible capacity is supported for OKE.
- Solution: Create managed node pools with the preemptible E5 shape by using the API or CLI, passing --node-shape VM.Standard.E5.Flex.
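For the autoscaler fallback issue above, it can help to see what a priority-based expander configuration looks like. The sketch below only prints a ConfigMap manifest in the format used by the upstream Kubernetes Cluster Autoscaler priority expander, where the highest number wins. The two patterns are placeholders, and the kube-system namespace and the variable names are assumptions made here; check how your autoscaler deployment identifies node pools before using real patterns.

```shell
#!/usr/bin/env bash
# Placeholder patterns (assumptions): regular expressions that should match
# your preemptible and on-demand node pools as the autoscaler identifies them.
PREEMPTIBLE_POOL_PATTERN="${PREEMPTIBLE_POOL_PATTERN:-.*<preemptible-node-pool-id>.*}"
ON_DEMAND_POOL_PATTERN="${ON_DEMAND_POOL_PATTERN:-.*<on-demand-node-pool-id>.*}"

# Priority expander ConfigMap: the higher number (50) is tried first,
# the lower number (10) is the fallback.
EXPANDER_MANIFEST="$(cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    50:
      - ${PREEMPTIBLE_POOL_PATTERN}
    10:
      - ${ON_DEMAND_POOL_PATTERN}
EOF
)"

printf '%s\n' "$EXPANDER_MANIFEST"

# To apply it to a cluster whose autoscaler runs with --expander=priority:
#   printf '%s\n' "$EXPANDER_MANIFEST" | kubectl apply -f -
# Per the solution above, keep a minimum of one node in each node pool,
# for example with autoscaler arguments of the form --nodes=1:<max>:<node-pool-id>.
```

The manifest is printed rather than applied so that it can be reviewed first. Applying it is a separate, explicit step.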
Task 1: Steps to Create an E5 Preemptible OKE Worker Node Pool Using the Command Line Interface (CLI)
-
Log in to the OCI Console and click the services menu.
-
Navigate to Developer Services.
-
Under Containers & Artifacts, click Kubernetes Engine (OKE).
-
Click OCI Cloud Shell to display the CLI.
-
Edit the cluster OCID, compartment OCID, subnet ID, fault domain, configuration and size to match your tenancy before running the following CLI command.
oci ce node-pool create \
  --cluster-id ocid1.cluster.oc1.iad.aaaaaaaaxlokvt2r25b6dmdxxxxxxxxxxxxxxxxxkhdilj7kpehc5vke2ve5gq \
  --compartment-id ocid1.compartment.oc1..aaaaaaaaqufgrkgzr4zb3dxxxxxxxxxxxxxxxxxxp7jx7yckglghxppfrui6a \
  --name E5_Preemptible \
  --node-shape VM.Standard.E5.Flex \
  --placement-configs '[{"availabilityDomain": "FZyT:US-ASHBURN-AD-2", "preemptibleNodeConfig": {"preemptionAction": {"isPreserveBootVolume": true, "type": "TERMINATE"}}, "subnetId": "ocid1.subnet.oc1.iad.aaaaaaaapmekowq4rqhu72xxxxxxxxxxxxxxxxxxxxtlkp4dmixebzhgrwdlmtteclq", "faultDomains": ["FAULT-DOMAIN-1"]}]' \
  --size 1 \
  --node-image-id ocid1.image.oc1.iad.aaaaaaaajvtta4i5sq4xxxxxxxxxxxxxcskfxjwz4vwxz6ersmmax6q \
  --node-shape-config '{"memoryInGBs": 6.0, "ocpus": 1.0}' \
  --pod-subnet-ids '["ocid1.subnet.oc1.iad.aaaaaaaapmekowq4rqhxxxxxxxxxxxxxxxkp4dmixebzhgrwdlmtteclq"]'
This outputs the OCID of the work request for creating the node pool and creates an E5 preemptible worker node in the existing cluster, as shown in the following image.
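Because the known issues above hinge on the contents of the placement config, it can be useful to build that JSON separately and check it before running the create command. The following is a minimal sketch under stated assumptions: AD_NAME and SUBNET_OCID are placeholders to replace with your own values, has_no_reservation_id is a helper name introduced here, and the subnet key is written in the camelCase form (subnetId) used by the other keys.

```shell
#!/usr/bin/env bash
# Placeholders (assumptions): substitute your own availability domain and subnet OCID.
AD_NAME="${AD_NAME:-<availability-domain-name>}"
SUBNET_OCID="${SUBNET_OCID:-<subnet-ocid>}"

# Placement config for a preemptible node pool. No capacityReservationId key is
# present, which matches the CLI path described in the known issues.
PLACEMENT_CONFIGS="$(cat <<EOF
[{
  "availabilityDomain": "${AD_NAME}",
  "subnetId": "${SUBNET_OCID}",
  "faultDomains": ["FAULT-DOMAIN-1"],
  "preemptibleNodeConfig": {
    "preemptionAction": {"type": "TERMINATE", "isPreserveBootVolume": true}
  }
}]
EOF
)"

# Succeeds only when the given text does not mention capacityReservationId.
has_no_reservation_id() {
  ! printf '%s' "$1" | grep -q 'capacityReservationId'
}

if has_no_reservation_id "$PLACEMENT_CONFIGS"; then
  printf '%s\n' "$PLACEMENT_CONFIGS"
else
  echo "remove capacityReservationId before using preemptible capacity" >&2
fi

# Pass the checked value to the CLI as: --placement-configs "$PLACEMENT_CONFIGS"
```

Keeping the JSON in a variable also avoids quoting mistakes when the same placement config is reused across several node pools.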
Related Links
Acknowledgments
- Authors - Payal Sharma (Senior Enterprise Cloud Architect), Anthony Vernava IV (Senior Enterprise Cloud Architect)
More Learning Resources
Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.
For product documentation, visit Oracle Help Center.
G19809-01
November 2024