Using Karpenter Provider for OCI (KPO) with Kubernetes Engine (OKE)

Karpenter is an open source Kubernetes node provisioning and scaling tool. You use Karpenter to add and remove worker nodes automatically based on scheduling demand. When pods can’t be scheduled, Karpenter provisions worker nodes that meet the pods’ requirements. You also configure disruption settings so Karpenter consolidates capacity and replaces worker nodes in a controlled way.

You can use Karpenter to:

  • Scale worker nodes based on workload demand.

  • Select instance types based on scheduling requirements, such as CPU, memory, architecture, and availability domain.

  • Control node lifecycle behavior, such as consolidation and disruption budgets.

  • Reduce the operational work of managing fixed-size node pools.

For more information about Karpenter, see the Karpenter documentation.

Karpenter Provider for OCI (KPO) integrates Karpenter with OCI Kubernetes Engine (OKE) so you can provision and scale worker nodes by using OCI compute instances. In summary, you configure OCI-specific worker node settings in OCINodeClass, then configure scheduling and scaling intent in the Karpenter NodePool.

In more detail, KPO uses the following Kubernetes custom resources to configure provisioning and scaling behavior. The resources are defined by custom resource definitions (CRDs) installed with Karpenter and KPO:

  • NodePool: The CRD is installed with Karpenter. You create NodePool resources to define scheduling requirements, disruption behavior, and scaling limits. A NodePool describes the kind of capacity that workloads can use, and it references an OCINodeClass to supply OCI-specific settings for that capacity.
  • OCINodeClass: The CRD is installed with Karpenter Provider for OCI. You create OCINodeClass resources to define OCI-specific worker node settings such as image selection, boot volume settings, and VNIC settings.
  • NodeClaim: The CRD is installed with Karpenter. Karpenter creates NodeClaim resources when pending pods need capacity, and each NodeClaim represents a request for a worker node.

When pods can’t be scheduled, provisioning follows this sequence:

  1. Karpenter selects a compatible NodePool and creates a NodeClaim.
  2. The NodeClaim includes a nodeClassRef that points to the OCINodeClass referenced by the NodePool.
  3. KPO reads the NodeClaim and the referenced OCINodeClass, then provisions OCI resources such as the compute instance, VNICs, and the boot volume.
  4. The instance boots and joins the cluster as a worker node.
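
The sequence above can be illustrated with the shape of a NodeClaim as Karpenter creates it. The following is a sketch (names are examples, and the manifest is abridged); you inspect NodeClaims rather than creating them yourself:

```yaml
# Abridged NodeClaim sketch; Karpenter generates these automatically
apiVersion: karpenter.sh/v1
kind: NodeClaim
metadata:
  name: my-nodepool-x7k2p        # generated from the owning NodePool's name
spec:
  nodeClassRef:                  # points KPO at the OCI-specific settings
    group: oci.oraclecloud.com
    kind: OCINodeClass
    name: my-ocinodeclass
```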

KPO integrates with the standard Karpenter cloud provider interface (aligned with upstream Karpenter version 1.6.2). Configure OCI-specific behavior through OCINodeClass, including:

  • OCI compute shapes (VM and bare metal) and corresponding scheduling labels.

  • Multiple sizing configurations for flexible shapes.

  • Burstable configuration for flexible shapes (baselineOcpuUtilization).

  • Preemptible instances (mapped from Karpenter spot capacity type).

  • OKE prebuilt images (Oracle Linux and Ubuntu).

  • Boot volume customization (KMS key, size, VPUs per GB, PV encryption in transit).

  • IPv4 and IPv6 addressing options on worker node VNICs.

  • Network security group (NSG) support on worker node VNICs.

  • Secondary VNIC configuration (only for clusters using the OCI VCN IP Native CNI add-on).

  • Capacity reservations, cluster placement groups, and compute clusters.

  • Instance configuration (such as metadata, tags, launch options, SSH authorized keys).

  • Kubelet configuration overrides.

  • Image selection by filter to auto-detect a compatible OKE image.

  • Drift detection so Karpenter can replace nodes that drift from the desired configuration.

  • Optional node repair policies (requires Karpenter node repair to be enabled).

Complete the following tasks to set up KPO, configure IAM, and start provisioning worker nodes with Karpenter:

  1. Install KPO in the cluster's data plane (see Installing Karpenter Provider for OCI).

  2. Grant IAM permissions so that KPO can manage required OCI resources (see Granting IAM Permissions to Karpenter Provider for OCI).

  3. Grant IAM permissions to enable KPO-launched instances to join the cluster (see Enabling Node Registration for KPO-Launched Worker Nodes).

  4. Create one or more OCINodeClass resources (see Creating OCINodeClass Resources).

  5. Create one or more Karpenter NodePool resources that reference the OCINodeClass resources (see Creating NodePool Resources that Reference OCINodeClass Resources).

Installing Karpenter Provider for OCI

Prerequisites

Before you install KPO, confirm that the cluster and networking configuration support Karpenter-provisioned worker nodes. Note that it's possible for installation to succeed, but for subsequent worker node provisioning to fail due to missing prerequisites.

  • Kubernetes version: The cluster must be running Kubernetes version 1.31 or later.

  • Worker node capacity: Run at least one existing managed node pool or self-managed worker nodes so KPO can run in the data plane.

  • OCI VCN-Native Pod Networking CNI plugin cluster add-on (also known as the OCI VCN IP Native CNI add-on) version: If the cluster is using the OCI VCN IP Native CNI add-on, the add-on must be version 3.0.0 or later. We strongly recommend version 3.2.0 or later. If the cluster is using a version of the add-on earlier than version 3.2.0, secondary VNICs support a maximum of 16 IP addresses (ipCount must not exceed 16).

  • Helm client installed: Install the Helm client in the environment where you want to run Helm commands.

Download the Helm Chart

Download the KPO Helm chart tarball from your approved distribution location.

To download KPO, go to https://github.com/oracle/karpenter-provider-oci/releases.

Review and Configure Helm Values

Review the Helm configuration values (often simply referred to as Helm values) in the downloaded chart tarball by entering:

helm show values <path-to-kpo-chart.tgz>

The following table shows the most important Helm values:

Value | Required | Description
settings.clusterCompartmentId | Yes | Default compartment OCID used to resolve OCI resources referenced in OCINodeClass. Use the cluster's compartment OCID in most cases.
settings.vcnCompartmentId | Yes | Default compartment OCID used to resolve network resources referenced in OCINodeClass. Use the cluster's VCN compartment OCID in most cases.
settings.apiserverEndpoint | Yes | API server endpoint (private IP address) that worker nodes use to communicate with the Kubernetes API server. For example, 10.0.0.14.
settings.ociVcnIpNative | No | Set to true if the cluster uses the OCI VCN IP Native CNI add-on. Accepted values: true or false.
settings.ipFamilies | No | IP families allocated to a worker node VNIC. Default: ["IPv4"]. Accepted values: "IPv4", "IPv6", or both. Ensure the referenced subnets include matching CIDR blocks.
settings.flexibleShapeConfigs | No | Default sizing configurations for flexible shapes. Each entry can specify ocpu, memoryInGbs, and baselineOcpuUtilization. Define this in YAML format (the chart converts YAML to JSON).
settings.repairPolicies | No | A list of Karpenter node repair policies (for more information, see Karpenter cloudprovider.RepairPolicy). Each entry specifies a node condition type and status, plus a duration threshold before Karpenter repairs the node. Define this in YAML format (the chart converts YAML to JSON). Requires the Karpenter node repair feature to be enabled.
image.registry | No | The registry domain for Container Registry in the region in which the cluster is located. Default: phx.ocir.io. For more information, see Availability by Region.

For example, a repairPolicies entry:

repairPolicies:
  - conditionType: Ready
    conditionStatus: "False"
    tolerationDuration: 600000000000

where tolerationDuration is specified in nanoseconds.
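
In the repairPolicies example above, tolerationDuration is expressed in nanoseconds. As a quick check, the value 600000000000 corresponds to 10 minutes:

```shell
# Convert 10 minutes to nanoseconds (1 minute = 60 * 10^9 ns)
echo $((10 * 60 * 1000000000))
# 600000000000
```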

You must set the required Helm values to enable KPO to resolve OCI resources and bootstrap worker nodes. You can also set the optional Helm values to match your networking model and your worker node strategy.

You set Helm values in a configuration file (referred to as the Helm values file).
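
For example, a Helm values file entry for the settings.flexibleShapeConfigs value might look like the following sketch (the sizings are illustrative, using the ocpu, memoryInGbs, and baselineOcpuUtilization keys described earlier):

```yaml
settings:
  flexibleShapeConfigs:
    - ocpu: 2
      memoryInGbs: 16
    - ocpu: 4
      memoryInGbs: 32
      baselineOcpuUtilization: BASELINE_1_2   # optional burstable baseline
```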

Install KPO

Deploy the KPO controller into the OKE data plane. Install KPO into a namespace you can manage and monitor with your cluster operations tooling.

  1. Choose the namespace where you want to deploy KPO.

  2. Create a Helm values file in YAML format that includes required values and any optional values you want to override. For example:

    settings:
      clusterCompartmentId: "<cluster-compartment-id>"
      vcnCompartmentId: "<cluster-vcn-compartment-id>"
      ociVcnIpNative: false
      apiserverEndpoint: 10.0.0.14
    
    image:
      registry: "phx.ocir.io"
  3. Install the chart by entering:

    helm install karpenter <path-to-chart-tarball> \
      --values <path-to-helm-values-file> \
      --namespace <karpenter-namespace> \
      --create-namespace

    If you do not specify a namespace in which to deploy KPO, the default namespace is used.

For more information about installing KPO, see the KPO installation documentation on GitHub.

Verify the Installation

After you install KPO, confirm that the KPO controller starts successfully before you create OCINodeClass and NodePool resources.

  1. Confirm that the pods are running by entering:

    kubectl get pods --namespace <karpenter-namespace>
  2. If the pods are not running, inspect pod events and logs by entering:

    kubectl describe pod --namespace <karpenter-namespace> <pod-name>
    kubectl logs --namespace <karpenter-namespace> <pod-name>

Granting IAM Permissions to Karpenter Provider for OCI

To enable KPO to create and manage the OCI resources required for worker nodes, you must grant the KPO workload the necessary IAM permissions. Use the policy pattern shown in Workload Identity Policy Pattern to add the required policy statements shown in Basic Policies Required for KPO Operation. Add additional statements for specific features that you enable in OCINodeClass, as shown in Optional Policies (Feature-Specific).

Workload Identity Policy Pattern

Use the following pattern to specify workload identity conditions in IAM policies so OCI permissions apply only to the KPO workload in the cluster:

Allow any-user to <verb> <resource> in <location> where all {
  request.principal.type = 'workload',
  request.principal.namespace = '<namespace-name>',
  request.principal.service_account = '<service-account-name>',
  request.principal.cluster_id = '<cluster-ocid>'
}
where:
  • <verb> is the action to permit (varies by OCI resource).
  • <location> is the name of the resource's compartment, or specify tenancy for all compartments.
  • <namespace-name> is the Kubernetes namespace where KPO is deployed (by default, KPO is deployed in the default namespace).
  • <service-account-name> is the service account used by KPO pods (by default, the karpenter service account is used by KPO pods).
  • <cluster-ocid> is the OCID of the cluster.
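
For example, a filled-in statement granting instance management, assuming KPO is deployed in the karpenter namespace with the karpenter service account (the compartment name and cluster OCID are placeholders):

```
Allow any-user to manage instance-family in compartment oke-cluster-compartment where all {
  request.principal.type = 'workload',
  request.principal.namespace = 'karpenter',
  request.principal.service_account = 'karpenter',
  request.principal.cluster_id = 'ocid1.cluster.oc1..<unique-id>'
}
```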

Basic Policies Required for KPO Operation

Grant permissions to KPO to manage the compute, storage, and networking resources required for worker nodes by including policy statements similar to the following in an IAM policy (using the pattern shown in Workload Identity Policy Pattern):

Allow any-user to manage instance-family in compartment <compartment-name> where all { ... }
Allow any-user to manage volumes in compartment <compartment-name> where all { ... }
Allow any-user to manage volume-attachments in compartment <compartment-name> where all { ... }
Allow any-user to manage virtual-network-family in compartment <compartment-name> where all { ... }
Allow any-user to inspect compartments in compartment <compartment-name> where all { ... }

Note that if you enable optional features in OCINodeClass, you also have to define additional policies (see Optional Policies (Feature-Specific)).

Optional Policies (Feature-Specific)

In addition to the policy statements shown in Basic Policies Required for KPO Operation, if you enable optional features in OCINodeClass, you also have to define additional policies.

Feature Example policy statements
Capacity reservation Allow any-user to use compute-capacity-reservations in compartment <compartment-name> where all { ... }
Compute cluster Allow any-user to use compute-clusters in compartment <compartment-name> where all { ... }
Cluster placement group Allow any-user to use cluster-placement-groups in compartment <compartment-name> where all { ... }
Defined tags Allow any-user to use tag-namespaces in compartment <compartment-name> where all { ... }

Only add the policies required by the features you enable in OCINodeClass.

Enabling Node Registration for KPO-Launched Worker Nodes

Worker nodes launched by KPO must join the cluster. Grant CLUSTER_JOIN by using a dynamic group for instances. Create a dynamic group (with a rule that includes the compute instances to add to the cluster), and a policy for the dynamic group (with a policy statement to allow members of the dynamic group to join the cluster).

  1. Create a new dynamic group to contain the compute instances in the compartment where KPO launches nodes:

    1. Follow the instructions in To create a dynamic group in the IAM documentation, and give the new dynamic group a name (for example, kpo-nodes-dyn-grp).
    2. Enter a rule that includes the compute instances in the compartment, in the format:

      ALL {instance.compartment.id = '<compartment-ocid>'}

      where <compartment-ocid> is the OCID of the compartment where KPO launches nodes.

      For example:

      ALL {instance.compartment.id = 'ocid1.compartment.oc1..aaaaaaaa23______smwa'}
    3. Select Create.
  2. Create a policy for the dynamic group, with a policy statement to allow compute instances in the dynamic group to join the cluster:
    1. Follow the instructions in To create a policy in the IAM documentation, and give the new policy a name (for example, kpo-nodes-policy).
    2. Enter a policy statement to allow compute instances in the dynamic group to join the cluster, in the format:

      Allow dynamic-group <dynamic-group-name> to {CLUSTER_JOIN} in compartment <compartment-name>

      where:

      • <dynamic-group-name> is the name of the dynamic group you created earlier. For example, kpo-nodes-dyn-grp. Note that if a dynamic group is not in the default identity domain, prefix the dynamic group name with the identity domain name, in the format dynamic-group '<identity-domain-name>'/'<dynamic-group-name>'. You can also specify the dynamic group using its OCID, in the format dynamic-group id <dynamic-group-ocid>.
      • <compartment-name> is the name of the compartment to which the cluster belongs. For example, oke-cluster-compartment.

      For example:

      Allow dynamic-group kpo-nodes-dyn-grp to {CLUSTER_JOIN} in compartment oke-cluster-compartment

      If you consider this policy statement to be too permissive, you can restrict the permissions to explicitly specify the cluster that you want worker nodes launched by KPO to join, by entering a policy statement in the format:

      Allow dynamic-group <dynamic-group-name> to {CLUSTER_JOIN} in compartment <compartment-name>
      where { target.cluster.id = "<cluster-ocid>" }
    3. Select Create to create the new policy.

Creating OCINodeClass Resources

Use OCINodeClass to define OCI infrastructure settings for Karpenter-provisioned worker nodes. KPO provisions the OCI compute and networking resources for those nodes. Reference an OCINodeClass from a Karpenter NodePool by using nodeClassRef.

For more information about each OCINodeClass field and status value, see OCINodeClass Reference.

To create an OCINodeClass resource:

  1. Create a YAML file containing an OCINodeClass manifest. For example:
    apiVersion: oci.oraclecloud.com/v1beta1
    kind: OCINodeClass
    metadata:
      name: my-ocinodeclass
    spec:
      shapeConfigs:
        - ocpus: 2  
          memoryInGbs: 8
        - ocpus: 4
          memoryInGbs: 16
      volumeConfig:
        bootVolumeConfig:
          imageConfig:
            imageType: OKEImage
            imageId: <OKE-Image-OCID>
      networkConfig:
        primaryVnicConfig:
          subnetConfig:
            subnetId: <Subnet-OCID>
  2. Apply the manifest by entering:
    kubectl apply -f <ocinodeclass-file>
  3. Check the status and conditions of the new OCINodeClass resource by entering:
    kubectl describe ocinodeclass <name>

Creating NodePool Resources that Reference OCINodeClass Resources

Create one or more Karpenter NodePool resources to define the worker node capacity that Karpenter can provision, including instance shape requirements, taints, disruption settings, and scaling limits.

In each NodePool, set spec.template.spec.nodeClassRef to reference an OCINodeClass so Karpenter Provider for OCI can apply the OCI-specific settings (such as image selection, boot volume settings, and VNIC settings) when it provisions nodes. Create separate NodePool resources when workloads require different scheduling rules or different OCINodeClass configurations.

To create a NodePool resource:

  1. Create a YAML file containing a NodePool manifest that references an OCINodeClass by using nodeClassRef. For example:

    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: my-nodepool
    spec:
      template:
        spec:
          expireAfter: Never
          nodeClassRef:
            group: oci.oraclecloud.com
            kind: OCINodeClass
            name: my-ocinodeclass
          requirements:
            - key: karpenter.sh/capacity-type
              operator: In
              values:
                - on-demand
            - key: oci.oraclecloud.com/instance-shape
              operator: In
              values:
                - VM.Standard.E5.Flex
          terminationGracePeriod: 120m
      disruption:
        budgets:
          - nodes: 5%
        consolidateAfter: 60m
        consolidationPolicy: WhenEmpty
      limits:
        cpu: 64
        memory: 256Gi
  2. Apply the manifest by entering:
    kubectl apply -f <nodepool-file>
  3. Confirm that the NodePool exists by entering:
    kubectl get nodepools

OCINodeClass Reference

Use OCINodeClass to define OCI infrastructure settings that KPO uses when it provisions OCI resources for Karpenter-provisioned worker nodes. Reference an OCINodeClass from a Karpenter NodePool by using nodeClassRef.

OCINodeClassSpec

Field | Description | Required | Example / Notes
volumeConfig | Boot volume configuration | Required | See VolumeConfig.
networkConfig | VNIC subnet and optional NSGs for the compute instance | Required | See NetworkConfig.
shapeConfigs | Additional shape configs for flexible and burstable shapes. Omitting this excludes flexible shapes from scheduling. | Optional | See ShapeConfigs.
nodeCompartmentId | Launch instances in a different compartment from the cluster | Optional | Compartment OCID.
capacityReservationConfigs | Array of capacity reservations | Optional | See CapacityReservationConfigs.
clusterPlacementGroupConfigs | Array of cluster placement groups (CPGs). At most one CPG is allowed per availability domain. | Optional | See ClusterPlacementGroupConfigs.
computeClusterConfig | Compute cluster configuration. Immutable after creation. | Optional | See ComputeClusterConfig.
metadata | User data (key/value) for the compute instance | Optional | { "foo": "bar" }
freeformTags | Freeform tags to apply to the instance | Optional | { "env": "prod" }
definedTags | Defined tags to apply to the instance | Optional | { "Department": { "CostCenter": "42" } }
kubeletConfig | Kubelet overrides | Optional | See KubeletConfig.
postBootstrapInitScript | Base64-encoded script to run after OKE bootstrap | Optional | Base64-encoded shell script.
preBootstrapInitScript | Base64-encoded script to run before OKE bootstrap | Optional | Base64-encoded shell script.
sshAuthorizedKeys | List of authorized SSH public keys | Optional | [ "<ssh-public-key>" ]
launchOptions | Launch options passed to the compute instance | Optional | See LaunchOptions.

VolumeConfig

Use volumeConfig to control how KPO builds the boot volume for each worker node.

Field | Description | Required | Example / Notes
bootVolumeConfig | Boot volume configuration | Required | See BootVolumeConfig.

BootVolumeConfig

Use bootVolumeConfig to select the image and to configure boot volume sizing and performance.

Field | Description | Required | Example / Notes
imageConfig | Reference to an image by OCID or filter | Required | See ImageConfig.
sizeInGBs | Boot volume size in GB (minimum 50) | Optional | 50
vpusPerGB | Volume performance units (VPUs) per GB | Optional | 20
kmsKeyConfig | Reference to a KMS key for encryption | Optional | See KmsKeyConfig.
pvEncryptionInTransit | Enable PV encryption in transit. Accepted values: true or false. Default: false. | Optional | true
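
For example, a bootVolumeConfig that selects an image by OCID, increases the boot volume size, and encrypts the volume with a customer-managed key might look like the following sketch (OCIDs are placeholders):

```yaml
volumeConfig:
  bootVolumeConfig:
    imageConfig:
      imageType: OKEImage
      imageId: <OKE-Image-OCID>
    sizeInGBs: 100               # minimum 50
    vpusPerGB: 20                # volume performance units per GB
    kmsKeyConfig:
      kmsKeyId: <KMS-Key-OCID>   # customer-managed encryption key
    pvEncryptionInTransit: true
```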

ImageConfig

Use imageConfig to select the OKE image for worker nodes. Select an image by OCID or by filter. For more information about OKE images, see OKE Images.

Field | Description | Required | Example / Notes
imageType | Type of image | Required | Accepted: OKEImage.
imageFilter | Filter for selecting an image | Required if imageId is empty | See ImageSelectorTerm.
imageId | Image OCID | Required if imageFilter is empty | ocid1.image.oc1..xxxx

KmsKeyConfig

Use kmsKeyConfig to encrypt boot volumes with your own key.

Field | Description | Required | Example / Notes
kmsKeyId | KMS key OCID | Optional | ocid1.key.oc1..xxxx

ImageSelectorTerm

Use ImageSelectorTerm to select an image by OS, version, compartment, and tags.

Field | Description | Required | Example / Notes
osFilter | OS name filter | Optional | Oracle Linux
osVersionFilter | OS version filter | Optional | 8
compartmentId | Image compartment OCID | Optional | ocid1.compartment...
freeformTags | Match freeform tags | Optional | { "key": "val" }
definedTags | Match defined tags | Optional | { "namespace": { "key": "val" } }
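
For example, an imageConfig that auto-detects a compatible OKE image by filter instead of specifying an OCID might look like the following sketch:

```yaml
imageConfig:
  imageType: OKEImage
  imageFilter:                   # used when imageId is not set
    osFilter: Oracle Linux
    osVersionFilter: "8"
```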

NetworkConfig

Use networkConfig to place worker node VNICs in the correct subnets and attach NSGs when needed.

Field | Description | Required | Example / Notes
primaryVnicConfig | Primary VNIC subnet and NSGs | Required | See PrimaryVnicConfig.
secondaryVnicConfigs | Secondary VNIC configs | Optional | See SecondaryVnicConfigs.

PrimaryVnicConfig

Use primaryVnicConfig to define the primary subnet and optional NSGs for the worker node.

Field | Description | Required | Example / Notes
subnetConfig | Subnet configuration | Required | See SubnetConfig.
networkSecurityGroupConfigs | NSG configurations | Optional | See NetworkSecurityGroupConfigs.
assignIpV6Ip | Assign an IPv6 address | Optional | false
assignPublicIp | Assign a public IP address | Optional | false
vnicDisplayname | VNIC display name | Optional | my-vnic
ipv6AddressIpv6SubnetCidrPairDetails | IPv6 subnet-CIDR and address pairs | Optional | See Ipv6AddressIpv6SubnetCidrPairDetails.
skipSourceDestCheck | Skip the source/destination check | Optional | false
securityAttributes | Security attributes map | Optional | { "s": "v" }

SubnetConfig

Select a subnet by OCID or by selector.

Field | Description | Required | Example / Notes
subnetId | Subnet OCID | Required if subnetFilter is empty | ocid1.subnet...
subnetFilter | Subnet selector | Required if subnetId is empty | See OciResourceSelectorTerm.
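
For example, a subnetConfig that selects a subnet by display name instead of OCID might look like the following sketch (the compartment OCID is a placeholder and the display name is an example):

```yaml
subnetConfig:
  subnetFilter:                  # used when subnetId is not set
    compartmentId: <compartment-OCID>
    displayName: workers-subnet
```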

NetworkSecurityGroupConfigs

Select an NSG by OCID or by selector.

Field | Description | Required | Example / Notes
networkSecurityGroupId | NSG OCID | Required if networkSecurityGroupFilter is empty | ocid1.networksecuritygroup...
networkSecurityGroupFilter | NSG selector | Required if networkSecurityGroupId is empty | See OciResourceSelectorTerm.

Ipv6AddressIpv6SubnetCidrPairDetails

Use this field when you need to control IPv6 subnet CIDR assignment.

Field | Description | Required | Example / Notes
ipv6SubnetCidr | IPv6 subnet CIDR | Optional | 2001:0db8::/64

OciResourceSelectorTerm

Use selector fields when you want to choose a subnet or NSG by name or by tags, instead of by OCID.

Field | Description | Required | Example / Notes
compartmentId | Resource compartment OCID | Optional | ocid1.compartment...
displayName | Match display name | Optional | mysubnet
freeformTags | Match freeform tags | Optional | { "key": "val" }
definedTags | Match defined tags | Optional | { "namespace": { "key": "val" } }

SecondaryVnicConfigs

Use secondaryVnicConfigs when the cluster uses the OCI VCN IP Native CNI add-on and you need a secondary VNIC for pod IP addressing. Has all of the PrimaryVnicConfig fields, and some additional fields.

Field | Description | Required | Example / Notes
(All PrimaryVnicConfig fields) | Same as PrimaryVnicConfig | Varies | Not applicable.
applicationResource | Application identifier | Optional | blue
ipCount | Maximum IPs per VNIC (power-of-2 values between 1 and 16, so allowed values are 1, 2, 4, 8, or 16). See How do you configure secondary VNICs? | Optional | 8
nicIndex | NIC slot index for hosts with multiple cards | Optional | 0
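
For example, a networkConfig with a secondary VNIC for pod IP addressing (only valid on clusters using the OCI VCN IP Native CNI add-on) might look like the following sketch (subnet OCIDs are placeholders):

```yaml
networkConfig:
  primaryVnicConfig:
    subnetConfig:
      subnetId: <worker-subnet-OCID>
  secondaryVnicConfigs:
    - subnetConfig:
        subnetId: <pod-subnet-OCID>
      ipCount: 8                 # power-of-2 value, maximum 16
      nicIndex: 0
```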

ShapeConfigs

Use shapeConfigs to define sizing for flexible shapes and burstable configurations.

Field | Description | Required | Example / Notes
ocpus | OCPUs for flexible shapes (minimum 1) | Required | 4
baselineOcpuUtilization | Utilization ratio for burstable shapes. Accepted values: BASELINE_1_8, BASELINE_1_2, BASELINE_1_1. | Optional | BASELINE_1_8
memoryInGbs | Memory for flexible shapes (GB) | Optional | 16
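
For example, a shapeConfigs list that allows a small burstable sizing alongside a larger fixed sizing might look like the following sketch (values are illustrative):

```yaml
shapeConfigs:
  - ocpus: 1
    memoryInGbs: 8
    baselineOcpuUtilization: BASELINE_1_2   # burstable: half-OCPU baseline
  - ocpus: 4
    memoryInGbs: 32
```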

CapacityReservationConfigs

Use capacityReservationConfigs when you want worker nodes to run on capacity reserved in OCI.

Field | Description | Required | Example / Notes
capacityReservationId | Reservation OCID | Required if capacityReservationFilter is empty | ocid1.reservation...
capacityReservationFilter | Reservation filter | Required if capacityReservationId is empty | See OciResourceSelectorTerm.

ClusterPlacementGroupConfigs

Use clusterPlacementGroupConfigs when you need worker nodes placed close together for low-latency networking.

Field | Description | Required | Example / Notes
clusterPlacementGroupId | Cluster placement group (CPG) OCID | Required if clusterPlacementGroupFilter is empty | ocid1.cpg...
clusterPlacementGroupFilter | Filter for the placement group | Required if clusterPlacementGroupId is empty | See OciResourceSelectorTerm.

ComputeClusterConfig

Use computeClusterConfig when you want worker nodes to run in a compute cluster.

Field | Description | Required | Example / Notes
computeClusterId | Compute cluster OCID | Required if computeClusterFilter is empty | ocid1.cluster...
computeClusterFilter | Filter for the compute cluster | Required if computeClusterId is empty | See OciResourceSelectorTerm.

KubeletConfig

Use kubeletConfig to override kubelet settings on Karpenter-provisioned worker nodes. Karpenter Provider for OCI (KPO) applies these settings during node provisioning.

Field | Description | Required | Example / Notes
clusterDNS | List of cluster DNS IPs | Optional | [ "10.0.0.10" ]
extraArgs | Kubelet extra arguments | Optional | --fail-swap-on=false
nodeLabels | Kubelet node labels | Optional | { "role": "worker" }
maxPods | Maximum pods per instance (minimum 1) | Optional | 110
podsPerCore | Pods per CPU core | Optional | 20
systemReserved | System resource reservations | Optional | { "cpu": "1" }
kubeReserved | Kubernetes component reservations | Optional | { "cpu": "1" }
evictionHard | Hard eviction thresholds | Optional | { "memory.available": "200Mi" }
evictionSoft | Soft eviction thresholds | Optional | { "memory.available": "500Mi" }
evictionSoftGracePeriod | Grace periods for soft eviction | Optional | { "memory.available": "30s" }
evictionMaxPodGracePeriod | Maximum pod graceful termination period (seconds) | Optional | 60
imageGCHighThresholdPercent | High-water disk usage percentage that triggers image garbage collection (GC) | Optional | 85
imageGCLowThresholdPercent | Low-water disk usage percentage that is the target for image garbage collection (GC) | Optional | 75
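
For example, a kubeletConfig that caps pods per node, reserves resources for Kubernetes components, and sets a hard eviction threshold might look like the following sketch (values are illustrative):

```yaml
kubeletConfig:
  maxPods: 110
  kubeReserved:
    cpu: "500m"
    memory: "1Gi"
  evictionHard:
    memory.available: "200Mi"
```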

LaunchOptions

Use launchOptions to set instance launch options.

Field | Description | Required | Example / Notes
bootVolumeType | Boot volume type (ISCSI, SCSI, IDE, VFIO, PARAVIRTUALIZED) | Optional | "ISCSI"
remoteDataVolumeType | Remote data volume type | Optional | "PARAVIRTUALIZED"
firmware | Firmware (BIOS or UEFI_64) | Optional | "UEFI_64"
networkType | NIC emulation (VFIO, E1000, PARAVIRTUALIZED) | Optional | "E1000"
consistentVolumeNamingEnabled | Enable the consistent volume naming feature | Optional | true

OCINodeClassStatus

Use status fields to troubleshoot validation and resolved configuration.

Field | Type | Description
Conditions | []status.Condition | Conditions for readiness, image, and network. Might include capacityReservation, clusterPlacementGroup, and computeCluster conditions.
Volume | Volume | Volume configuration and state.
Network | Network | Network configuration and state.
CapacityReservations | []CapacityReservation | Capacity reservation details, if present.
ClusterPlacementGroups | []ClusterPlacementGroup | Cluster placement group details, if present.
ComputeCluster | ComputeCluster | Compute cluster details, if present.

Scheduling, Labels, and Taints

Scheduling Labels

Use node labels in NodePool.spec.template.spec.requirements and in workload scheduling rules (for example, node selectors and node affinity). Use standard Kubernetes labels and OCI-specific labels. KPO adds OCI-specific labels to nodes it provisions.

Label | Example value | Description
topology.kubernetes.io/zone | Uocm:PHX-AD-1 | Availability domain.
node.kubernetes.io/instance-type | VM.Standard.B1.4 | OCI shape name. For flexible shapes, the value uses the format <shape-name>.<X>o.<Y>g.<burstableRatio>b, where <X>o is the number of OCPUs, <Y>g is the memory in GB, and <burstableRatio>b is one of 1_1, 1_2, or 1_8. For example, VM.Standard.E5.Flex.2o.32g.1_1b.
kubernetes.io/os | linux | OS value as defined by Go GOOS values (KnownOS) on the instance.
kubernetes.io/arch | amd64 | Architecture value as defined by Go GOARCH values (KnownArch) on the instance.
karpenter.sh/capacity-type | spot | Capacity types include reserved, spot, and on-demand.
oci.oraclecloud.com/instance-shape | [ VM.Standard.E5.Flex, VM.Standard.E6.Flex ] | OCI-specific label available on all shapes.
oci.oraclecloud.com/gpu-shape | true | OCI-specific label available on all GPU shapes.
oci.oraclecloud.com/baremetal-shape | true | OCI-specific label available on all bare metal shapes.
oci.oraclecloud.com/denseio-shape | true | OCI-specific label available on all dense I/O shapes.
oci.oraclecloud.com/flex-shape | true | OCI-specific label available on all flexible shapes.
oci.oraclecloud.com/fault-domain | FAULT-DOMAIN-1 | OCI-specific. Fault domain within the selected availability domain. Use this label to constrain placement to a specific fault domain, such as FAULT-DOMAIN-1, FAULT-DOMAIN-2, or FAULT-DOMAIN-3.
oci.oraclecloud.com/capacity-reservation-id | last segment of the reservation OCID, including the last dot | OCI-specific label for the capacity reservation ID. To accommodate the 63-character limit of Kubernetes labels, all characters before the last dot of the capacity reservation OCID are removed.
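
For example, a workload can use these labels in a node selector to constrain scheduling to a particular shape and fault domain (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fd1-workload
spec:
  nodeSelector:
    oci.oraclecloud.com/instance-shape: VM.Standard.E5.Flex
    oci.oraclecloud.com/fault-domain: FAULT-DOMAIN-1
  containers:
    - name: app
      image: <image>
```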

Required Taints and Tolerations

Use taints and tolerations to control where workloads run. Configure taints in the NodePool so Karpenter offers the corresponding capacity, then configure tolerations in workloads that must run on those worker nodes.

Be aware of the following special scenarios:

  • Preemptible (Spot) Worker Nodes:

    Worker nodes launched as spot capacity use the taint oci.oraclecloud.com/oke-is-preemptible.

    To use spot capacity:

    1. Add the taint to the NodePool so Karpenter offers spot shapes.

    2. Add the corresponding toleration to workloads that must run on preemptible nodes.

    NodePool taint example:

    template:
      spec:
        taints:
          - effect: NoSchedule
            key: oci.oraclecloud.com/oke-is-preemptible
            value: present

    If the NodePool does not include the taint oci.oraclecloud.com/oke-is-preemptible, Karpenter does not offer preemptible shapes.
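
    Workload toleration example matching the taint above (a sketch; add it to the pod spec of workloads that must run on preemptible nodes):

```yaml
tolerations:
  - key: oci.oraclecloud.com/oke-is-preemptible
    operator: Equal
    value: present
    effect: NoSchedule
```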

  • GPU Worker Nodes:

    Worker nodes launched with GPU shapes use vendor-specific taints:

    • NVIDIA GPU shapes: nvidia.com/gpu

    • AMD GPU shapes: amd.com/gpu

    To use GPU shapes:

    1. Add the appropriate taint to the NodePool so Karpenter offers GPU shapes.

    2. Add the corresponding toleration to GPU workloads.

    NodePool taint example (NVIDIA):

    template:
      spec:
        taints:
          - effect: NoSchedule
            key: nvidia.com/gpu
            value: present

    If the NodePool does not include the appropriate taint, Karpenter does not offer the corresponding shapes.
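
    Workload toleration example for NVIDIA GPU shapes (a sketch; add it to the pod spec of GPU workloads):

```yaml
tolerations:
  - key: nvidia.com/gpu
    operator: Exists             # tolerates the taint regardless of value
    effect: NoSchedule
```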

Validating Node Provisioning

Use Karpenter and KPO custom resources to confirm that Karpenter is creating NodeClaim resources and that worker nodes are joining the cluster.

  1. Confirm that NodePool and OCINodeClass resources exist by entering:
    kubectl get nodepools
    kubectl get ocinodeclasses
  2. Confirm that Karpenter is creating NodeClaim resources by entering:
    kubectl get nodeclaims
    kubectl describe nodeclaim <nodeclaim-name>
  3. Confirm that nodes are created and labeled with the name of the node pool they belong to.
    • List all nodes in the cluster and show the value of the karpenter.sh/nodepool label for each node by entering:
      kubectl get nodes -L karpenter.sh/nodepool
    • List only nodes that have the karpenter.sh/nodepool label by entering:
      kubectl get nodes -l karpenter.sh/nodepool
  4. If nodes do not join the cluster, review the KPO controller pods and logs by entering:
    kubectl get pods --namespace <karpenter-namespace>
    kubectl logs --namespace <karpenter-namespace> <pod-name>

NodePool and OCINodeClass Examples

Use the following examples to create NodePool and OCINodeClass resources for common worker node provisioning scenarios. Each example shows how to combine Karpenter scheduling intent in a NodePool with OCI-specific settings in an OCINodeClass.

For more information about mapping common OCI features to NodePool requirements or OCINodeClass fields, see Additional Use Cases.

Example 1: Flexible Shapes with an Explicit OKE Image OCID

Use this example when you want a NodePool that allows specific flexible shapes, and you want an OCINodeClass that defines sizing for those flexible shapes and selects an OKE image by OCID.

---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: my-nodepool
spec:
  template:
    spec:
      expireAfter: Never
      nodeClassRef:
        group: oci.oraclecloud.com
        kind: OCINodeClass
        name: my-ocinodeclass
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
        - key: oci.oraclecloud.com/instance-shape #expand this list as needed
          operator: In
          values:
            - VM.Standard.E3.Flex
            - VM.Standard.E4.Flex
            - VM.Standard.E5.Flex
      terminationGracePeriod: 120m
  disruption:
    budgets:
      - nodes: 5%
    consolidateAfter: 60m
    consolidationPolicy: WhenEmpty
  limits:
    cpu: 64
    memory: 256Gi 
---
apiVersion: oci.oraclecloud.com/v1beta1
kind: OCINodeClass
metadata:
  name: my-ocinodeclass
spec:
  shapeConfigs:
    - ocpus: 2  
      memoryInGbs: 8
    - ocpus: 4
      memoryInGbs: 16
  volumeConfig:
    bootVolumeConfig:
      imageConfig:
        imageType: OKEImage
        imageId: <OKE-Image-OCID> 
  networkConfig:
    primaryVnicConfig:
      subnetConfig:
        subnetId: <Subnet-OCID> 

Example 2: Image Filter Selection with Drift Replacement

Use this example when you want KPO to use a filter to select an image automatically, and you want Karpenter to replace worker nodes when they drift from the selected image.

The image that is selected depends on the cluster's Kubernetes version and the available OKE images. When the cluster control plane is upgraded or new OKE images are released, the desired worker node image will also change. Nodes launched with an outdated image are considered to have "drifted". To minimize unexpected disruption during such events, we recommend that you configure an appropriate disruption budget in the Karpenter node pool, specifying reasons, disruption percentage, and schedule (for more information, see Disruption in the Karpenter documentation).

---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: my-nodepool
spec:
  template:
    spec:
      expireAfter: Never
      nodeClassRef:
        group: oci.oraclecloud.com
        kind: OCINodeClass
        name: my-ocinodeclass
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
        - key: oci.oraclecloud.com/instance-shape #expand this list as needed
          operator: In
          values:
            - VM.Standard.E3.Flex
            - VM.Standard.E4.Flex
            - VM.Standard.E5.Flex
      terminationGracePeriod: 120m
  disruption:
    budgets:
      - nodes: 5%
        reasons: 
          - Drifted
        schedule: "@daily" #customize schedule for your own needs, following https://karpenter.sh/docs/concepts/disruption/#schedule 
        duration: 10m
    consolidateAfter: 60m
    consolidationPolicy: WhenEmpty
  limits:
    cpu: 64
    memory: 256Gi 
---
apiVersion: oci.oraclecloud.com/v1beta1
kind: OCINodeClass
metadata:
  name: my-ocinodeclass
spec:
  shapeConfigs:
    - ocpus: 2  
      memoryInGbs: 8
    - ocpus: 4
      memoryInGbs: 16
  volumeConfig:
    bootVolumeConfig:
      imageConfig:
        imageType: OKEImage
        imageFilter: 
          osFilter: "Oracle Linux"
          osVersionFilter: "8"
  networkConfig:
    primaryVnicConfig:
      subnetConfig:
        subnetId: <Subnet-OCID> 

Example 3: Secondary VNIC for OCI VCN IP Native CNI

Use this example when your cluster uses the OCI VCN IP Native CNI add-on and pods need IP addresses from a secondary VNIC subnet. Configure secondaryVnicConfigs to attach a secondary VNIC and allocate pod IP capacity.

---
apiVersion: oci.oraclecloud.com/v1beta1
kind: OCINodeClass
metadata:
  name: my-ocinodeclass
spec:
  shapeConfigs:
    - ocpus: 2  
      memoryInGbs: 8
    - ocpus: 4
      memoryInGbs: 16
  volumeConfig:
    bootVolumeConfig:
      imageConfig:
        imageType: OKEImage
        imageFilter: 
          osFilter: "Oracle Linux"
          osVersionFilter: "8"  
  networkConfig:
    primaryVnicConfig:
      subnetConfig:
        subnetId: <Subnet-OCID> 
    secondaryVnicConfigs:
      - subnetConfig:
          subnetId: <Subnet-OCID>  #pod subnet
        ipCount: 16

Additional Use Cases

Use the following table to map common OCI features to NodePool requirements or OCINodeClass fields.

Use case What to configure Notes
Spot capacity type support In the NodePool, set karpenter.sh/capacity-type to spot:
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot"]

KPO maps the Karpenter spot capacity type to OCI preemptible instances. Configure the required preemptible taint in the NodePool and tolerations in workloads.

For more information about the support, benefits, and limitations of preemptible instances, see Preemptible Instances.

Reserved capacity type support In the OCINodeClass, configure capacityReservationConfigs:
capacityReservationConfigs:
  - capacityReservationId: "<capacity-reservation-ocid>"

KPO maps the Karpenter reserved capacity type to OCI capacity reservations.

An OCI capacity reservation is an availability domain-level resource that can be configured with multiple instance reservation configurations. Each configuration specifies a shape, optional shape settings, and capacity.

When an OCINodeClass is configured with a capacity reservation, KPO filters instance offerings to match reservation availability domain and reservation instance configurations.

For more information about capacity reservations, see Capacity Reservations.

Burstable instance support In the OCINodeClass.shapeConfigs, set baselineOcpuUtilization:
shapeConfigs:
  - ocpus: 2
    memoryInGbs: 16
    baselineOcpuUtilization: BASELINE_1_2

Burstable instances are supported on OCI flexible shapes.

An OCI burstable instance is a virtual machine (VM) instance that provides a baseline level of CPU performance with the ability to burst to a higher level to support occasional spikes in usage.

For more information about burstable instances, see Burstable Instances.

Cluster placement group support In the OCINodeClass, configure clusterPlacementGroupConfigs:
clusterPlacementGroupConfigs:
  - clusterPlacementGroupId: "<cluster-placement-group-ocid>"

An OCI cluster placement group enables you to create resources in close proximity to one another to support low-latency networking use cases. A cluster placement group is an availability domain-level resource, and only one cluster placement group per availability domain is allowed.

For more information about cluster placement groups, see Overview of Cluster Placement Groups.

Compute cluster support In the OCINodeClass, configure computeClusterConfig:
computeClusterConfig:
  computeClusterId: "<compute-cluster-ocid>"

A compute cluster is a group of high performance computing (HPC), GPU, or optimized instances that are connected with a high-bandwidth, ultra low-latency network.

Configure at most one compute cluster per OCINodeClass. Compute cluster configuration is immutable after creation.

For more information about compute clusters, see Compute Clusters.

Customize kubelet In the OCINodeClass, configure kubeletConfig. Use KubeletConfig fields to control node-level kubelet behavior. If a kubelet configuration field is not shown for KubeletConfig, consider using the extraArgs field.
Customize compute instance In the OCINodeClass, configure nodeCompartmentId, metadata, freeformTags, definedTags, sshAuthorizedKeys, and launchOptions. Use OCI instance fields to control metadata, tags, SSH access, and launch behavior.

For more information about how to configure each field, see Creating an Instance.
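Putting the spot capacity rows together, a NodePool that offers preemptible capacity combines the spot capacity type with the required preemptible taint. The following is a sketch based on the settings described in the table (the NodePool and OCINodeClass names are placeholders):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-nodepool
spec:
  template:
    spec:
      nodeClassRef:
        group: oci.oraclecloud.com
        kind: OCINodeClass
        name: my-ocinodeclass
      taints:
        - effect: NoSchedule
          key: oci.oraclecloud.com/oke-is-preemptible
          value: present
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
```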

Resource Discovery and Cleanup

You can identify the resources that Karpenter Provider for OCI (KPO) creates. When you no longer need the capacity, you can remove those resources. Perform the following tasks:

  • Identify the worker nodes and NodeClaim resources that Karpenter creates so you can scope the capacity you are auditing. For more information, see Identify Karpenter-managed worker nodes.
  • Locate the corresponding OCI instances by using KPO-applied tags so you can verify which infrastructure is associated with that capacity. For more information, see Identify OCI instances (and related resources) created for Karpenter nodes.
  • When resources are no longer required and before you delete a cluster, remove Karpenter-managed capacity so instances and attached resources do not continue running. Confirm that termination and deletion complete successfully, and that no orphaned resources remain. For more information, see Clean up resources.

Identify Karpenter-managed worker nodes

Karpenter applies the karpenter.sh/nodepool label to the nodes it creates. To list the nodes created by Karpenter, and the NodePool resource that each node belongs to, enter:

kubectl get nodes -L karpenter.sh/nodepool

To find out whether Karpenter is creating NodeClaim resources, or whether NodeClaim resources are blocked for some reason (for example, failing to launch, failing to register, or waiting on capacity), enter:

kubectl get nodeclaims
kubectl describe nodeclaim <nodeclaim-name>

Identify OCI instances (and related resources) created for Karpenter nodes

Use OCI search tools (such as the Resource Explorer) and the following OCI freeform tags to find instances and related resources that KPO created:

  • karpenterNodepool

  • orcl-containerengine/cluster-id (when applicable)

For more information about locating OCI resources, see Querying Resources.
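For example, assuming the OCI CLI is configured, a Resource Search structured query on the karpenterNodepool freeform tag might look like the following sketch (the query syntax and node pool name are illustrative):

```shell
# Build a Resource Search query that matches instances tagged by KPO
# for a given Karpenter node pool (the name is a placeholder).
NODEPOOL_NAME="my-nodepool"
QUERY="query instance resources where freeformTags.key = 'karpenterNodepool' && freeformTags.value = '${NODEPOOL_NAME}'"
echo "${QUERY}"

# Requires OCI CLI credentials; uncomment to run the search:
# oci search resource structured-search --query-text "${QUERY}"
```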

Clean up resources

Before you delete a cluster, remove Karpenter-managed capacity so that instances do not remain running after the cluster has been deleted.

  1. Delete Karpenter NodePool resources that create capacity by using kubectl delete and the NodePool name as follows:

    1. List NodePool resources by entering:
      kubectl get nodepools
    2. Delete a NodePool resource by entering:
      kubectl delete nodepool <nodepool-name>
    3. (Optional) Confirm that the NodePool resource is deleted by entering:
      kubectl get nodepools
  2. Confirm that NodeClaim resources and nodes are removed.

    1. Watch NodeClaim resources until they are deleted by entering:
      kubectl get nodeclaims --watch
    2. Watch nodes created for Karpenter node pools until they are removed:
      kubectl get nodes -l karpenter.sh/nodepool --watch
    3. If a NodeClaim does not terminate, check its status, conditions, and events by entering:
      kubectl describe nodeclaim <nodeclaim-name>
  3. Use the OCI Console, CLI, or SDK to confirm that OCI instances and attached resources are deleted.

    1. Locate the instances that KPO created using the karpenterNodepool freeform tag (and the orcl-containerengine/cluster-id tag, when applicable).
    2. Confirm that no such instances are present, or are in a terminated state and then disappear.
    3. Confirm that there are no remaining VNIC attachments for the instance, and that any VNICs that were created for the instance no longer exist.
    4. Confirm that there are no remaining volume attachments for the instance, and that the boot volume that was created for the instance is deleted (or is terminated and then removed).

    If any of the resources remain after the instance is terminated, treat them as potential orphaned resources and investigate whether:

    • the KPO controller is running and has IAM permissions to clean up resources
    • there are any deletion finalizers or errors in the KPO controller logs

Troubleshooting

What you can troubleshoot and where to look

When a workload can’t schedule or a worker node doesn’t join the cluster, check Kubernetes resources, KPO controller logs, and metrics, and increase logging, as follows:

Check Kubernetes resources

Use kubectl commands to list key resources, as follows:

kubectl get nodepools
kubectl get ocinodeclasses
kubectl get nodeclaims
kubectl get nodes -l karpenter.sh/nodepool

Use kubectl commands to review details and conditions, as follows:

kubectl describe ocinodeclass <name>
kubectl describe nodeclaim <name>

Check KPO controller logs

Use kubectl commands to review the status of KPO pods and to review logs, as follows:

kubectl get pods --namespace <karpenter-namespace>
kubectl logs --namespace <karpenter-namespace> <pod-name>

Enable debug logging

Increase logging temporarily when you need more detail in KPO controller output. For example, set logLevel: debug in the Helm values file and reapply the configuration (see How do you change and reapply Helm values to update the KPO deployment?).

Frequently Asked Questions (FAQs) about KPO

Do special taints added by OKE bootstrapping apply to Karpenter-managed nodes?

Yes. Some worker node types require specific taints. Configure the taints in the NodePool so Karpenter offers the corresponding capacity. Then configure tolerations in workloads that must run on those worker nodes.

For more information about special taints, see Required Taints and Tolerations.

How do you configure secondary VNICs?

When a cluster uses the OCI VCN IP Native CNI add-on, you can attach secondary VNICs to worker nodes so pods receive VCN-routable IP addresses from the pod subnet. Plan subnet capacity carefully so you don’t exhaust available addresses.

If the cluster is using the OCI VCN IP Native CNI add-on, the add-on must be version 3.0.0 or later. We strongly recommend the use of version 3.2.0 or later. In OCI VCN IP Native CNI add-on version 3.2.0 (and later), secondary VNICs support a maximum of 256 IP addresses. If the cluster is using a version of the add-on prior to version 3.2.0, secondary VNICs support a maximum of 16 IP addresses. If the cluster has to use a version of the add-on prior to version 3.2.0, you must explicitly set ipCount to a value no greater than 16.

In addition to the maximum number of supported IP addresses, secondary VNICs are subject to the following restrictions:

  • The number of assigned IP addresses must be a power of two, so set ipCount to a power of two.

  • For IPv6-only (single stack) secondary VNICs, only 1, 16, or 256 assigned IP addresses are supported, so set ipCount to 1, 16, or 256.

  • The aggregate total of all assigned IP addresses across all secondary VNICs within a node must not exceed 256.

  • If ipCount is not set for a secondary VNIC, it defaults to 32 for IPv4 clusters and IPv6 dual stack clusters, and to 256 for IPv6 single stack clusters.

Note the following recommendations and guidelines:

  • We recommend that you configure two CIDR blocks for the pod subnet used by secondary VNICs.
  • If the pod subnet used by secondary VNICs has a single CIDR block, make sure that the subnet has a sufficient number of contiguous IP addresses to accommodate the required number of IP assignments.
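To illustrate these restrictions, the following OCINodeClass fragment is a sketch (field names follow the earlier secondaryVnicConfigs example; subnet OCIDs are placeholders) that attaches two secondary VNICs whose ipCount values are powers of two and whose total stays within the 256-address limit:

```yaml
networkConfig:
  primaryVnicConfig:
    subnetConfig:
      subnetId: <Subnet-OCID>
  secondaryVnicConfigs:
    - subnetConfig:
        subnetId: <Pod-Subnet-OCID>
      ipCount: 128   # power of two; requires add-on version 3.2.0 or later
    - subnetConfig:
        subnetId: <Pod-Subnet-OCID>
      ipCount: 64    # 128 + 64 = 192, within the 256 aggregate limit
```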

How do you schedule workloads on a specific Karpenter node pool?

To force workloads onto a specific Karpenter node pool, target the node pool label karpenter.sh/nodepool. Use either node affinity or a node selector, depending on how strict you want placement to be.

  • Node affinity example:

    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: karpenter.sh/nodepool
                  operator: In
                  values:
                    - <nodepool-name>

    For more information about node affinity, see Node affinity in the Kubernetes documentation.

  • Node selector example:

    nodeSelector:
      karpenter.sh/nodepool: <nodepool-name>

    For more information about node selectors, see nodeSelector in the Kubernetes documentation.

How do you spread workloads across availability domains and fault domains with Karpenter Provider for OCI?

Karpenter Provider for OCI (KPO) honors pod-level topologySpreadConstraints when the topologyKey matches a scheduling label that KPO supports. In OCI, you typically use the following topology keys:

  • topology.kubernetes.io/zone to spread across availability domains (ADs)
  • oci.oraclecloud.com/fault-domain to spread across fault domains (FDs) within an AD

To spread Karpenter-provisioned capacity, target the intended node pool by using the karpenter.sh/nodepool label, and ensure that the selected NodePool allows the ADs or FDs that you want the scheduler to use. If you set whenUnsatisfiable to DoNotSchedule, pods that can’t satisfy the spread constraint remain pending, which gives Karpenter an opportunity to provision nodes that meet the spread requirement.

We recommend the following practices:

  • Set replicas to a value equal to or greater than the number of topology domains you want to use.
  • Use AD spreading when the workload must remain available across multiple ADs.
  • Use FD spreading in a single-AD region.
  • Ensure that NodePool requirements include all topology values that topologySpreadConstraints uses.

Example: Configure a NodePool to allow three ADs

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: example-ad-nodepool
spec:
  template:
    spec:
      requirements:
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - <AVAILABILITY_DOMAIN_1>
        - <AVAILABILITY_DOMAIN_2>
        - <AVAILABILITY_DOMAIN_3>

Example: Spread a Deployment across those three ADs

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-ad-spread
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-ad-spread
  template:
    metadata:
      labels:
        app: example-ad-spread
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: karpenter.sh/nodepool
                operator: In
                values:
                - example-ad-nodepool
      topologySpreadConstraints:
      - maxSkew: 1
        minDomains: 3
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: example-ad-spread
        matchLabelKeys:
        - pod-template-hash
        nodeAffinityPolicy: Honor
        nodeTaintsPolicy: Honor
      containers:
      - name: app
        image: registry.k8s.io/pause:3.9
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: "1"

Example: Configure a NodePool to allow three FDs in a single AD

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: example-fd-nodepool
spec:
  template:
    spec:
      requirements:
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - <AVAILABILITY_DOMAIN>
      - key: oci.oraclecloud.com/fault-domain
        operator: In
        values:
        - FAULT-DOMAIN-1
        - FAULT-DOMAIN-2
        - FAULT-DOMAIN-3

Example: Spread a Deployment across those three FDs

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-fd-spread
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-fd-spread
  template:
    metadata:
      labels:
        app: example-fd-spread
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: karpenter.sh/nodepool
                operator: In
                values:
                - example-fd-nodepool
      topologySpreadConstraints:
      - maxSkew: 1
        minDomains: 3
        topologyKey: oci.oraclecloud.com/fault-domain
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: example-fd-spread
        matchLabelKeys:
        - pod-template-hash
        nodeAffinityPolicy: Honor
        nodeTaintsPolicy: Honor
      containers:
      - name: app
        image: registry.k8s.io/pause:3.9
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: "1"

Notes

  • Use whenUnsatisfiable: DoNotSchedule when you want Karpenter to provision capacity that satisfies the spread constraint.
  • Ensure that labelSelector matches the pod labels. If it does not match, the scheduler does not calculate skew for the intended pod set.
  • For more information about supported scheduling labels, see Scheduling, Labels, and Taints.

How do you list compatible OKE images for a cluster?

Use the OCI CLI to retrieve node pool options and filter the returned image sources. This approach enables you to identify image OCIDs that are compatible with the cluster Kubernetes version.

  1. Set environment variables that specify the region and cluster OCID, and that define filters for the Kubernetes version, OS major version, and any architectures or image types to exclude:

    REGION="<region>"
    CLUSTER_OCID="<cluster-ocid>"
    OKE_VERSION="<kubernetes-version>"
    OS_MAJOR="<os-major-version>"
    EXCLUDE_PATTERN="<architecture-pattern>"

    where:

    • <cluster-ocid> is the OCID of the cluster.
    • <kubernetes-version> is the Kubernetes minor version string to match in the image source name (for example, 1.31).
    • <os-major-version> is the OS major version to match in the image source name (for example, 8 for Oracle Linux 8). Set this to an empty string to avoid filtering by OS version.
    • <architecture-pattern> is a regular expression pattern used to exclude image source names (for example, aarch64|arm64|GPU to exclude ARM or GPU images). Set this to an empty string to avoid exclusions.

    For example:

    REGION="us-phoenix-1"
    CLUSTER_OCID="ocid1.cluster.oc1.phx.aaaaaaaa______w5q"
    OKE_VERSION="1.31"
    OS_MAJOR="8"
    EXCLUDE_PATTERN="aarch64|arm64|GPU"
  2. Run the following OCI CLI command to obtain the OCIDs of compatible images:

    oci ce node-pool-options get --region "${REGION}" --node-pool-option-id "${CLUSTER_OCID}" --output json | jq -r --arg ver "${OKE_VERSION:-}" --arg os "${OS_MAJOR:-}" --arg ex "${EXCLUDE_PATTERN:-}" '.data.sources[] | . as $src | ($src["source-name"] // "") as $name | select( ($ver == "" or ($name | test($ver))) and ($os == "" or ($name | test($os; "i"))) and ($ex == "" or ($name | test($ex; "i") | not)) ) | {id: $src["image-id"], source_name: $name}'

What should you do if a flexible shape node pool does not provision and you see “skipping, nodepool requirements filtered out all instance types”?

Flexible shapes require a sizing configuration so that KPO can translate shape requirements into a concrete offering. Fix this by defining shapeConfigs in the referenced OCINodeClass or by defining defaults in the Helm values file under settings.flexibleShapeConfigs.

  • Example: Defining shapeConfigs in the referenced OCINodeClass:

    apiVersion: oci.oraclecloud.com/v1beta1
    kind: OCINodeClass
    metadata:
      name: example-nodeclass
    spec:
      shapeConfigs:
        - ocpus: 2
          memoryInGbs: 16
          baselineOcpuUtilization: BASELINE_1_2
    ...
  • Example: Setting default shapeConfigs values globally in the Helm values file:

    settings:
      flexibleShapeConfigs:
        ocpus: 2
        memoryInGbs: 16

    You can override a default in the Helm values file by setting shapeConfigs in the OCINodeClass.

If you’re using a capacity reservation and encounter this issue, confirm that flexibleShapeConfigs in the Helm values file (or shapeConfigs in the OCINodeClass, if present) matches the reservation exactly, with the same values for ocpus, memoryInGbs, and baselineOcpuUtilization.
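For example, if the capacity reservation's instance configuration specifies 2 OCPUs, 16 GB of memory, and a 1/2 baseline, an OCINodeClass that matches it exactly might look like the following sketch (the reservation OCID is a placeholder):

```yaml
apiVersion: oci.oraclecloud.com/v1beta1
kind: OCINodeClass
metadata:
  name: reserved-nodeclass
spec:
  capacityReservationConfigs:
    - capacityReservationId: "<capacity-reservation-ocid>"
  shapeConfigs:
    - ocpus: 2
      memoryInGbs: 16
      baselineOcpuUtilization: BASELINE_1_2
```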

How do you run the KPO controller with debug logging?

Increase log output to troubleshoot provisioning and OCI API calls. For example, set logLevel: debug in the Helm values file and reapply the configuration (see How do you change and reapply Helm values to update the KPO deployment?).

How do you pass a custom cloud-init script?

You can run specific commands or apply custom configurations during a node's startup process by injecting custom cloud-init scripts through the OCINodeClass resource.

Thoroughly test custom cloud-init scripts before deploying them, to ensure the scripts execute as expected and do not interfere with the standard node initialization process.

There are two primary methods:

  • Option 1: Use preBootstrapInitScript and postBootstrapInitScript in OCINodeClass (recommended)

    You can run custom scripts before and after the default cloud-init script by using preBootstrapInitScript and postBootstrapInitScript in the OCINodeClass to specify the custom scripts, as follows:

    1. Prepare the custom cloud-init scripts:
      1. Create the scripts you want to run before and after the default node initialization.
      2. Base64-encode each script.
    2. Add the base64-encoded scripts to the preBootstrapInitScript and/or postBootstrapInitScript fields in the OCINodeClass resource.

      For example:

      apiVersion: oci.oraclecloud.com/v1beta1
      kind: OCINodeClass
      metadata:
        name: example-nodeclass
      spec:
        ...
        preBootstrapInitScript: "IyEvYmluL2Jhc2gKZWNobyAiSSBhbSBhIHByZSBib290c3RyYXAgc2NyaXB0Ig=="
        postBootstrapInitScript: "IyEvYmluL2Jhc2gKZWNobyAiSSBhbSBhIHBvc3QgYm9vdHN0cmFwIHNjcmlwdCI="
  • Option 2: Specify a full base64-encoded cloud-init script in metadata.user_data (advanced workflow)

    You can provide a complete custom cloud-init script by setting the user_data field in the metadata section of the OCINodeClass. This method gives you full control of the initialization process, but requires careful management to ensure compatibility with Kubernetes Engine.

    1. Prepare the custom cloud-init script:

      1. Create the custom cloud-init script, and include both the custom configurations and the necessary commands that join the node to the cluster.

        The following example reads configuration values from the OCI Instance Metadata Service (IMDS), which is available only from within the instance, and then runs the commands that join the node to the cluster.

        #!/usr/bin/env bash
        set -o errexit
        set -o nounset
        set -o pipefail
        # OCI Instance Metadata Service (IMDS): link-local, reachable only from within the instance.
        # Do not use IMDS URLs with untrusted input (SSRF risk).
        MD_URL="http://169.254.169.254/opc/v2/instance/metadata"
        AUTH_HDR="Authorization: Bearer Oracle"
        
        # Fetch a metadata key, returning empty on error/missing
        fetch_md() {
          local key="$1"
          curl -sfL --noproxy '*' -H "${AUTH_HDR}" --connect-timeout 2 --max-time 5 "${MD_URL}/${key}" 2>/dev/null || true
        }
        
        CLUSTER_DNS="$(fetch_md kubedns_svc_ip)"
        KUBELET_EXTRA_ARGS="$(fetch_md kubelet-extra-args)"
        APISERVER_ENDPOINT="$(fetch_md apiserver_host)"
        KUBELET_CA_CERT="$(fetch_md cluster_ca_cert)"
        
        # Export only when present to avoid surprising consumers with empty values
        [ -n "${CLUSTER_DNS}" ] && export CLUSTER_DNS
        [ -n "${KUBELET_EXTRA_ARGS}" ] && export KUBELET_EXTRA_ARGS
        [ -n "${APISERVER_ENDPOINT}" ] && export APISERVER_ENDPOINT
        [ -n "${KUBELET_CA_CERT}" ] && export KUBELET_CA_CERT
        
        # BEGIN OF CUSTOM SCRIPT BOOTSTRAP SCRIPT , REPLACE THIS SECTION WITH CUSTOM PRE BOOTSTRAP SCRIPT
        #echo "pre bootstrap script"
        #echo "CLUSTER_DNS: ${CLUSTER_DNS:-}"
        #echo "KUBELET_EXTRA_ARGS: ${KUBELET_EXTRA_ARGS:-}"
        #echo "APISERVER_ENDPOINT: ${APISERVER_ENDPOINT:-}"
        #echo "KUBELET_CA_CERT: ${KUBELET_CA_CERT:-}"
        # END OF CUSTOM SCRIPT BOOTSTRAP SCRIPT
        
        bash /etc/oke/oke-install.sh
        
        # BEGIN OF POST BOOTSTRAP SCRIPT, IF NEEDED
        #echo "post bootstrap script"
        #END OF POST BOOTSTRAP SCRIPT

        If you use this example, only insert your custom logic:

        • between # BEGIN OF CUSTOM SCRIPT BOOTSTRAP SCRIPT , REPLACE THIS SECTION WITH CUSTOM PRE BOOTSTRAP SCRIPT and # END OF CUSTOM SCRIPT BOOTSTRAP SCRIPT

        • between # BEGIN OF POST BOOTSTRAP SCRIPT, IF NEEDED and #END OF POST BOOTSTRAP SCRIPT

        Note that the parameters necessary to bootstrap nodes are available in the Instance Metadata Service (IMDS). When creating a custom script, we strongly recommend setting the CLUSTER_DNS, KUBELET_EXTRA_ARGS, APISERVER_ENDPOINT, and KUBELET_CA_CERT environment variables by retrieving their values from IMDS, as shown in the example script. Configuring these variables as shown is essential for the correct operation of KPO.

      2. Base64-encode the script.
    2. Add the base64-encoded script to the user_data field in the metadata section of the OCINodeClass.

      For example:

      apiVersion: oci.oraclecloud.com/v1beta1
      kind: OCINodeClass
      metadata:
        name: example-nodeclass
      spec:
        metadata:
          user_data: "IyEvdXNyL2Jpbi9lbnYgYmFzaAoKc2V0IC1vIGVycmV4aXQKc2V0IC1vIG5vdW5zZXQKc2V0IC1vIHBpcGVmYWlsCg=="
      ...
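For both options, the scripts must be base64-encoded before you add them to the OCINodeClass. The following sketch encodes a script and verifies that the encoding round-trips (the file name and content are illustrative):

```shell
# Write an example pre-bootstrap script (content is illustrative)
cat > pre-bootstrap.sh <<'EOF'
#!/bin/bash
echo "I am a pre bootstrap script"
EOF

# Encode on a single line (-w 0 is GNU coreutils; plain base64 on macOS)
ENCODED="$(base64 -w 0 pre-bootstrap.sh)"
echo "${ENCODED}"

# Verify the round trip before pasting the value into the OCINodeClass
DECODED="$(printf '%s' "${ENCODED}" | base64 -d)"
```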

What should you consider for KubeletConfig maxPods and podsPerCore?

Use kubeletConfig to control the pod density of worker nodes, and keep it aligned with your networking capacity.

  • Set podsPerCore to a value that does not exceed maxPods.

  • For clusters that use the OCI VCN IP Native CNI add-on, set maxPods to a value lower than the sum of the ipCount values across secondary VNICs.
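As a sketch, assuming that kubeletConfig accepts the standard kubelet field names maxPods and podsPerCore, an OCINodeClass fragment that follows these guidelines might look like this:

```yaml
kubeletConfig:
  maxPods: 110       # keep below the sum of secondary VNIC ipCount values
  podsPerCore: 10    # does not exceed maxPods
```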

How do you configure the OKE prebuilt image compartment OCID in Karpenter Provider for OCI (KPO)?

When you want KPO to use a custom image based on an OKE image (by setting imageType: OKEImage), and the image is located in a different compartment, you can enable KPO to discover the image using the preBakedImageCompartmentId Helm value or the imageFilter of OCINodeClass.

If you use only Oracle-published OKE images or you reference images directly by OCID (imageId), this configuration is not necessary.

Prerequisites

  • You know the compartment OCID that contains the images you want KPO to discover.

  • A suitable policy exists to enable the KPO controller to read images in the different compartment. For example:
    Allow any-user to read instance-images in compartment <compartment-name> where all { ...workload identity conditions... }

Option 1: Configure the compartment for all OCINodeClass resources (using a Helm value)

  1. Set the preBakedImageCompartmentId Helm value:

    settings: 
      preBakedImageCompartmentId: "<compartment-ocid-containing-images>"
  2. Apply the updated configuration by running helm upgrade for the KPO deployment (see How do you change and reapply Helm values to update the KPO deployment?).

Option 2: Configure the compartment for a specific OCINodeClass (using imageFilter)

Set compartmentId in imageFilter:

volumeConfig:
  bootVolumeConfig:
    imageConfig:
      imageType: OKEImage
      imageFilter:
        compartmentId: "<compartment-ocid-containing-images>"
        osFilter: "Oracle Linux"
        osVersionFilter: "8"

Custom images based on OKE images (k8s_version requirement)

When you use a custom image based on an OKE image, ensure that the image has one of the following:

  • a k8s_version freeform tag
  • a BaseImageId value that points (directly or indirectly) to an ancestor OKE image that has the k8s_version tag

If neither is present, KPO cannot determine the Kubernetes version for the image, and image selection can fail with a missing k8s_version tag error.
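For example, assuming the OCI CLI is configured, you might add the k8s_version freeform tag to a custom image as follows (the image OCID is a placeholder and the version string is an assumption; note that updating freeform tags replaces the image's existing freeform tags):

```shell
# Build the freeform tag payload (the version string is an assumption;
# use the Kubernetes version the custom image was built for)
K8S_VERSION="v1.31.1"
TAGS="{\"k8s_version\": \"${K8S_VERSION}\"}"
echo "${TAGS}"

# Requires OCI CLI credentials; uncomment to run:
# oci compute image update --image-id "<custom-image-ocid>" --freeform-tags "${TAGS}"
```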

How do you change and reapply Helm values to update the KPO deployment?

To change Helm values after installation, update the Helm values file, and then apply the new configuration.

  1. Update the Helm values file (for example, add or change logLevel: debug).

  2. Apply the updated configuration by entering:
    helm upgrade karpenter <path-to-chart-tarball> \
      --values <path-to-helm-values-file> \
      --namespace <karpenter-namespace>
  3. Confirm that the updated configuration is applied and that the KPO controller pods restart successfully:
    kubectl get pods --namespace <karpenter-namespace>