Container Engine Issues

This section describes known issues and workarounds related to Oracle Container Engine for Kubernetes (OKE).

Container Engine for Kubernetes Requires Switch Firmware Upgrade on Systems with Administration Network

If your Private Cloud Appliance is configured with a separate administration network, the appliance and data center networking must be reconfigured to enable the traffic flows required by Oracle Container Engine for Kubernetes (OKE). In addition, this network reconfiguration depends on functionality included in a newer version of the switch software.

Workaround: Upgrade or patch the software of the switches in your appliance, then reconfigure the network. Details and instructions are available in the Private Cloud Appliance documentation.

Bug: 36073167

Version: 3.0.2

Tag Filters Not Available for Kubernetes Node Pools and Nodes

Unlike Oracle Cloud Infrastructure, Private Cloud Appliance currently does not provide tag filters for the tables that list Kubernetes node pools and nodes. Tag filtering is available for Kubernetes clusters.

Workaround: There is no workaround. The UI does not provide the tag filters in question.

Bug: 36091835

Version: 3.0.2

Kubernetes Node Tags Not Available in Compute Web UI

The Compute Web UI does not allow users to apply defined or freeform tags to all nodes in a Kubernetes node pool. However, tags can be applied to one node at a time from the UI. Tagging all nodes in a node pool at once must be done using the OCI CLI.

Workaround: To apply tags to all nodes in a node pool, use the --node-defined-tags and --node-freeform-tags options of the OCI CLI node pool commands, as shown in the example below.
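
For example, a minimal sketch assuming an existing node pool, with placeholder OCIDs; the tag namespace, keys, and values are hypothetical:

# oci ce node-pool update --node-pool-id ocid1.nodepool...unique_id \
--node-freeform-tags '{"project": "demo"}' \
--node-defined-tags '{"Operations": {"CostCenter": "42"}}'

The same options can also be supplied to the oci ce node-pool create command to tag all nodes when the node pool is created.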

Bug: 36156349

Version: 3.0.2

Node Doctor Script Not Available in Worker Nodes

In Oracle Cloud Infrastructure, the Oracle Container Engine for Kubernetes (OKE) provides a troubleshooting utility called Node Doctor. Its purpose is to help resolve problems with worker nodes that are not in Active state. You will see references to worker node troubleshooting and the Node Doctor script in the Compute Web UI. However, the functionality is not available in Private Cloud Appliance. Even if you install the script on your worker nodes, its operations will fail because of missing or unexpected environment data.

Workaround: There is no workaround. The Node Doctor script is not available on worker nodes in Private Cloud Appliance.

Bug: 35807245

Version: 3.0.2

Unable to Delete Kubernetes Cluster in Failed State

To deploy a Kubernetes cluster, the Oracle Container Engine for Kubernetes (OKE) uses various types of cloud resources, such as compute instances and load balancers, that can also be managed through other infrastructure services. However, to avoid inconsistencies, Kubernetes cluster resources must be manipulated only through the OKE Service. If the network load balancer of a Kubernetes cluster is deleted outside the control of the OKE Service, that cluster ends up in a failed state and can no longer be deleted.

Workaround: This is a known issue with the Cluster API Provider. If a cluster is in a failed state and its network load balancer is no longer present, the cluster must be cleaned up manually. Contact Oracle for assistance.

Bug: 36193835

Version: 3.0.2

Kubernetes Cluster Creation Failure Due to Load Balancer Limit Returning Unclear Error

When the maximum number of load balancers has already been deployed in your tenancy or appliance environment, a new Kubernetes cluster cannot be created. The cluster creation attempt fails, but the error message returned does not state that the limit was reached. A more generic cluster reconciliation error message is returned instead, as in the following example:

# oci ce work-request-error list --compartment-id ocid1.tenancy...unique_id \
--work-request-id ocid1.workrequest...unique_id
{
  "data": [
    {
      "code": "GetWorkRequestGeneric",
      "message": "OCICluster reconciliation failed: ReconcileError",
      "timestamp": "2024-02-24T17:24:48.615203+00:00"
    }
  ]
}

Workaround: To confirm that the failure is caused by the load balancer limit, check the number of load balancers deployed in the appliance environment, as sketched below. If you have insufficient access rights, ask an administrator who has the necessary privileges.
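
For example, assuming you have permission to list load balancer resources and using a placeholder compartment OCID, the OCI CLI can count the load balancers in a compartment; network load balancers can be listed the same way:

# oci lb load-balancer list --compartment-id ocid1.compartment...unique_id --all \
--query 'length(data)'
# oci nlb network-load-balancer list --compartment-id ocid1.compartment...unique_id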

Bug: 36335225

Version: 3.0.2

Intermittent Failures When Using Terraform to Create Kubernetes Cluster

When creating Kubernetes clusters from Terraform, there might be intermittent failures that return a generic error message: "Failed to create cluster due to an Unknown error". These failures are known to occur when the Load Balancing service pods are not in sync. Testing shows that this issue is most likely to affect the first cluster creation attempt in the appliance environment.

Workaround: When running into this type of failure, delete the Kubernetes cluster that failed to deploy, then retry the create operation, for example as sketched below. Particularly when the first cluster creation on the system fails, subsequent create operations tend to be successful.
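
For example, a minimal sketch with a placeholder cluster OCID; the failed cluster can be deleted with the OCI CLI (or from the Compute Web UI) before rerunning terraform apply:

# oci ce cluster delete --cluster-id ocid1.cluster...unique_id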

Bug: 36379853

Version: 3.0.2

API Reference on Appliance Not Up-to-Date for OKE Service

Every Private Cloud Appliance provides online API reference pages, conveniently accessible from your browser. For the Compute Enclave, these pages are located at https://console.mypca.mycompany.com/api-reference. Unfortunately, the API reference in the appliance software version that provides the initial release of the Oracle Container Engine for Kubernetes (OKE) has not been updated with the OKE Service APIs.

Workaround: Contact Oracle for assistance and open a service request.

Bug: 35710716

Version: 3.0.2