Kubernetes Engine Issues
This section describes known issues and workarounds related to Oracle Private Cloud Appliance Kubernetes Engine (OKE).
Supported OCI Terraform Provider Versions
The Oracle Private Cloud Appliance Kubernetes Engine (OKE) guide provides example Terraform scripts to configure OKE resources. To use these scripts, you must install both Terraform and the Oracle Cloud Infrastructure (OCI) Terraform provider.
If you use Terraform scripts with Kubernetes Engine (OKE), specify in your provider block an OCI Terraform provider version of at least v4.50.0 and no greater than v6.36.0:

provider "oci" {
  version = ">= 4.50.0, <= 6.36.0"
  ...
}

Note: If you use Terraform to create a node pool, include the following block in the node pool resource:

node_eviction_node_pool_settings {
  is_force_delete_after_grace_duration = "true"
}

Bug: 37934227
Version: 3.0.2
Enable Add-on Work Request Initially in Failed State
When you enable an add-on, the work request might initially show that the add-on installation failed instead of showing the installation as pending. In this case, the add-on state is Needs Attention. After reconciliation, the add-on state should change to Active, and the work request state should change to Succeeded.
Workaround: Wait for the reconciliation process to run a couple of times. If the work request is still in Failed state and the add-on is still in Needs Attention state after a couple of reconciliation runs, then investigate as described in "Add-on Reconciliation" in the Managing OKE Cluster Add-ons chapter of the Oracle Private Cloud Appliance Kubernetes Engine (OKE) guide.
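To monitor the reconciliation, you can poll the work request returned by the enable add-on operation from the OCI CLI, for example (the work request OCID is a placeholder):

# Check the state of the add-on work request; repeat after each reconciliation run
# (placeholder OCID).
oci ce work-request get --work-request-id ocid1.clustersworkrequest.....placeholder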
Bug: 37967658
Version: 3.0.2
Create Cluster Does Not Support Extension Parameters
In Private Cloud Appliance Release 3.0.2-b1185392, some cluster control plane node properties are specified by using OraclePCA defined tags.
In the previous release, Private Cloud Appliance Release 3.0.2-b1081557, these defined tags are not recognized. You must use free-form tags to specify these values.
Workaround: In Private Cloud Appliance Release 3.0.2-b1081557, use free-form tags to provide the following information for control plane nodes:
- Your public SSH key.

  Specify sshkey for the tag key. Paste your public SSH key into the Value field.

  Important: You cannot add an SSH key after the cluster is created.

- Number of nodes.

  By default, the number of nodes in the control plane is 3. You can specify 1, 3, or 5 nodes. To specify the number of control plane nodes, specify cp_node_count for the tag key, and enter 1, 3, or 5 in the Value field.

- Node shape.

  For Private Cloud Appliance X10 systems, the shape of the control plane nodes is VM.PCAStandard.E5.Flex and you cannot change it. For all other Private Cloud Appliance systems, the default shape is VM.PCAStandard1.1, and you can specify a different shape.

  To use a different shape, specify cp_node_shape for the tag key, and enter the name of the shape in the Value field. For a description of each shape, see Compute Shapes in the Oracle Private Cloud Appliance Concepts Guide.

- Node shape configuration.

  If you specify a shape that is not a flexible shape, do not specify a shape configuration. The number of OCPUs and amount of memory are set to the values shown for this shape in "Standard Shapes" in Compute Shapes in the Oracle Private Cloud Appliance Concepts Guide.

  If you specify a flexible shape, you can change the default shape configuration.

  To provide shape configuration information, specify cp_node_shape_config for the tag key. You must specify the number of OCPUs (ocpus) you want. You can optionally specify the total amount of memory you want (memoryInGBs). The default value for gigabytes of memory is 16 times the number you specify for OCPUs.

The following are examples of node shape configuration values. Enter everything, including the surrounding single quotation marks, in the Value field for the tag. In the first example, the default amount of memory will be configured.

'{"ocpus":1}'
'{"ocpus":2, "memoryInGBs":24}'
Bug: 36979754
Version: 3.0.2
Nodes in Failing State After Upgrade or Patch
Upgrading or patching an appliance that has OKE clusters with node pools can cause some nodes to move into the FAILING state even though the underlying compute instance is in the RUNNING state.
If you experience this issue, perform the following workaround.
Workaround: Use the following method to replace the failed nodes with new active nodes, automatically transferring workloads from the failed nodes to the new nodes.
Delete the nodes that are in state FAILING or FAILED. Do
not increase the size of the node pool (do not scale up the node pool).
The deleted nodes are cordoned and drained and their workloads are automatically transferred to the new nodes that are created to keep the node pool at the same size.
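A minimal OCI CLI sketch of this workaround follows; the OCIDs are placeholders, and the --is-decrement-size parameter name is an assumption based on the container engine delete node operation, so verify it with oci ce node-pool delete-node --help.

# Delete a failed node without shrinking the pool, so a replacement node is created
# (placeholder OCIDs; parameter name is an assumption).
oci ce node-pool delete-node \
  --node-pool-id ocid1.nodepool.....placeholder \
  --node-id ocid1.instance.....placeholder \
  --is-decrement-size false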
Bug: 36814183
Version: 3.0.2
OKE Requires Switch Firmware Upgrade on Systems with Administration Network
If your Private Cloud Appliance is configured with a separate administration network, the appliance and data center networking need reconfiguration to enable the traffic flows required by the Oracle Private Cloud Appliance Kubernetes Engine (OKE). In addition, the reconfiguration of the network is dependent on functionality included in a new version of the switch software.
Workaround: Upgrade or patch the software of the switches in your appliance. Reconfigure the network. You can find details and instructions in the following documentation sections:
- "Upgrading the Switch Software" in the Oracle Private Cloud Appliance Upgrade Guide
- "Patching the Switch Software" in the Oracle Private Cloud Appliance Patching Guide
- "Securing the Network" in the Oracle Private Cloud Appliance Security Guide

The "Securing the Network" section includes a port matrix for systems with a separate administration network. Use it to configure routing and firewall rules so that the required traffic is enabled in a secure way.
Bug: 36073167
Version: 3.0.2
Previously Used Image Is No Longer Listed
The Compute Web UI and the compute image list command list only the three most recently published versions of each major image distribution (for example, Oracle Linux 9). If an upgrade or patch delivers an updated version of an OKE node image, for example the same image with a newer Kubernetes version, and that major distribution image had already been delivered three times, then the fourth most recently published version of that image is no longer listed.
Previously delivered images are still accessible, even though they are not listed.
Workaround: To use an image that you have used before but is no
longer listed, use the OCI CLI to create the node
pool, and specify the OCID of the image. To get the OCID of the image you want, use the
ce node-pool get command for a node pool where you used this image
before.
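For example, the sketch below retrieves the details of a node pool that previously used the image, so you can read the image OCID from the node source details in the output, and then supplies that OCID when creating the new node pool. The OCIDs, names, and versions are placeholders, and the parameter used to pass the image OCID (--node-image-id here) is an assumption that can vary by CLI version; check oci ce node-pool create --help.

# Look up a node pool that used the image; the image OCID appears in the output
# (placeholder OCID).
oci ce node-pool get --node-pool-id ocid1.nodepool.....placeholder

# Create the new node pool with that image OCID (parameter name is an assumption).
oci ce node-pool create \
  --cluster-id ocid1.cluster.....placeholder \
  --compartment-id ocid1.compartment.....placeholder \
  --name my-node-pool \
  --node-shape VM.PCAStandard1.1 \
  --kubernetes-version v1.30.10 \
  --node-image-id ocid1.image.....placeholder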
Bug: 36862970
Version: 3.0.2
Tag Filters Not Available for Kubernetes Node Pools and Nodes
Unlike Oracle Cloud Infrastructure, Private Cloud Appliance currently does not provide the functionality to use Tag Filters for tables listing Kubernetes node pools and nodes. Tag filtering is available for Kubernetes clusters.
Workaround: There is no workaround. The UI does not provide the tag filters in question.
Bug: 36091835
Version: 3.0.2
OKE-Specific Tags Must Not Be Deleted
Certain properties and functions of OKE are enabled through resource tags. These reserved tags are not created by the IAM service, but by users who apply them to resources. Therefore, the IAM service cannot prevent users from deleting such tags. If they are deleted, the OKE service might not work as expected.
Workaround: Do not attempt to delete the resource tags used for specific OKE service functionality. If you delete these tags, you must create them again.
Bug: 37157933
Version: 3.0.2
Unable to Delete an OKE Cluster in Failed State
To deploy a cluster, Oracle Private Cloud Appliance Kubernetes Engine (OKE) uses various types of cloud resources that can also be managed through other infrastructure services, such as compute instances and load balancers. However, OKE cluster resources must be manipulated only through the OKE service, to avoid inconsistencies. If the network load balancer of an OKE cluster is deleted outside the control of the OKE service, that cluster ends up in a failed state and you will no longer be able to delete it.
Workaround: This is a known issue with the Cluster API Provider. If a cluster is in failed state and its network load balancer is no longer present, it must be cleaned up manually. Contact Oracle for assistance.
Bug: 36193835
Version: 3.0.2
UI and CLI Represent Eviction Grace Period Differently
The minimum and default grace period before a node is evicted from a worker node pool is 20
seconds. The OCI CLI displays this value
accurately and allows you to modify the grace period in seconds or minutes, using the ISO8601
format. For example, you could change the default of 20 seconds (="PT20S") to 3 minutes
(="PT3M") by specifying a new value in the --node-eviction-node-pool-settings
command parameter.
In contrast, the Compute Web UI parses the ISO8601 time format into an integer value and displays the eviction grace period in minutes. As a result, the 20 second default appears as 0 minutes in the Node Pool Information tab of the Kubernetes Cluster detail page.
This behavior differs from the Oracle Cloud Infrastructure console (UI), which is capable of displaying time in minutes as a decimal value (for example: 0.35 minutes). It has no minimum grace period, so zero is a valid entry.
Workaround: To set or check the precise eviction grace period of a node pool, use the OCI CLI and specify time in the ISO8601 format. When using the Compute Web UI, consider the limitations described.
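As a sketch, the following OCI CLI call sets the eviction grace period of a node pool to 3 minutes; the node pool OCID is a placeholder and the evictionGraceDuration key name is an assumption based on the node eviction settings object, so verify it with oci ce node-pool update --help.

# Set the eviction grace period to 3 minutes, in ISO 8601 duration format
# (placeholder OCID; key name is an assumption).
oci ce node-pool update \
  --node-pool-id ocid1.nodepool.....placeholder \
  --node-eviction-node-pool-settings '{"evictionGraceDuration": "PT3M"}'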
Bug: 36696595
Version: 3.0.2
Nodes in Node Pool Not Automatically Distributed Across Fault Domains
When you create an OKE node pool without selecting specific fault domains, the Compute service handles distribution of the nodes across the fault domains. By design, node pool nodes (and compute instances in general) are assigned to the compute nodes with the highest available resource capacity. Due to VM activity and differences in resource consumption, the load between the three fault domains might vary considerably. Therefore, the auto-distribution logic cannot guarantee that nodes of the same node pool are spread evenly across fault domains. In fact, all nodes might end up in the same fault domain, which is not preferred.
Workaround: For the best distribution of node pool nodes across fault domains, do not rely on auto-distribution. Instead, select the fault domains to use when creating the node pool.
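The sketch below shows one way to select fault domains explicitly when creating a node pool from the OCI CLI. The OCIDs, availability domain, and subnet are placeholders, and the --placement-configs parameter with its faultDomains key is an assumption based on the node pool placement configuration, so confirm the exact parameters with oci ce node-pool create --help.

# Create a node pool whose nodes are placed only in the listed fault domains
# (placeholder values; parameter names are assumptions, verify with --help).
oci ce node-pool create \
  --cluster-id ocid1.cluster.....placeholder \
  --compartment-id ocid1.compartment.....placeholder \
  --name my-node-pool \
  --node-shape VM.PCAStandard1.1 \
  --kubernetes-version v1.30.10 \
  --size 3 \
  --placement-configs '[{"availabilityDomain": "<availability-domain>", "subnetId": "ocid1.subnet.....placeholder", "faultDomains": ["FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3"]}]'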
Bug: 36901742
Version: 3.0.2
API Reference on Appliance Not Up-to-Date for OKE Service
Every Private Cloud Appliance provides online API reference pages, conveniently accessible from your browser. For the Compute Enclave, these pages are located at https://console.mypca.mydomain/api-reference. This API reference is not current for all services, including for Oracle Private Cloud Appliance Kubernetes Engine (OKE).
Workaround: The REST API for Oracle Private Cloud Appliance Compute Enclave shows up-to-date parameters and values in the descriptions of the CreateCluster and CreateNodePool operations.
Note:
Both the console api-reference and the Oracle Help Center REST API for Oracle Private Cloud Appliance Compute Enclave show parameters and parameter values that are not supported because they do not apply to Private Cloud Appliance. If you use these, you might receive a "not supported" error message, or the parameter or value might be accepted by the API but have no effect.
Bug: 35710716, 36852746
Version: 3.0.2
OKE Cluster Creation Fails
OKE cluster creation might fail if the system is configured with a domain name that contains uppercase characters. Uppercase characters are not supported in domain names.
Workaround: Contact Oracle Support.
Bug: 36611385
Version: 3.0.2
Backend Gets Removed From Load Balancer After Node Cycling of Node Pool
When a node pool is updated and node cycled with maximumUnavailable set to 1, the backend that exposes the underlying application can be detached from the existing service load balancer. This happens only when maximumUnavailable is set to 1 and the node pool contains just one node. Although the new node rejoins the cluster and node cycling completes successfully, the new node is not added back as a backend of the service load balancer.
Workaround: Delete and re-create the Kubernetes service object, or set maximumUnavailable to 0 when node cycling if the cluster has only one node pool with a single node.
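A minimal kubectl sketch of the first workaround, assuming the application is exposed through a service named my-app-svc defined in my-app-svc.yaml in namespace my-namespace (all names are placeholders):

# Delete and re-create the service so its load balancer backends are rebuilt
# (placeholder names).
kubectl delete service my-app-svc -n my-namespace
kubectl apply -f my-app-svc.yaml -n my-namespace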
Bug: 38484329
Version: 3.0.2
Upgrade from 3.0.2-b1392231 to 3.0.2-b1483396: OKE Cluster Created Initially at M3.11.1 Fails Upgrade to OKE 1.31.6 Post Upgrade
Following a rack upgrade from software release 3.0.2-b1392231 to software release 3.0.2-b1483396, OKE clusters originally created on software release 3.0.2-b1392231 might encounter an invalid compartment error and cannot be upgraded to Kubernetes version 1.31.6.
Workaround: To resolve this, perform a node pool upgrade to version 1.30.10 or a similar version first. Once the node pool is updated, retry upgrading the cluster to version 1.31.6 or the required version.
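A sketch of this sequence from the OCI CLI is shown below; the OCIDs are placeholders, and the exact version strings available on your system might differ.

# Step 1: upgrade the node pool to an intermediate version first (placeholder OCID).
oci ce node-pool update \
  --node-pool-id ocid1.nodepool.....placeholder \
  --kubernetes-version v1.30.10

# Step 2: after the node pool update completes, retry the cluster upgrade (placeholder OCID).
oci ce cluster update \
  --cluster-id ocid1.cluster.....placeholder \
  --kubernetes-version v1.31.6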
Bug: 38409461
Version: 3.0.2
Restarted csi-oci-controller Pods of OKE Cluster in ImagePullBackOff After BYOC
After setting up bring your own certificate and updating the certificates on the cluster, new
or restarted csi-oci-controller pods might not start because they’re unable
to pull the required image, resulting in an ImagePullBackOff status.
Workaround: To resolve the issue, delete the control plane nodes from your cluster. After deletion, re-register these nodes. This causes the system to launch new pods, which should now start normally.
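To confirm that you are hitting this issue, check the pod status with kubectl; the kube-system namespace shown here is an assumption about where the csi-oci-controller pods run on your cluster.

# Look for pods stuck in ImagePullBackOff (namespace is an assumption).
kubectl get pods -n kube-system | grep csi-oci-controller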
Bug: 38457116
Version: 3.0.2
Cluster Create with Version v1.31.6 Fails With Connection Error When DNS Domain Not Defined for OKE VCN
Creating a cluster with Kubernetes version v1.31.6 can fail with a connection error similar to the following when no DNS domain is defined for the OKE VCN:

dial tcp: lookup <control-plane-node> on 169.254.168.254:53: no such host

Workaround: This problem arises if the VCN was created without a DNS domain name. To resolve the issue, ensure that dns-label is set for the VCN.
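For example, a minimal sketch of creating the OKE VCN with a DNS label from the OCI CLI (the compartment OCID, CIDR, and names are placeholders):

# Create the VCN with a DNS label so control plane node hostnames can be resolved
# (placeholder values).
oci network vcn create \
  --compartment-id ocid1.compartment.....placeholder \
  --display-name oke-vcn \
  --cidr-blocks '["10.0.0.0/16"]' \
  --dns-label okevcn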
Bug: 38309694
Version: 3.0.2