Upgrading the Kubernetes Cluster

Caution:

Ensure that all preparation steps for system upgrade have been completed. For instructions, see Preparing the Upgrade Environment.

The upgrade of the Kubernetes container orchestration environment is also kept separate from the operating system upgrade. With a single command, all Kubernetes packages, such as kubeadm, kubectl, and kubelet, are upgraded on the three management nodes and all compute nodes. Note that this upgrade does not include the microservices running within the Kubernetes cluster.

For dependency reasons, Kubernetes must be upgraded after the management node host operating system. The Kubernetes upgrade command has no mandatory parameters.

About the Kubernetes Upgrade Process

To ensure compatibility and continuation of service, Kubernetes must be upgraded one version at a time. Skipping versions, whether major or minor, is not supported. The Private Cloud Appliance Upgrader manages this process by upgrading or patching all parts of the Kubernetes cluster to the next available version, then repeating the same sequence of operations until the entire environment runs the latest Kubernetes version available from the appliance software repositories.
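The one-version-at-a-time rule can be illustrated with a short sketch that lists the intermediate minor versions between a starting and a target release. The function name `upgrade_path` is hypothetical, and the sketch only models minor-version steps within Kubernetes 1.x; the Upgrader itself determines the next available version from the appliance software repositories, and patch levels are not modeled here.

```python
from typing import List


def upgrade_path(current: str, target: str) -> List[str]:
    """List the minor versions to pass through, one step at a time, after `current`."""
    cur_major, cur_minor = (int(part) for part in current.split(".")[:2])
    tgt_major, tgt_minor = (int(part) for part in target.split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("this sketch only models minor-version steps within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("target version is older than the current version")
    # Each entry corresponds to one full run of the upgrade sequence.
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


print(upgrade_path("1.20", "1.25"))  # ['1.21', '1.22', '1.23', '1.24', '1.25']
```

The five entries correspond to the five runs needed to go from 1.20 to 1.25.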

Upgrading or patching the Kubernetes cluster is a time-consuming process that involves the Private Cloud Appliance management nodes and compute nodes. Each additional compute node extends the process by approximately 10 minutes for each incremental version of Kubernetes.

With appliance software version 3.0.2-b925538, the container orchestration environment is upgraded or patched from Kubernetes version 1.20.x to version 1.25.y, meaning the entire process must run 5 times. In earlier versions of the appliance software, the repository had to be synchronized after each successful run to retrieve the next required version. With this version, the repository is reconfigured to allow multiple versions of the Kubernetes packages, so the resync is no longer required.

Each individual Kubernetes node upgrade is expected to take around 10 minutes. Testing indicates that upgrading or patching the Private Cloud Appliance Kubernetes cluster from version 1.20 to version 1.25 takes approximately 4-5 hours for a base rack configuration with 3 management nodes and 3 compute nodes. On a full rack with 20 compute nodes the entire process requires at least 9 hours and may take up to 18 hours to complete. The estimated time for the rack's specific configuration is reported in the upgrade plan.
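A rough duration estimate follows from the figures above: number of nodes × minutes per node × number of incremental versions. The sketch below assumes a fixed 10 minutes per node per version and strictly sequential node upgrades; both are simplifying assumptions, and the function name is hypothetical.

```python
def estimate_upgrade_minutes(
    management_nodes: int,
    compute_nodes: int,
    version_steps: int,
    minutes_per_node: int = 10,  # assumption based on the ~10 minutes per node stated above
) -> int:
    """Naive estimate: every node is upgraded once per incremental version, one after another."""
    if min(management_nodes, compute_nodes, version_steps, minutes_per_node) < 0:
        raise ValueError("inputs must be non-negative")
    return (management_nodes + compute_nodes) * minutes_per_node * version_steps


steps = 25 - 20  # Kubernetes 1.20 -> 1.25, one minor version at a time
base_rack = estimate_upgrade_minutes(3, 3, steps)
full_rack = estimate_upgrade_minutes(3, 20, steps)
print(f"base rack: {base_rack / 60:.1f} h")  # base rack: 5.0 h
print(f"full rack: {full_rack / 60:.1f} h")  # full rack: 19.2 h
```

For the base rack this reproduces the upper end of the 4-5 hour range. For a full rack the naive model yields about 19 hours, slightly above the 18-hour upper bound quoted above, so the real process evidently does not scale strictly linearly. The estimate reported in the upgrade plan remains the figure to rely on.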

To monitor the upgrade or patching progress, periodically check the job status or the logs.

  • Check job status through the Service CLI: getUpgradeJob upgradeJobId=<id>

  • View Upgrader logs on a management node: tail -f /nfs/shared_storage/pca_upgrader/log/pca-upgrader_kubernetes_cluster_<time_stamp>.log
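To find the most recent Kubernetes upgrade log without typing the time stamp by hand, a sketch like the following selects the newest file by parsing the time stamp in its name. It assumes the file names follow the pattern of the example log file shown in the Service CLI output (pca-upgrader_kubernetes_cluster_2021_09_26-17.20.09.log); the helper names are hypothetical. The printed path can then be passed to tail -f.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional

# Log directory from the monitoring instructions above.
LOG_DIR = Path("/nfs/shared_storage/pca_upgrader/log")

# Assumed naming pattern, based on the example
# pca-upgrader_kubernetes_cluster_2021_09_26-17.20.09.log
LOG_NAME = re.compile(
    r"^pca-upgrader_kubernetes_cluster_(\d{4}_\d{2}_\d{2}-\d{2}\.\d{2}\.\d{2})\.log$"
)


def log_timestamp(name: str) -> Optional[datetime]:
    """Return the time stamp encoded in a Kubernetes upgrader log name, or None."""
    match = LOG_NAME.match(name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y_%m_%d-%H.%M.%S")


def newest_kubernetes_log(names: Iterable[str]) -> Optional[str]:
    """Pick the matching file name with the latest time stamp."""
    stamped = []
    for name in names:
        ts = log_timestamp(name)
        if ts is not None:
            stamped.append((ts, name))
    return max(stamped)[1] if stamped else None


if __name__ == "__main__":
    if LOG_DIR.is_dir():
        latest = newest_kubernetes_log(p.name for p in LOG_DIR.iterdir())
        print(LOG_DIR / latest if latest else "no Kubernetes upgrader log found")
```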

During Kubernetes upgrade or patching, certain services could be temporarily unavailable.

  • The Compute Web UI, Service Web UI, OCI CLI, and Service CLI can all become temporarily unavailable. Users should wait a few minutes before attempting their operations again. Administrative operations in the Service Enclave (UI or CLI) must be avoided during upgrade or patching.

  • When the Kubernetes upgrade is initiated, the Kubernetes Workload Monitoring Operator (Sauron service) is taken down. As a result, the Grafana, Prometheus, and other Sauron ingress endpoints cannot be accessed. They become available again after both the Kubernetes cluster and the containerized microservices (platform layer) upgrade or patching processes have been completed.
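While these services are intermittently unavailable, scripted user operations can wait and retry instead of failing outright. The following generic helper is a sketch, not part of any Private Cloud Appliance SDK: `operation` stands for whatever client call you wrap, and the exception types and the 3-minute wait are assumptions chosen to match "a few minutes". It is not a way around the guidance above; administrative operations in the Service Enclave should still be avoided until the upgrade or patching completes.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_after_wait(
    operation: Callable[[], T],
    attempts: int = 5,
    wait_seconds: float = 180.0,  # assumption: "a few minutes"
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(); on a retryable error, wait and try again, up to `attempts` calls."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # give up: re-raise the last error
            sleep(wait_seconds)
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.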

Managing Unprovisioned Compute Nodes

If you upgrade or patch the Kubernetes cluster on a Private Cloud Appliance that contains unprovisioned compute nodes, there could be provisioning issues later. Because those compute nodes were not part of the Kubernetes cluster when the newer version was applied, you may need to rediscover them first.

If compute node provisioning fails after upgrading or patching the Kubernetes cluster, log on to one of the management nodes using ssh. Rediscover the unprovisioned compute nodes by running the following command with the appropriate host names:

# pca-admin compute node rediscover --hostname pcacn000

When the compute nodes have been rediscovered, provisioning is expected to work as intended.
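When several compute nodes need to be rediscovered, the command can be run once per host name. The sketch below builds one pca-admin compute node rediscover --hostname <name> invocation per node and, by default, only prints the commands (dry run). It assumes it is run on a management node where pca-admin is available, as in the example above; apart from pcacn000, the host names are hypothetical placeholders, and the helper names are illustrative.

```python
import shlex
import subprocess
from typing import Iterable, List


def rediscover_commands(hostnames: Iterable[str]) -> List[List[str]]:
    """One rediscover invocation per compute node, as in the single-node example."""
    return [
        ["pca-admin", "compute", "node", "rediscover", "--hostname", name]
        for name in hostnames
    ]


def rediscover(hostnames: Iterable[str], dry_run: bool = True) -> None:
    for cmd in rediscover_commands(hostnames):
        print(shlex.join(cmd))  # show the exact command line
        if not dry_run:
            # Stop at the first failure so it can be investigated.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical host names: replace them with the unprovisioned nodes on your rack.
    rediscover(["pcacn000", "pcacn001"], dry_run=True)
```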

For more information about provisioning, refer to "Performing Compute Node Operations" in the chapter Hardware Administration of the Oracle Private Cloud Appliance Administrator Guide.

Upgrade the Kubernetes Cluster Using the Service Web UI

  1. In the navigation menu, click Upgrade & Patching.

  2. In the top-right corner of the Upgrade Jobs page, click Create Upgrade or Patch.

    The Create Request window appears. Choose Upgrade as the Request Type.

  3. Select the appropriate upgrade request type: Upgrade Kubernetes.

  4. If required, fill out the upgrade request parameters:

    • Advanced Options JSON: Optionally, add a JSON string to provide additional command parameters.

    • Image Location: This parameter is deprecated.

    • ISO Checksum: This parameter is deprecated.

    • Log Level: Optionally, select a specific log level for the upgrade log file. The default log level is "Information". For maximum detail, select "Debug".

  5. Click Create Request.

    The new upgrade request appears in the Upgrade Jobs table.
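Because a malformed Advanced Options JSON string is easy to enter by mistake, it can help to check it locally before pasting it into the form. This sketch only verifies that the text parses as a JSON object. It assumes the field expects key/value pairs (the job's Arguments output suggests this, but the form's requirements are not specified here), and the key in the usage line is a hypothetical placeholder, not a documented option.

```python
import json
from typing import Any, Dict


def parse_advanced_options(text: str) -> Dict[str, Any]:
    """Parse the Advanced Options JSON string; raise ValueError if it is not a JSON object."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(value, dict):  # assumption: parameters are given as key/value pairs
        raise ValueError("expected a JSON object such as {\"name\": value}")
    return value


# "example_option" is a hypothetical key, not a documented upgrade parameter.
print(parse_advanced_options('{"example_option": true}'))  # {'example_option': True}
```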

Upgrade the Kubernetes Cluster Using the Service CLI

  1. Enter the upgrade command.

    PCA-ADMIN> upgradeKubernetes
    Command: upgradeKubernetes
    Status: Success
    Time: 2021-09-26 17:20:09,423 UTC
    Data:
      Service request has been submitted. Upgrade Job Id = 1632849609034-kubernetes-35545 Upgrade Request Id = UWS-edfa3b32-c32a-4b67-8df5-2357096052bf
  2. Use the request ID and the job ID to check the status of the upgrade process.

    PCA-ADMIN> getUpgradeJobs
      id                               upgradeRequestId                           commandName   result
      --                               ----------------                           -----------   ------
      1632849609034-kubernetes-35545   UWS-edfa3b32-c32a-4b67-8df5-2357096052bf   kubernetes    Passed
      1632826770954-etcd-26973         UWS-fec15d32-fc2b-48bd-9ae0-62f49587a284   etcd          Passed
      1632850933353-vault-16966        UWS-352df3d1-c21f-441b-8f6e-9381ac075906   vault         Passed
    
    PCA-ADMIN> getUpgradeJob upgradeJobId=1632849609034-kubernetes-35545
    Command: getUpgradeJob upgradeJobId=1632849609034-kubernetes-35545
    Status: Success
    Time: 2021-09-26 17:43:38,443 UTC
    Data:
      Upgrade Request Id = UWS-edfa3b32-c32a-4b67-8df5-2357096052bf
      Name = kubernetes
      Start Time = 2021-09-26T17:20:09
      End Time = 2021-09-26T17:21:52
      Pid = 35545
      Host = pcamn02
      Log File = /nfs/shared_storage/pca_upgrader/log/pca-upgrader_kubernetes_cluster_2021_09_26-17.20.09.log
      Arguments = {"verify_only":false,"upgrade":false,"diagnostics":false,"host_ip":null,"result_override":null,"log_level":null,"switch_type":null,"precheck_status":false,"task_time":0,"fail_halt":false,"fail_upgrade":null,"component_names":null,"upgrade_to":null,"image_location":"http://host.example.com/pca-3.0.1-b535176.iso","epld_image_location":null,"expected_iso_checksum":null,"checksum":"240420cfb9478f6fd026f0a5fa0e998e086275fc45e207fb5631e2e99732e192e8e9d1b4c7f29026f0a5f58dadc4d792d0cfb0279962838e95a0f0a5fa31dca7","composition_id":null,"request_id":"UWS-edfa3b32-c32a-4b67-8df5-2357096052bf","display_task_plan":false,"dry_run_tasks":false}
      Status = Passed
      Execution Time(sec) = 249
      Tasks 1 - Name = Retrieving Cluster Status
      Tasks 1 - Description = Retrieving cluster status and upgrade data from the kubernetes nodes
      Tasks 1 - Time = 2021-09-26T17:20:10
    [...]
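If you capture Service CLI output in a script, the job ID from the submission response and the Key = Value fields of the getUpgradeJob response can be extracted with simple text parsing. This sketch assumes the output layout shown in the examples above (for instance Status = Passed and Log File = ... under Data:); that layout is not documented here as a stable interface, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional

JOB_ID = re.compile(r"Upgrade Job Id = (\S+)")
REQUEST_ID = re.compile(r"Upgrade Request Id = (\S+)")


def parse_submission(output: str) -> Dict[str, Optional[str]]:
    """Extract the job and request IDs from the upgradeKubernetes response."""
    job = JOB_ID.search(output)
    request = REQUEST_ID.search(output)
    return {
        "job_id": job.group(1) if job else None,
        "request_id": request.group(1) if request else None,
    }


def parse_job_data(output: str) -> Dict[str, str]:
    """Collect 'Key = Value' lines that follow 'Data:' in getUpgradeJob output."""
    fields: Dict[str, str] = {}
    in_data = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped == "Data:":
            in_data = True
            continue
        if in_data and " = " in stripped:
            key, value = stripped.split(" = ", 1)  # split once: values may contain '='
            fields.setdefault(key, value)
    return fields
```

With the example output above, parse_submission returns the job ID 1632849609034-kubernetes-35545, which can be passed to getUpgradeJob upgradeJobId=<id>; parse_job_data on that response yields fields such as Status and Log File.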