3 Upgrading CNE

This chapter provides information about upgrading CNE to the latest release, or updating the Operating System (OS), or both on a given deployment (Bare Metal, OpenStack, or VMware).

Note:

Upgrading from 23.3.x to 23.4.x causes additional reboots of the CNE nodes as CNE 23.4.0 uses Oracle Linux 9 and CNE 23.3.x runs on Oracle Linux 8.

3.1 Supported Upgrade Paths

The following table lists the supported upgrade paths for CNE:

Table 3-1 Supported Upgrade Paths

Source Release          Target Release
----------------------  --------------
23.4.x (where x = 1,4)  23.4.6
23.3.x                  23.4.6

Note:

CNE supports performing an OS update on an existing system that runs version 23.4.1 or above on Oracle Linux 9.

3.2 Prerequisites

Before upgrading CNE, ensure that you meet the following prerequisites:
  • The user's central repository is updated with the latest versions of RPMs, binaries, and CNE Images for 23.4.x. For more information on how to update RPMs, binaries, and CNE images, see Artifact Acquisition and Hosting.
  • All Network Functions (NFs) are upgraded before performing a CNE upgrade. For more information about NF upgrade procedure, see Oracle Communications Cloud Native Core, Cloud Native Environment User Guide.
  • The CNE instance being upgraded has at least the minimum recommended node counts for Kubernetes (that is, three master nodes and six worker nodes). You can confirm the counts as shown in the following example.
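
For example, you can confirm the node counts from the Bastion Host (a minimal check; node roles appear in the ROLES column of the output):

$ kubectl get nodes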

Note:

Currently, CNE doesn't support rollback in any scenario, including:
  • when an error is encountered after an upgrade is initiated
  • after a successful upgrade

Caution:

User, computer, application, and character encoding settings can cause issues when commands or other content are copied from the PDF. The PDF reader version also affects the copy-paste functionality. It is recommended to verify the pasted content, especially when hyphens or other special characters are part of the copied content.

3.3 Common Services Release Information

On successful installation, CNE generates files on the Bastion Host that list the Kubernetes release and the release details of all the common services. These files are also updated during an upgrade. You can refer to the following files to get the release information after a successful CNE upgrade. The files are available on the Bastion Host in the /var/occne/cluster/${OCCNE_CLUSTER}/artifacts directory:
  • Kubernetes Release File: K8S_container_images.txt
  • Common Services Release File: CFG_container_images.txt
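
For example, to view the release details after a successful upgrade:

$ cat /var/occne/cluster/${OCCNE_CLUSTER}/artifacts/K8S_container_images.txt
$ cat /var/occne/cluster/${OCCNE_CLUSTER}/artifacts/CFG_container_images.txt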

3.4 Preupgrade Tasks

Before upgrading CNE, perform the tasks described in this section.

3.4.1 Saving CNE Customizations

Before upgrading a CNE instance, you must save all the customizations applied to the CNE instance so that you can reapply them after the upgrade is complete.

3.4.1.1 Preserving Grafana Dashboards
This section provides the steps to save customer-specific dashboards to a local directory so that you can restore them after the upgrade.
  1. Log in to Grafana GUI.
  2. Select the dashboard to save.
  3. Click Share Dashboard to save the dashboard.

    Figure 3-1 Grafana Dashboard

  4. Navigate to the Export tab and click Save to file to save the dashboard JSON file to the local repository.

    Figure 3-2 Saving the Dashboard in Local Repository

  5. Repeat steps 1 to 4 until you save all the required customer-specific dashboards.

3.4.2 Performing Preupgrade Health Checks

Perform the following steps to ensure that the cluster is in a healthy state.

Check drive space on Bastion Host

Before upgrading, ensure that there is sufficient drive space in the home directory of the user where the upgrade runs (usually admusr for Bare Metal and cloud-user for vCNE).

Use the df -h command to verify that the /home directory has at least 3.5 GB of free space and the /var directory has at least 10 GB of free space for temporary file gathering and for running the CNE containers during the upgrade procedure.
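
For example (the filesystems shown depend on your system layout):

$ df -h /home /var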

If there is insufficient space, free up some space. One common location to reclaim space is the podman image storage for local images. You can list the local images using the podman image ls command and remove them using the podman image rm -f [image] command. You can reclaim additional podman space using the podman system prune -fa command, which removes any unreferenced image layers.
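
For example (a sketch; replace [image] with an image ID from the listing):

$ podman image ls
$ podman image rm -f [image]
$ podman system prune -fa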

Check OpenSearch pods disk space

For OpenSearch pods that use PVCs, check the available disk space and confirm that there is at least 1 GB of disk space available before running the upgrade.

kubectl -n occne-infra exec occne-opensearch-cluster-data-0 -c opensearch -- df -h /usr/share/opensearch/data
kubectl -n occne-infra exec occne-opensearch-cluster-data-1 -c opensearch -- df -h /usr/share/opensearch/data
kubectl -n occne-infra exec occne-opensearch-cluster-data-2 -c opensearch -- df -h /usr/share/opensearch/data
For example:
$ kubectl -n occne-infra exec occne-opensearch-cluster-data-0 -c opensearch -- df -h /usr/share/opensearch/data
Sample output:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        9.8G  5.3G  4.1G  57% /usr/share/opensearch/data

3.4.3 Checking Preupgrade Config Files

Check manual updates on the pod resources: Verify that any manual updates made to the Kubernetes cluster configuration (such as deployments and daemonsets) after the initial deployment are configured in the proper occne.ini (vCNE) or hosts.ini (Bare Metal) file. For more information, see Preinstallation Tasks.
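
For example, you can list the deployments and daemonsets to review against the .ini file (an illustrative check; adjust the namespace to the one containing your modified resources):

$ kubectl get deployments,daemonsets -n occne-infra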

3.5 Standard Upgrade

This section describes the procedure to perform a full upgrade, OS update, or both on a given CNE deployment (Bare Metal or vCNE).

Note:

  • This upgrade is only used to upgrade from release 23.3.x to release 23.4.x.
  • Ensure that you complete all the preupgrade procedures before performing the upgrade.
  • It is suggested to use a terminal emulator (such as tmux) when running this procedure so that the Bastion Host bash shell continues to run even if the shell or VPN connection is lost.
  • It is suggested to use a session capture program (such as script) on the Bastion Host to capture all input and output for diagnosing issues. This program must be rerun after each login.
  • Initiate the upgrade.sh script from the active Bastion Host. However, during most of the upgrade, there is no designated active Bastion Host because the system changes continuously. Therefore, ensure that you rerun the upgrade.sh script from the same Bastion Host that was used initially.
  • The upgrade procedure can take hours to complete; the total time depends on the configuration of the cluster.
  • Before performing an upgrade or OS update, verify the health of the cluster and the services related to CNE, as shown in the example after this note.
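
For example, a basic health check can be run from the Bastion Host (a minimal sketch; it supplements, and does not replace, the preupgrade health checks described earlier):

$ kubectl get nodes
$ kubectl get pods -n occne-infra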

WARNING:

Refrain from performing a controlled abort (Ctrl+C) on the upgrade while it is in progress. Allow the upgrade to exit gracefully, either from an error condition or after a successful completion.

Log Files for Debugging Upgrade

The system generates many log files during the upgrade or OS update process. All the log files are suffixed with a date and timestamp. These files are maintained in the /var/occne/cluster/<cluster_name>/upgrade/logs directory and can be removed after the upgrade or OS update completes successfully. For any issues encountered during the upgrade, these files must be collected into a tar file and made available to the next level of support for debugging.
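
For example, to collect the log files into a tar file for support (the archive name is illustrative):

$ cd /var/occne/cluster/<cluster_name>/upgrade
$ tar -czf upgrade_logs_$(date +%Y%m%d_%H%M%S).tar.gz logs/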

3.5.1 Performing an Upgrade or OS Update

This section describes the procedure to perform an upgrade, an OS update, or both.

Note:

  • The upgrade and OS update cause the current Bastion Host to reboot multiple times. Each time the upgrade.sh script terminates without indicating an error condition, rerun the upgrade.sh script using this procedure on the same Bastion Host after it reboots.
  • Currently, any procedure that applies to VMware and uses Terraform doesn't operate successfully and cannot be used. These procedures include the following:
    • Replacing a Failed vCNE LoadBalancer
    • Replacing a Failed Kubernetes Worker Node
    • Replacing a Failed Kubernetes Controller Node
    For more information about these fault recovery procedures, see the Fault Recovery section.
  • OpenStack maintenance procedures that utilize Terraform in any step to create a new VM will not work properly if the cluster.tfvars image field is not updated to "ol9u2" after the upgrade completes (see the example following this note). For more information about downloading Oracle Linux, see Downloading Oracle Linux.
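
    For example, to confirm the image field after the upgrade completes (a hedged check; the cluster.tfvars path is assumed to match the location used elsewhere in this chapter):
    $ grep image /var/occne/cluster/${OCCNE_CLUSTER}/${OCCNE_CLUSTER}/cluster.tfvars
    image = "ol9u2"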
  1. Use SSH to log in to the active Bastion Host. You can determine the active Bastion Host by running the following command:
    $ is_active_bastion
    Sample output:
    IS active-bastion
    If you are rerunning the upgrade script after an error or termination, log in to the same Bastion Host that was used during the initial run.
  2. Ensure that the naming format of the existing OLX .repo file in the /var/occne/yum.repos.d/ directory is <CENTRAL_REPO>-olx.repo, where x is the version number (For example, <CENTRAL_REPO>-ol8.repo).
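    For example, to list the existing .repo files:
    $ ls /var/occne/yum.repos.d/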
  3. Ensure that the new central repository OLX yum .repo file is present in the /var/occne/yum.repos.d/ directory and that the format of the file name is <CENTRAL_REPO>-olx.repo, where x is the version number (for example, <CENTRAL_REPO>-ol9.repo):
    For example:
    $ curl http://${CENTRAL_REPO}/<path_to_file>/${CENTRAL_REPO}-ol9.repo -o /var/occne/yum.repos.d/${CENTRAL_REPO}-ol9.repo

    Note:

    Ensure the content of the new central repository OLX yum .repo file is correct.
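
    For example, you can display the file to verify its contents:
    $ cat /var/occne/yum.repos.d/${CENTRAL_REPO}-ol9.repo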
  4. Perform one of the following steps to initiate or resume the upgrade or the OS update:

    Note:

    The upgrade runs an initial cluster test based on the current OCCNE_VERSION. If the initial upgrade cluster test fails, the upgrade.sh script terminates. However, at this point, the upgrade is not started. Therefore, after correcting the issues discovered, you can restart the upgrade using step a. This is not applicable to the usual expected exits in the upgrade.sh script.
    a. Run the following command to launch the upgrade script to perform an upgrade to the new version:
      $ OCCNE_NEW_VERSION=<new_version_tag> upgrade.sh
    b. Run the following command to launch the upgrade script to perform an OS update only, and for all subsequent runs of both an upgrade and an OS update:
      $ upgrade.sh
  5. When the upgrade process initiates a reboot of the hosting Bastion, the upgrade.sh script terminates gracefully with the following output and the current Bastion is rebooted after a short period.

    Sample output:

    The current Bastion: ${OCCNE_HOSTNAME} will be going into reboot after configuration changes.
    Wait until the reboot completes and follow the documentation to continue the upgrade.
    This may take a number of minutes, longer on a major OS upgrade.
    Once the Bastion recovers from the reboot, rerun upgrade.sh by running the command in Step 4b.

    Note:

    • Once the upgrade begins on each node (starting with the active Bastion Host), the login banner in the shell is updated to display the following message. The banner is restored to the original message when the upgrade completes. For vCNE, this banner is not set on the currently active LBVMs while it is set on the other nodes, because the active LBVMs are not upgraded until the end of the procedure.
      ****************************************************************************
      |
      |   Date/Time: 2024-01-16 14:56:31.612728
      |
      |   OCCNE UPGRADE TO VERSION: 23.4.1 IN PROGRESS
      |
      |   Please discontinue login if not assisting in this maintenance activity.
      |
      ****************************************************************************
    • In some cases, you may see an "Ansible FAILED!" assertion message, as shown in the following output. This is expected behavior: the system returns control to the shell when CNE detects that a reboot is about to interrupt processing.
      TASK [staged_reboot : Halt ansible for os_upgrade reboot on current bastion (or its kvm host).  After reboot, reconnect to same bastion, and relaunch upgrade.sh] ***
      fatal: [my-cluster-name-bastion-1]: FAILED! => {
          "assertion": false,
          "changed": false,
          "evaluated_to": false,
          "msg": "NOT AN ERROR: This is an EXPECTED assertion to flag self-reboot, and return shell control."
      }
       
      PLAY RECAP *********************************************************************
      my-cluster-name-bastion-1 : ok=70   changed=24   unreachable=0    failed=1    skipped=194  rescued=1    ignored=0
    • By default, during an upgrade or OS update, the upgrade.sh script exits before rebooting the ACTIVE LBVMs and displays the following message:
      Skipping active LBVMs reboot since OCCNE_REBOOT_ACTIVE_LB is not set.
      The active LBVMs must be manually rebooted and the upgrade.sh script be run again.
      You must manually reboot or switch over each ACTIVE LBVM so that it becomes the STANDBY LBVM. For the procedure to perform a manual switchover of LBVMs, see the Performing Manual Switchover of LBVMs During Upgrade section. When the switchover of the LBVMs completes successfully, rerun the upgrade.sh script.
  6. When the upgrade or OS update is complete, the system displays the following message:

    Message format for CNE upgrade:

    <date/time>******** Upgrade Complete ***********
    For example:
    12/12/2023 - December Tuesday 01:09:57 - *********** Upgrade Complete **************
    Message format for OS update:
    <date/time>******** OS Update Complete ***********

3.6 Postupgrade Tasks

This section describes the postupgrade tasks for CNE.

3.6.1 Restoring CNE Customizations

This section provides information about restoring CNE customizations. Ensure that you restore all the customizations applied to the CNE instance after completing the upgrade process.

3.6.1.1 Restoring Grafana Dashboards

Perform the following steps to restore the Grafana dashboard:

  1. Log in to the Grafana GUI to restore the previously saved dashboards.
  2. Click the + icon on the left panel and select Import.

    Figure 3-3 Load Grafana Dashboard

  3. In the new panel, click Upload JSON file and choose the locally saved dashboard file.

    Figure 3-4 Uploading the Dashboard

  4. Repeat steps 1 to 3 for all the dashboards saved from the older version.

3.6.2 Verifying Terraform Files in VMware Deployments

This section provides details about verifying the content of the compute/main.tf and compute-lbvm/main.tf files in a VMware deployment after performing an upgrade.

  1. Update the Linux image template from OL8 to OL9 by changing the template_name variable in the /var/occne/cluster/${OCCNE_CLUSTER}/${OCCNE_CLUSTER}/cluster.tfvars file:
    $ vi /var/occne/cluster/${OCCNE_CLUSTER}/${OCCNE_CLUSTER}/cluster.tfvars
    The following example shows the template_name variable that must be updated:
    template_name = "<name of the OL9 template>"
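    After editing, you can verify the change (a quick check; the value shown is the placeholder from this step):
    $ grep template_name /var/occne/cluster/${OCCNE_CLUSTER}/${OCCNE_CLUSTER}/cluster.tfvars
    template_name = "<name of the OL9 template>"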
  2. Verify the content of the compute/main.tf and compute-lbvm/main.tf files:
    1. Run the following command to verify the content of the compute/main.tf file:
      $ cat /var/occne/cluster/${OCCNE_CLUSTER}/modules/compute/main.tf | grep 'ignore_changes\|override_template_disk' -C 2
      Ensure that the content of the file exactly matches the following content:
       }
      
        override_template_disk {
          bus_type    = "paravirtual"
          size_in_mb  = var.disk
      --
      
        lifecycle {
          ignore_changes = [
            vapp_template_id,
            template_name,
            catalog_name,
            override_template_disk
           ]
        }
      --
        }
      
        override_template_disk {
          bus_type    = "paravirtual"
          size_in_mb  = var.disk
      --
      
        lifecycle {
          ignore_changes = [
            vapp_template_id,
            template_name,
            catalog_name,
            override_template_disk
           ]
        }
    2. Run the following command to verify the content of the compute-lbvm/main.tf file:
      $ cat /var/occne/cluster/${OCCNE_CLUSTER}/modules/compute-lbvm/main.tf | grep 'ignore_changes\|override_template_disk' -C 2 
      Ensure that the content of the file exactly matches the following content:
      }
      
        override_template_disk {
          bus_type    = "paravirtual"
          size_in_mb  = var.disk
      --
      
        lifecycle {
          ignore_changes = [
            vapp_template_id,
            template_name,
            catalog_name,
            override_template_disk
           ]
        }
      --
        }
      
        override_template_disk {
          bus_type    = "paravirtual"
          size_in_mb  = var.disk
      --
      
        lifecycle {
          ignore_changes = [
            vapp_template_id,
            template_name,
            catalog_name,
            override_template_disk
           ]
        }
  3. If the files don't contain the ignore_changes argument, then edit the files and add the argument to each of the "vcd_vapp_vm" resources:
    1. Run the following command to edit the compute/main.tf file:
      $ vi /var/occne/cluster/${OCCNE_CLUSTER}/modules/compute/main.tf
    2. Add the following content between each override_template_disk code block and metadata = var.metadata line for each "vcd_vapp_vm" resource:
      lifecycle {
          ignore_changes = [
            vapp_template_id,
            template_name,
            catalog_name,
            override_template_disk
           ]
        }
    3. Save the compute/main.tf file.
    4. Run the following command to edit the compute-lbvm/main.tf file:
      $ vi /var/occne/cluster/${OCCNE_CLUSTER}/modules/compute-lbvm/main.tf
    5. Add the following content between each override_template_disk code block and metadata = var.metadata line for each "vcd_vapp_vm" resource:
       lifecycle {
          ignore_changes = [
            vapp_template_id,
            template_name,
            catalog_name,
            override_template_disk
           ]
        }
    6. Save the compute-lbvm/main.tf file.
    7. Repeat step 2 to ensure that the content of the files matches the content provided in that step.

3.6.3 Activating Optional Features

This section provides information about activating optional features, such as Velero and Local DNS, post upgrade.

3.6.3.1 Activating Velero Post Upgrade

This section provides information about the Velero activation procedure.

Velero is used for performing on-demand backups and restores of CNE cluster data. Velero is an optional feature and has an extra set of hardware and networking requirements. You can activate Velero after a CNE installation or upgrade. For more information about activating Velero, see Activating Velero.

3.6.3.2 Activating Local DNS

This section provides information about activating Local DNS post upgrade.

The Local DNS feature is a reconfiguration of core DNS (CoreDNS) to support external hostname resolution. When Local DNS is enabled, CNE routes the connection to external hosts through core DNS rather than the nameservers on the Bastion Hosts. For information about activating this feature post upgrade, see the "Activating Local DNS" section in Oracle Communications Cloud Native Core, Cloud Native Environment User Guide.

To stop DNS forwarding to the Bastion DNS, you must define the DNS details through A records and SRV records. A records and SRV records are added to the CNE cluster using Local DNS API calls. For more information about adding and deleting DNS records, see the "Adding and Removing DNS Records" section in Oracle Communications Cloud Native Core, Cloud Native Environment User Guide.

3.6.4 Updating Port Name for servicemonitors and podmonitors

The metric port name on which Prometheus scrapes metrics from 5G-CNC applications must be updated to "cnc-metrics".

To update the port name, do the following (a non-interactive alternative is sketched after these steps):
  1. Run the following command to get the servicemonitor details:
    $ kubectl get servicemonitor -n occne-infra
    Sample output:
    NAME                         AGE
    occne-nf-cnc-servicemonitor  60m
  2. Run the following command to update the port name for servicemonitor:
    $ kubectl edit servicemonitor occne-nf-cnc-servicemonitor -n occne-infra
    # Edit the above servicemonitor and update the following port name by removing the "http-" prefix.
     existing port name -
       port: http-cnc-metrics
     updated port name -
       port: cnc-metrics
  3. Save the changes for servicemonitor.
  4. Run the following command to get the podmonitor details:
    $ kubectl get podmonitor -n occne-infra
    Sample output:
    NAME                     AGE
    occne-nf-cnc-podmonitor  60m
  5. Run the following command to update the port name for podmonitor:
    $ kubectl edit podmonitor occne-nf-cnc-podmonitor -n occne-infra
    # Edit the above podmonitor and update the following port name by removing the "http-" prefix.
     existing port name -
       port: http-cnc-metrics
     updated port name -
       port: cnc-metrics
  6. Save the changes for podmonitor.
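
Alternatively, you can apply the same port name change non-interactively with kubectl patch (a hedged sketch; it assumes the default Prometheus Operator CRD layout and a single endpoint entry at index 0):

$ kubectl patch servicemonitor occne-nf-cnc-servicemonitor -n occne-infra --type=json \
    -p='[{"op": "replace", "path": "/spec/endpoints/0/port", "value": "cnc-metrics"}]'
$ kubectl patch podmonitor occne-nf-cnc-podmonitor -n occne-infra --type=json \
    -p='[{"op": "replace", "path": "/spec/podMetricsEndpoints/0/port", "value": "cnc-metrics"}]'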