3 Upgrading CNE

This chapter provides information about upgrading CNE to the latest release (also referred to as a standard upgrade), updating the Operating System (OS), or both on a given deployment (Bare Metal, OpenStack, or VMware).

3.1 Supported Upgrade Paths

The following table lists the supported upgrade paths for CNE:

Table 3-1 Supported Upgrade Paths

Source Release Target Release
25.1.100 25.1.101

3.2 Prerequisites

Before upgrading CNE, ensure that you meet the following prerequisites:
  • The user's central repository is updated with the latest versions of RPMs, binaries, and CNE Images for 25.1.1xx. For more information on how to update RPMs, binaries, and CNE images, see Artifact Acquisition and Hosting.
  • All Network Functions (NFs) are upgraded before performing a CNE upgrade. For more information about the NF upgrade procedure, see Oracle Communications Cloud Native Core, Cloud Native Environment User Guide.
  • The CNE instance that is upgraded has at least the minimum recommended node counts for Kubernetes (that is, three master nodes and six worker nodes).

Note:

Currently, CNE does not support rollback in any instance, such as:
  • encountering an error after initiating an upgrade
  • after a successful upgrade

Caution:

User, computer, application, and character encoding settings may cause issues when copying and pasting commands or other content from the PDF. The PDF reader version also affects the copy-paste functionality. It is recommended to verify the pasted content, especially when hyphens or other special characters are part of the copied content.

3.3 Common Services Release Information

On successful installation, CNE generates files on the Bastion Host that list the Kubernetes and common services release details. These files are also updated during an upgrade. You can refer to the following files to get the release information after a successful CNE upgrade. The files are available on the Bastion Host in the /var/occne/cluster/${OCCNE_CLUSTER}/artifacts directory:
  • Kubernetes Release File: K8S_container_images.txt
  • Common Services Release File: CFG_container_images.txt
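For example, you can view the recorded release details directly from the Bastion Host (the paths are as listed above):

$ cat /var/occne/cluster/${OCCNE_CLUSTER}/artifacts/K8S_container_images.txt
$ cat /var/occne/cluster/${OCCNE_CLUSTER}/artifacts/CFG_container_images.txt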

3.4 Preupgrade Tasks

Before upgrading CNE, perform the tasks described in this section.

3.4.1 Saving CNE Customizations

Before upgrading a CNE instance, you must save all the customizations applied to the CNE instance so that you can reapply them after the upgrade is complete.

3.4.1.1 Preserving Prometheus Alert Rules
This section provides the steps to back up user-specific Prometheus alert rules, so they can be restored after an upgrade. For more information about restoring Prometheus alert rules using the backups, see Restoring Prometheus Alert Rules.
  1. Use SSH to log in to the active Bastion and run the following command to confirm if it is the active Bastion:
    $ is_active_bastion
    Sample output:
    IS active-bastion
  2. Create the backup-alert-rules directory and navigate to the directory:
    $ mkdir ~/backup-alert-rules && cd ~/backup-alert-rules
  3. Run the following command to back up user-specific alert rules into YAML files, excluding occne-alerting-rules, which is the default alert rule:

    Note:

    Each YAML file is named after the alert rule that is backed up.
    $ for prom in $(kco get prometheusrules -o jsonpath='{.items[*].metadata.name}'); do [ "$prom" != "occne-alerting-rules" ] && kco get prometheusrules $prom -o yaml > "${prom}.yaml"; done
    When the backup is complete, you can use the ls command to view the list of backup files created.
    For example:
    $ for prom in $(kco get prometheusrules -o jsonpath='{.items[*].metadata.name}'); do [ "$prom" != "occne-alerting-rules" ] && kco get prometheusrules $prom -o yaml > "${prom}.yaml"; done
    $ ls
    Sample output:
    alert.yaml  example.yaml  occne-test.yaml
3.4.1.2 Preserving Grafana Dashboards
This section provides the steps to back up user-specific Grafana dashboards to a local directory, so they can be restored after an upgrade. For more information about restoring Grafana dashboards using the backups, see Restoring Grafana Dashboards.
  1. Log in to Grafana GUI.
  2. Select the dashboard to save.
  3. Click Share Dashboard to save the dashboard.

    Figure 3-1 Grafana Dashboard

    Shared Dashboard Icon
  4. Navigate to the Export tab and click Save to file to save the file in the local repository.

    Figure 3-2 Saving the Dashboard in Local Repository

  5. Repeat steps 1 to 4 until you save all the required customer-specific dashboards.
3.4.1.3 Renewing Kubernetes Certificate

In Oracle Cloud Native Environment (CNE), the Kubernetes control plane certificates (for example, the certificates for the API server, controller manager, scheduler, and so on) are not automatically renewed on a regular schedule unless a Kubernetes version upgrade (uplift) occurs.

  • If the CNE upgrade includes a Kubernetes version uplift (for example, upgrading from Kubernetes 1.27.x to 1.28.x), the control plane certificates are automatically renewed as part of the upgrade process.
  • If the CNE upgrade does not include a Kubernetes version uplift (for example, upgrading from one CNE 24.2.x build to another, where the Kubernetes version stays the same), the Kubernetes control plane certificates are not automatically renewed during the upgrade process.

If certificate renewal is required but no Kubernetes version uplift is planned, the control plane certificates must be renewed manually by following the procedure provided in the "Renewing Kubernetes Certificates" section in Oracle Cloud Native Core, Cloud Native Environment User Guide.
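For example, to review the current certificate expiration dates before deciding whether a manual renewal is required, you can run the following command on a control plane node (a minimal check, assuming kubeadm-managed certificates):

$ sudo kubeadm certs check-expiration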

3.4.1.4 Patching the Multus DaemonSet

Note:

This section is applicable only to CNLB-enabled CNE deployments.

Multus is a container network interface (CNI) plugin for Kubernetes that enables attaching multiple network interfaces to pods. Multus is deployed by applying the thick DaemonSet with kubectl.

This section provides steps to patch the init containers of Multus DaemonSet and perform a rollout restart on the pods.

From the Bastion Host of the deployed CNE environment, perform the following steps:

  1. Run the following command to patch the init containers of the Multus DaemonSet:
    kubectl patch daemonset kube-multus-ds -n kube-system --type='json' -p='[{ "op": "replace", "path": "/spec/template/spec/initContainers/0/command","value": [ "/usr/src/multus-cni/bin/install_multus", "-d", "/host/opt/cni/bin","-t", "thick"]}]'
  2. Run the following command to perform a rollout restart on the pods:
    kubectl -n kube-system rollout restart ds kube-multus-ds
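Optionally, you can confirm that the restarted Multus pods are back up before proceeding (a minimal verification sketch, not part of the documented procedure):

$ kubectl -n kube-system rollout status ds kube-multus-ds
$ kubectl -n kube-system get ds kube-multus-ds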
3.4.1.5 Updating OpenSearch Master Node Role

This section provides information about updating the OpenSearch master node role.

  1. Check if the current role is set to master,data:
    $ kubectl -n occne-infra get sts occne-opensearch-cluster-master -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="node.roles")]}'
    Sample output:
    {"name":"node.roles","value":"master,data"}
  2. Run the following patch command:
    $ kubectl -n occne-infra get sts occne-opensearch-cluster-master -o json | jq '(.spec.template.spec.containers[0].env[] | select(.name == "node.roles")).value = "master"' | kubectl apply -f -
    
    Sample output:
    statefulset.apps/occne-opensearch-cluster-master configured
    
  3. Verify if the patch is successful:
    $ kubectl -n occne-infra get sts occne-opensearch-cluster-master -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="node.roles")]}'
    
    Sample output:
    {"name":"node.roles","value":"master"}
    

3.4.2 Performing Preupgrade Health Checks

Perform the following steps to ensure that the cluster is in a healthy state.

Check drive space on Bastion Host

Before upgrading, ensure that there is sufficient drive space in the home directory of the user (usually admusr for Bare Metal and cloud-user for vCNE) from which the upgrade is run.

The /home directory (df -h /home) must have at least 4 GB of free space and the /var directory (df -h /var) must have at least 10 GB of free space for the temporary gathering and operation of the CNE containers while the upgrade procedure runs.

If there is insufficient space, then free up some space. One common location to reclaim space is the podman image storage for local images. You can find the local images using the 'podman image ls' command, and remove them by using the 'podman image rm -f [image]' command. You can reclaim additional podman space by using the 'podman system prune -fa' command to remove any unreferenced image layers.
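For example, the space checks and cleanup described above can be run as follows (an illustrative sequence; remove images only after confirming that they are no longer needed):

$ df -h /home /var
$ podman image ls
$ podman system prune -fa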

Note:

When the upgrade.sh script is run, it performs the space checks and exits the upgrade if there is insufficient space in the two directories.

Check OpenSearch pods disk space

For OpenSearch pods that are using PVCs, check the available disk space and confirm that at least 1 GB is available before running the upgrade.

kubectl -n occne-infra exec occne-opensearch-cluster-data-0 -c opensearch -- df -h /usr/share/opensearch/data
kubectl -n occne-infra exec occne-opensearch-cluster-data-1 -c opensearch -- df -h /usr/share/opensearch/data
kubectl -n occne-infra exec occne-opensearch-cluster-data-2 -c opensearch -- df -h /usr/share/opensearch/data
For example:
$ kubectl -n occne-infra exec occne-opensearch-cluster-data-0 -c opensearch -- df -h /usr/share/opensearch/data
Sample output:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        9.8G  5.3G  4.1G  57% /usr/share/opensearch/data
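Equivalently, the same check can be run as a loop across the data pods (a convenience sketch that assumes three data pods, as shown above):

$ for i in 0 1 2; do kubectl -n occne-infra exec occne-opensearch-cluster-data-$i -c opensearch -- df -h /usr/share/opensearch/data; done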

3.4.3 Upgrading Grafana

This section details the procedure to upgrade Grafana to a custom version.

Note:

  • This procedure is optional and can be run if you want to upgrade Grafana to a custom version.
  • This procedure is applicable to both BareMetal and vCNE deployments.

Limitations

  • This procedure is only used to upgrade from Grafana release 9.5.3 to 11.2.x.
  • Grafana version 11.2.x is not tested with CNE. If you are upgrading to Grafana 11.2.x, ensure that you manage, adapt, and maintain the version.
  • Plugin installation and Helm chart adaptation are not in the purview of this procedure.
  • Some versions of the Grafana image may try to pull certificates from the internet.

Prerequisites

Before running the procedure, ensure that you meet the following prerequisites:
  • The cluster must run a stable Grafana version. Most CNE clusters run version 9.5.3, which is acceptable.
  • This procedure must be run in the active Bastion Host.
  • The target version of Grafana must be available in the cluster. This can be achieved by pulling the required version from the desired repository.
  • Podman must be installed and you must be able to run the Podman commands.
  • Upgrade Helm to the minimum supported Helm version (3.15.2 or later).
  • kubectl must be installed.

Procedure

  1. Log in to the active Bastion Host and run the following command to ensure that you are logged in to the active Bastion:
    $ is_active_bastion
    Sample output:
    IS active-bastion
  2. Ensure that the desired Grafana image is present in the local podman image storage:
    $ podman image ls
    Sample output:
    REPOSITORY                       TAG                           IMAGE ID      CREATED      SIZE
    winterfell:5000/occne/provision  25.2.0-alpha.0-11-g647fa73e6  04e905388051  3 days ago   2.48 GB
    localhost/grafana/grafana        11.2.5                        37c12d738603  6 weeks ago  469 MB
  3. Tag and push the image to follow the CNE image naming convention. This ensures that the repository uses the correct naming convention after the desired image version is pulled.
    $ podman tag <CURRENT_GRAFANA_IMAGE_NAME>:<CURRENT_TAG> occne-repo-host:5000/occne/<DESIRED_GRAFANA_NAME>:<CURRENT_TAG>
    $ podman push occne-repo-host:5000/occne/<DESIRED_GRAFANA_NAME>:<CURRENT_TAG>
    For example:
    $ podman tag localhost/grafana/grafana:11.2.5 occne-repo-host:5000/occne/grafana:11.2.5
    $ podman push occne-repo-host:5000/occne/grafana:11.2.5
  4. Review all the deployments on the cluster and search for the Grafana deployment:
    $ kubectl -n occne-infra get deploy
    Sample output:
    NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
    cnlb-app                                   4/4     4            4           6h59m
    cnlb-manager                               1/1     1            1           6h59m
    occne-alertmanager-snmp-notifier           1/1     1            1           6h54m
    occne-bastion-controller                   1/1     1            1           6h54m
    occne-kube-prom-stack-grafana              1/1     1            1           6h55m # HERE IS THE GRAFANA DEPLOYMENT
    occne-kube-prom-stack-kube-operator        1/1     1            1           6h55m
    occne-kube-prom-stack-kube-state-metrics   1/1     1            1           6h55m
    occne-metrics-server                       1/1     1            1           6h54m
    occne-opensearch-dashboards                1/1     1            1           6h55m
    occne-promxy                               1/1     1            1           6h54m
    occne-promxy-apigw-nginx                   2/2     2            2           6h54m
    occne-tracer-jaeger-collector              1/1     1            1           6h54m
    occne-tracer-jaeger-query                  1/1     1            1           6h54m
  5. Edit the occne-kube-prom-stack-grafana deployment. This opens an editable YAML file where you can locate the previous Grafana image.
    $ kubectl -n occne-infra edit deploy occne-kube-prom-stack-grafana
    Sample output:
    ...
            - name: GF_PATHS_DATA
              value: /var/lib/grafana/
            - name: GF_PATHS_LOGS
              value: /var/log/grafana
            - name: GF_PATHS_PLUGINS
              value: /var/lib/grafana/plugins
            - name: GF_PATHS_PROVISIONING
              value: /etc/grafana/provisioning
            image: occne-repo-host:5000/docker.io/grafana/grafana:9.5.3               # HERE IS THE IMAGE
            imagePullPolicy: IfNotPresent
            livenessProbe:
              failureThreshold: 10
    ...
  6. Replace the old image with the recently pushed image:
    For example:
    ...
            - name: GF_PATHS_DATA
              value: /var/lib/grafana/
            - name: GF_PATHS_LOGS
              value: /var/log/grafana
            - name: GF_PATHS_PLUGINS
              value: /var/lib/grafana/plugins
            - name: GF_PATHS_PROVISIONING
              value: /etc/grafana/provisioning
            image: occne-repo-host:5000/occne/grafana:11.2.5               # HERE IS THE IMAGE
            imagePullPolicy: IfNotPresent
            livenessProbe:
              failureThreshold: 10
    ...
  7. Run the following command to verify the pods' health. Ensure that all pods are in the healthy Running state with no restarts.
    $ kco get pods | grep grafana
    Sample output:
    occne-kube-prom-stack-grafana-7ccf687579-ns94w             3/3     Running     0              7h18m
  8. Run the following command to verify the pod's internal logs. Use the pod name obtained from the previous step.
    $ kubectl -n occne-infra logs <YOUR_GRAFANA_POD>
    For example:
    $ kubectl -n occne-infra logs occne-kube-prom-stack-grafana-7ccf687579-ns94w
  9. Depending on the type of Load Balancer used, use one of the following steps to retrieve Grafana external IP:
    • If you are using LBVM, run the following command to extract the external IP:
      [cloud-user@occne1-<user-name>-bastion-1 ~]$ kubectl -n occne-infra get service -o wide | grep grafana
      Sample output:
      NAME                                             TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                                           AGE
      occne-kube-prom-stack-grafana                    LoadBalancer   10.233.42.123   10.75.200.32    80:30553/TCP                                      4d21h
    • If you are using CNLB, use the occne.ini file to extract the external IP:
      $ cat /var/occne/cluster/$OCCNE_CLUSTER/occne.ini | grep occne_graf_cnlb
      Sample output:
      occne_graf_cnlb = 10.75.200.32
      In both of these examples, the external IP is 10.75.200.32.
  10. Ensure that the Grafana dashboard is accessible by either pinging the Grafana external IP or accessing the dashboard in a browser.
    The following code block provides the command to ping the external IP:
    $ ping <YOUR-GRAFANA-EXTERNAL-IP>
    For example:
    $ ping 10.75.200.32
    Sample output:
    PING 10.75.200.32 (10.75.200.32) 56(84) bytes of data.
    64 bytes from 10.75.200.32: icmp_seq=1 ttl=62 time=3.04 ms
    64 bytes from 10.75.200.32: icmp_seq=2 ttl=62 time=1.63 ms
    64 bytes from 10.75.200.32: icmp_seq=3 ttl=62 time=1.24 ms
    The following code block provides the CURL command to access Grafana dashboard using the external IP:
    $ curl <YOUR-GRAFANA-EXTERNAL-IP>
    For example:
    $ curl 10.75.225.166
    Sample output:
    <a href="/occne1-<user-name>/grafana/login">Found</a>.

3.4.4 Checking Preupgrade Config Files

Check manual updates on the pod resources: Verify that any manual updates made to the Kubernetes cluster configuration (such as deployments and daemonsets) after the initial deployment are configured in the proper occne.ini (vCNE) or hosts.ini (Bare Metal) file. For more information, see Preinstallation Tasks.
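For example, one way to list the current deployments and daemonsets before comparing them against the .ini configuration (a generic check, shown for illustration only):

$ kubectl get deployments,daemonsets --all-namespaces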

3.4.5 Configuring secrets.ini and occne.ini Files

This section explains how to create and configure the secrets.ini file and remove variables from the occne.ini file.

The secrets.ini file contains the cluster credentials and the cloud-specific account credentials (OpenStack, VMware, BareMetal) that are required for the cluster to run correctly.
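For reference, the file can typically be created by copying the template available in the cluster directory (an illustrative sketch; adjust the paths if your template is located elsewhere). Restricting read permissions is advisable because the file contains credentials:

$ cp /var/occne/cluster/${OCCNE_CLUSTER}/secrets.ini.template /var/occne/cluster/${OCCNE_CLUSTER}/secrets.ini
$ chmod 600 /var/occne/cluster/${OCCNE_CLUSTER}/secrets.ini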

Perform the following procedure to create a secrets.ini file and remove the variables from the occne.ini file:

  1. Create a copy of the secrets.ini file from the secrets.ini.template file in the cluster directory.

    Note:

    If the secrets.ini.template file is not located in the cluster directory, use the templates given in step 2.
  2. Edit the copy of the secrets.ini file and fill out all the required parameters depending on the platform being upgraded.
    secrets.ini parameters

    Example for vCNE OpenStack:

    [occne:vars]
    # Set grub password
    occne_grub_password=
     
    [openstack:vars]
    # Specify the user/pass of the External OpenStack Controller/CSI Cinder plugin accounts needed
    # for deployment.
    external_openstack_username=
    external_openstack_password=
     
    cinder_username=
    cinder_password=

    Example for vCNE VCD:

    [occne:vars]
    # Set grub password
    occne_grub_password=
     
    [vcd:vars]
    ## Specify the vSphere information of the External vSphere Controller/CSI Cinder plugin accounts
    ## needed for deployment.
    external_vsphere_user =
    external_vsphere_password =
     
    vcd_user=
    vcd_passwd=

    Example for BareMetal:

    [occne:vars]
    # Set grub password
    occne_grub_password=
     
    ####
    # PXE Settings
    pxe_install_lights_out_usr=
    pxe_install_lights_out_passwd=
     
     
    ### ANY OTHER SENSITIVE CREDENTIAL.
    # ...
  3. Remove the variables from the occne.ini file (for vCNE) or the hosts.ini file (for BareMetal deployments).
    • Edit the occne.ini file:

      For vCNE, run the following command:

      $ vi /var/occne/cluster/${OCCNE_CLUSTER}/occne.ini

      For BareMetal, run the following command:

      $ vi /var/occne/cluster/${OCCNE_CLUSTER}/hosts.ini
    • Remove the variables that were added to the secrets.ini file from the occne.ini or hosts.ini file.

3.4.6 Checking GRUB Password

This section provides information about configuring the Grand Unified Bootloader (GRUB) password before performing an upgrade.

It is mandatory to set the GRUB password before performing an upgrade from 25.1.100 to 25.1.1xx.

You can set the GRUB password by adding the occne_grub_password=<password> variable under the occne:vars section header in occne.ini (for vCNE) or hosts.ini (for BareMetal).

When set, this value must be updated only if you want to change the GRUB password. The upgrade script checks for this variable in the respective .ini file and fails if it is not set. For more information about configuring the GRUB password, see Configuring GRUB Password.
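For example, the entry in the occne.ini or hosts.ini file takes the following form, where the value shown is a placeholder:

[occne:vars]
occne_grub_password=<your_grub_password>

You can confirm that the variable is set before starting the upgrade, for example:

$ grep occne_grub_password /var/occne/cluster/${OCCNE_CLUSTER}/occne.ini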

3.4.7 Performing a Podman System Reset

This section explains how to perform a podman system reset.

Note:

It is recommended to run the following command on the Bastion Host on which the upgrade is being run. This ensures that there is no leftover image corruption on the Bastion Host:

podman system reset

When prompted, enter y (yes) to confirm the reset.

3.4.8 Removing VCD variables from occne.ini file

Starting with release 25.2.100, VMware Cloud Director (VCD) controller and plugin versions are managed exclusively by Kubespray. The following procedure removes these variables from the occne.ini file to avoid overwriting newer versions.

Note:

The following steps must be performed for VMware clusters. Failure to remove the VCD variables from the occne.ini file blocks the cluster from obtaining the latest vSphere CSI images or versions, causing potential compatibility issues.

Perform the following procedure to remove the VCD variables from the occne.ini file:

  1. Open the occne.ini file in a text editor.
    $ vi /var/occne/cluster/${OCCNE_CLUSTER}/occne.ini
  2. Remove only the following lines from the occne.ini file. These lines are at the end of the occne.ini file.

    # custom CSI Driver version for vSphere (VCD)

    vsphere_csi_attacher_image_tag = v4.8.1
    vsphere_csi_resizer_tag = v1.12.0
    vsphere_csi_controller = v3.5.0
    vsphere_csi_driver_image_tag = v3.5.0
    vsphere_syncer_image_tag = v3.5.0
    vsphere_csi_liveness_probe_image_tag = v2.15.0
    vsphere_csi_provisioner_image_tag = v4.0.1
    vsphere_csi_node_driver_registrar_image_tag = v2.13.0
    vsphere_csi_snapshotter_image_tag = v8.2.0
    # custom Cloud Controller Manager for vSphere (VCD)
    external_vsphere_cloud_controller_image_tag = v1.33.0
  3. Save the file and close the editor.
  4. Verify that VCD variables are not present in the occne.ini file. The following commands must not print any output.
    $ cat /var/occne/cluster/${OCCNE_CLUSTER}/occne.ini | grep vsphere_csi
    $ cat /var/occne/cluster/${OCCNE_CLUSTER}/occne.ini | grep external_vsphere_cloud_controller_image_tag

3.5 Performing an Upgrade

This section describes the procedure to perform a standard upgrade, OS update, or both on a given CNE deployment (BareMetal or vCNE).

Note:

  • This upgrade is only used to upgrade from release 25.1.100 to release 25.1.1xx.
  • Ensure that you complete all the preupgrade procedures before performing the upgrade.
  • It is suggested to run this procedure from a terminal multiplexer (such as tmux) on the machine that is used to sign in to the Bastion Host. This way, the Bastion Host bash shell continues to run even in case of shell and VPN disconnections.
  • It is suggested to use a session capture program (such as script) on the Bastion Host to capture all input and output for diagnosing issues. This program must be rerun for each login.
  • Initiate the upgrade.sh script from the active Bastion Host. However, during most of the upgrade, there is no designated active Bastion Host because the system changes continuously. Therefore, ensure that you rerun the upgrade.sh script from the same Bastion Host that was used initially.
  • The upgrade procedure can take hours to complete and the total time depends on the configuration of the cluster.
  • Before performing a standard upgrade or OS update, verify the health of the cluster and the services related to CNE.
  • Performing a standard upgrade, OS update, or both causes the current Bastion Host to reboot multiple times. Each time the upgrade.sh script terminates without indicating an error condition, rerun the upgrade.sh script using this procedure on the same Bastion Host after it reboots.
  • Before performing an upgrade, ensure that all CA certificates are up to date.

WARNING:

Refrain from performing a controlled abort (ctrl-C) on the upgrade while it is in progress. Allow the upgrade to exit gracefully from an error condition or after a successful completion.

Log Files for Debugging Upgrade

The system generates many log files during the upgrade or OS update process. All the log files are suffixed with a date and timestamp. These files are maintained in the /var/occne/cluster/<cluster_name>/upgrade/logs directory and can be removed after the upgrade or OS update completes successfully. For any issues encountered during the upgrade, these files must be collected into a tar file and made available to the next level of support for debugging.
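For example, the logs can be collected into a single archive for support as follows (the archive name is illustrative):

$ cd /var/occne/cluster/<cluster_name>/upgrade
$ tar -czvf ~/upgrade_logs_$(date +%Y%m%d_%H%M%S).tgz logs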

Procedure

  1. Use SSH to log in to the active Bastion Host. You can confirm that the Bastion Host you are logged in to is the active one by running the following command:
    $ is_active_bastion
    Sample output:
    IS active-bastion

    Note:

    If you are rerunning the upgrade script after an error or termination, log in to the same Bastion Host that was used during the initial run.
  2. Ensure that the new central repository OLX yum repository file is present in the /var/occne/yum.repos.d/ directory and the format of the file name is <CENTRAL_REPO>-olx.repo, where x is the version number (For example, <CENTRAL_REPO>-ol9.repo):
    For example:
    $ curl http://<CENTRAL_REPO>/<path_to_file>/<CENTRAL_REPO>-ol9.repo -o /var/occne/yum.repos.d/<CENTRAL_REPO>-ol9.repo

    Note:

    Before continuing to the next step, ensure that the .repo file doesn't contain the 404 Not Found error. This error implies that the data retrieval failed. This can be caused by an incorrect path and can cause the upgrade to fail.
  3. Perform one of the following steps to initiate or reenter the upgrade or the OS update:

    Note:

    The upgrade runs an initial cluster test based on the current OCCNE_VERSION. If the initial upgrade cluster test fails, the upgrade.sh script terminates. However, at this point, the upgrade is not started. Therefore, after correcting the issues discovered, you can restart the upgrade using step a. This is not applicable to the usual expected exits in the upgrade.sh script.
    a. Run the following command to launch the upgrade script to perform an upgrade to the new version:
      $ OCCNE_NEW_VERSION=<new_version_tag> upgrade.sh
    b. Run the following command to launch the upgrade script to perform an OS update, and for all subsequent runs of both the upgrade and the OS update:
      $ upgrade.sh
  4. When the upgrade process initiates a reboot of the hosting Bastion, the upgrade.sh script terminates gracefully with the following output and the current Bastion is rebooted after a short period.

    Sample output:

    The current Bastion: ${OCCNE_HOSTNAME} will now be going into reboot after configuration changes.
    Wait until the reboot completes and follow the documentation to continue the upgrade.
    This may take a number of minutes, longer on a major OS upgrade.
    Once the Bastion recovers from the reboot, rerun upgrade.sh by running the command in Step 3b.

    Note:

    • Once the upgrade begins on each node (starting with the active Bastion Host), the shell login banner is updated to reflect the following message. The banner is restored to the original message when the upgrade completes. For vCNE, this banner is not set on the currently active LBVMs while it is set on the other nodes, because the active LBVMs are not upgraded until the end of the procedure.
      ****************************************************************************
      |
      |   Date/Time: 2024-03-11 14:56:31.612728
      |
      |   OCCNE UPGRADE TO VERSION: 25.1.101 IN PROGRESS
      |   Started Date/Time: 2024-03-15 19:55:22.232178|
      |
      |   Please discontinue login if not assisting in this maintenance activity.
      |
      | ****************************************************************************
    • In some cases, you may see an "Ansible FAILED!" assertion message similar to the following. This is expected behavior: the system returns control to the shell when CNE detects that a reboot will interrupt processing.
      TASK [staged_reboot : Halt ansible for os_upgrade reboot on current bastion (or its kvm host).  After reboot, reconnect to same bastion, and relaunch upgrade.sh] ***
      fatal: [my-cluster-name-bastion-1]: FAILED! => {
          "assertion": false,
          "changed": false,
          "evaluated_to": false,
          "msg": "NOT AN ERROR: This is an EXPECTED assertion to flag self-reboot, and return shell control."
      }
       
      PLAY RECAP *********************************************************************
      my-cluster-name-bastion-1 : ok=70   changed=24   unreachable=0    failed=1    skipped=194  rescued=1    ignored=0
    • By default, during an upgrade and OS update, the upgrade.sh script exits before rebooting the ACTIVE LBVMs by displaying the following message:
      Skipping active LBVMs reboot since OCCNE_REBOOT_ACTIVE_LB is not set.
      The active LBVMs must be manually rebooted and the upgrade.sh script be run again.
      You must then manually reboot (or switch the activity of) each ACTIVE LBVM so that it becomes the STANDBY LBVM. For the procedure to perform a manual switchover of LBVMs, see the Performing Manual Switchover of LBVMs During Upgrade section. When the switchover of the LBVMs completes successfully, rerun the upgrade.sh script.
    • For vCNE upgrades and OS updates that include LBVMs, the LB Controller Health Check Monitor (HCM) is disabled during the upgrade process. Before the upgrade exits to perform a manual switchover on all LBVM pairs, the HCM is restarted on the LB Controller. Due to this, some LBVMs may go into a FAILED state depending on their actual status throughout the upgrade. The system then displays the following sample message and the upgrade exits with error 3. If you encounter this error, clean up the LBVMs in the FAILED state and resume the standard upgrade using this procedure.
      ISSUE: The current number of ACTIVE LBVMs: 2 must be the same as the original number of ACTIVE LBVMs: 3
             List of ACTIVE LBVMs at upgrade start: my_cluster_oam_lbvm1 my_cluster_sig_lbvm1 my_cluster_prov_lbvm1
             List of current ACTIVE LBVMs: my_cluster_oam_lbvm1 my_cluster_sig_lbvm1
             This issue must be resolved before the UPGRADE/OS Update can complete successfully.
  5. When the upgrade or OS update completes successfully, the system displays the following message:

    Message format for CNE upgrade:

    <date/time>******** Upgrade Complete ***********
    For example:
    03/11/2024 - March Monday 09:27:05 - *********** Upgrade Complete **************
    Message format for OS update:
    <date/time>******** OS Update Complete ***********
    For example:
    03/11/2024 - March Monday 09:27:05 - ******** OS Update Complete ***********

3.6 Performing Upgrade Across Multiple Maintenance Windows

This section describes the procedures to set up and perform an upgrade on a CNE environment across multiple maintenance windows.

Note:

  • This procedure is applicable to standard upgrade and OS update.
  • CNLB and BareMetal deployments support upgrade or OS Update windows 1 and 2 only.
  • CNE doesn't support migration from vCNE LBVM to CNLB deployments during an upgrade.
  • CNE supports only patch level upgrades for CNLB (For example, 24.2.x to 24.2.y).
  • For a standard OS update on LBVM supported deployments, only two maintenance windows are applicable (windows 1 and 3).
Performing a Standard Upgrade or OS Update Across Multiple Windows
The standard upgrade or OS update can be divided across two or three maintenance windows:
  1. Perform the getdeps, Bastion Host setup, and the OS Upgrade on the Bastion Host.
  2. Perform the K8s, and the Common Services upgrade (this does not apply to the OS Update).
  3. Perform manual LBVM switchover and complete the final stages (POST) of the upgrade (vCNE only).
Prepare for a Standard Upgrade or OS Update

Ensure that all the preupgrade steps are performed.

Perform the Standard Upgrade/OS Update up to the first exit after the Active Bastion reboot (First Maintenance Window)

Perform the Performing an Upgrade procedure to start the standard upgrade from 25.1.100 to 25.1.1xx or the OS update on 25.1.1xx.

Perform the Standard Upgrade after the reboot of the Active Bastion Host (Second Maintenance Window)

Perform the Performing an Upgrade procedure to complete the standard upgrade from 25.1.100 to 25.1.1xx or the OS update on 25.1.1xx.

Perform the standard upgrade manual LBVM switchover (non-CNLB vCNE only) (3rd maintenance window for standard upgrade and 2nd window for OS update)

For vCNE deployments, perform the Performing an Upgrade procedure to complete the standard upgrade from 25.1.100 to 25.1.1xx or the OS update on 25.1.1xx.

3.7 Upgrading BareMetal CNE Deployed using Bare Minimum Servers

This section provides the prerequisites and the points to be considered while upgrading a BareMetal CNE that is deployed using bare minimum servers (three worker nodes).

Prerequisites

Before performing an upgrade on a bare minimum setup, ensure that you meet the following prerequisites:
  • This procedure must be used only when you want to upgrade a BareMetal CNE that is deployed using bare minimum servers (three worker nodes).
  • Ensure that you perform all the tasks mentioned in the Prerequisites and Preupgrade Tasks sections. However, ignore any prerequisite that specifies the minimum required number of worker servers, as CNE supports upgrades for BareMetal deployments that are deployed using minimal resources (three worker nodes).
If you want to enable CNLB while performing an upgrade on a bare minimum setup, ensure that you meet the following requirements:
  • Ensure that all nodes (3 controller nodes and 3 worker nodes) are updated to the same level of operating system and Kubernetes components.
  • Verify that the network and CNLB annotations are defined correctly in the Helm templates. For more information, see Configuring Cloud Native Load Balancer (CNLB).
  • Use monitoring tools to track the network connectivity, load balancing, and cluster status throughout the upgrade process.
  • After an upgrade, perform the Postupgrade Tasks to ensure that all applications are functioning as expected with the CNLB setup.

After you ensure that you meet all the prerequisites and requirements, follow the Performing an Upgrade procedure to perform an upgrade. Skip the steps that are specific to vCNE deployments.

3.8 Postupgrade Tasks

This section describes the postupgrade tasks for CNE.

3.8.1 Restoring CNE Customizations

This section provides information about restoring CNE customizations. Ensure that you restore all the customizations applied to the CNE instance after completing the upgrade process.

3.8.1.1 Restoring Prometheus Alert Rules

Perform the following steps to restore Prometheus alert rules:

  1. Use SSH to log in to the active Bastion and run the following command to confirm if it is the active Bastion:
    $ is_active_bastion
    Sample output:
    IS active-bastion
  2. Navigate to the backup-alert-rules directory:
    $ cd ~/backup-alert-rules
  3. Run the following command to restore all the alert rules that were backed up previously:

    Note:

    This command can take several minutes to process depending on the number of alert rules to be restored.
    $ for promrule in *; do kco apply -f "$promrule"; done
    Sample output:
    prometheusrule.monitoring.coreos.com/alert created
    prometheusrule.monitoring.coreos.com/test created
    prometheusrule.monitoring.coreos.com/occne-example created
    prometheusrule.monitoring.coreos.com/occne-x created
    
  4. Run the following command to verify if the alert rules are restored:
    $ kco get prometheusrules
    Sample output:
    NAME                   AGE
    alert                  4m5s
    test                   3m44s
    occne-example          11m
    occne-alerting-rules   20h
    occne-x                3m4s
3.8.1.2 Restoring Grafana Dashboards

Perform the following steps to restore the Grafana dashboard:

  1. Load the previously installed Grafana dashboard.
  2. Click the + icon on the left panel and select Import.

    Figure 3-3 Load Grafana Dashboard



  3. Once in the new panel, click Upload JSON file and choose the locally saved dashboard file.

    Figure 3-4 Uploading the Dashboard

  4. Repeat the same steps for all the dashboards saved from the older version.

3.8.2 Activating Optional Features

This section provides information about activating optional features, such as Velero and Local DNS, after an upgrade.

Activating Velero

Velero is used for performing on-demand backups and restores of CNE cluster data. Velero is an optional feature and has an extra set of hardware and networking requirements. You can activate Velero after upgrading CNE. For more information about activating Velero, see Activating Velero.

Activating Local DNS

The Local DNS feature is a reconfiguration of core DNS (CoreDNS) to support external hostname resolution. When Local DNS is enabled, CNE routes the connection to external hosts through core DNS rather than the nameservers on the Bastion Hosts. For information about activating this feature, see the "Activating Local DNS" section in Oracle Communications Cloud Native Core, Cloud Native Environment User Guide.

To stop DNS forwarding to the Bastion DNS, you must define the DNS details through A records and SRV records. A records and SRV records are added to the CNE cluster using Local DNS API calls. For more information about adding and deleting DNS records, see the "Adding and Removing DNS Records" section in Oracle Communications Cloud Native Core, Cloud Native Environment User Guide.

Enabling or Disabling Floating IP in OpenStack

Floating IPs are additional public IP addresses that are associated with instances such as control nodes, worker nodes, Bastion Hosts, and LBVMs. Floating IPs can be quickly reassigned and switched from one instance to another using APIs, thereby ensuring high availability and reducing maintenance. You can activate the Floating IP feature after upgrading CNE. For information about enabling or disabling Floating IP, see Enabling or Disabling Floating IP in OpenStack.

3.8.3 Updating Port Name for servicemonitors and podmonitors

The name of the metric port from which Prometheus scrapes metrics for 5G-CNC applications must be updated to "cnc-metrics".

To update the port name, do the following:
  1. Run the following command to get the servicemonitor details:
    $ kubectl get servicemonitor -n occne-infra
    Sample output:
    NAME                         AGE
    occne-nf-cnc-servicemonitor  60m
  2. Run the following command to update the port name for servicemonitor:
    $ kubectl edit servicemonitor occne-nf-cnc-servicemonitor -n occne-infra
    # Edit the above servicemonitor and update the following port name by removing "http" prefix.
     existing port name -
       port: http-cnc-metrics
     updated port name -
       port: cnc-metrics
  3. Save the changes for servicemonitor.
  4. Run the following command to get the podmonitor details:
    $ kubectl get podmonitor -n occne-infra
    Sample output:
    NAME                     AGE
    occne-nf-cnc-podmonitor  60m
  5. Run the following command to update the port name for podmonitor:
    $ kubectl edit podmonitor occne-nf-cnc-podmonitor  -n occne-infra
     
      existing port name -
       port: http-cnc-metrics
      updated port name -
       port: cnc-metrics
  6. Save the changes for podmonitor.
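Alternatively, the same change can be applied non-interactively with a JSON patch (a sketch that assumes the metric port is defined on the first endpoint entry of each resource; verify the endpoint index, for example with kubectl get ... -o yaml, before applying):

$ kubectl -n occne-infra patch servicemonitor occne-nf-cnc-servicemonitor --type='json' -p='[{"op":"replace","path":"/spec/endpoints/0/port","value":"cnc-metrics"}]'
$ kubectl -n occne-infra patch podmonitor occne-nf-cnc-podmonitor --type='json' -p='[{"op":"replace","path":"/spec/podMetricsEndpoints/0/port","value":"cnc-metrics"}]'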

3.8.4 Upgrading Grafana Post Upgrade

This section provides information about upgrading Grafana to a custom version after an upgrade.

After upgrading CNE, depending on your requirement, you can upgrade Grafana to a custom version (For example, 11.2.x). To do so, perform the procedure in the Upgrading Grafana section.