3 Upgrading CNE

This chapter provides information about upgrading CNE to the latest release (also referred to as a standard upgrade), updating the Operating System (OS), or both on a given deployment (Bare Metal, OpenStack, or VMware).

Supported Upgrade Paths

The following table lists the supported upgrade paths for CNE:

Table 3-1 Supported Upgrade Paths

Source Release    Target Release
25.1.2xx          25.2.100

Prerequisites

Before upgrading CNE, ensure that you meet the following prerequisites:
  • The user's central repository is updated with the latest versions of RPMs, binaries, and CNE Images for 25.2.1xx. For more information on how to update RPMs, binaries, and CNE images, see Artifact Acquisition and Hosting.
  • All Network Functions (NFs) are upgraded before performing a CNE upgrade. For more information about NF upgrade procedure, see Oracle Communications Cloud Native Core, Cloud Native Environment User Guide.
  • The CNE instance being upgraded has at least the minimum recommended node counts for Kubernetes (that is, three master nodes and six worker nodes).

Note:

Currently, CNE does not support rollback in any scenario, such as:
  • after encountering an error during an upgrade
  • after a successful upgrade

Caution:

User, computer, application, and character encoding settings may cause issues when copying commands or other content from this PDF. The PDF reader version also affects the copy-paste functionality. Verify the pasted content, especially when hyphens or other special characters are part of the copied content.

Common Services Release Information

On successful installation, CNE generates files on the Bastion Host listing the Kubernetes and common services release details. These files are also updated during an upgrade. You can refer to the following files to get the release information after a successful CNE upgrade. The files are available on the Bastion Host in the /var/occne/cluster/${OCCNE_CLUSTER}/artifacts directory:
  • Kubernetes Release File: K8S_container_images.txt
  • Common Services Release File: CFG_container_images.txt
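
For example, run the following command to view the Kubernetes release details:

$ cat /var/occne/cluster/${OCCNE_CLUSTER}/artifacts/K8S_container_images.txt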

Preupgrade Tasks

Before upgrading CNE, perform the tasks described in this section.

Saving CNE Customizations

Before upgrading a CNE instance, you must save all the customizations applied to the CNE instance so that you can reapply them after the upgrade is complete.

Preserving Prometheus Alert Rules
This section provides the steps to back up user-specific Prometheus alert rules, so they can be restored after an upgrade. For more information about restoring Prometheus alert rules using the backups, see Restoring Prometheus Alert Rules.
  1. Use SSH to log in to the active Bastion and run the following command to confirm if it is the active Bastion:
    $ is_active_bastion
    Sample output:
    IS active-bastion
  2. Create the backup-alert-rules directory and navigate to the directory:
    $ mkdir ~/backup-alert-rules && cd ~/backup-alert-rules
  3. Run the following command to back up user-specific alert rules into YAML files, excluding occne-alerting-rules, which is the default alert rule:

    Note:

    Each YAML file is named after the alert rule that is backed up.
    $ for prom in $(kco get prometheusrules -o jsonpath='{.items[*].metadata.name}'); do [ "$prom" != "occne-alerting-rules" ] && kco get prometheusrules $prom -o yaml > "${prom}.yaml"; done
    When the backup is complete, you can use the ls command to view the list of backup files created.
    For example:
    $ ls
    Sample output:
    alert.yaml  example.yaml  occne-test.yaml
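
Note:

The commands in this procedure use kco, which is typically defined on the CNE Bastion Host as a shorthand for kubectl scoped to the occne-infra namespace. This is an assumption to verify in your environment; the following shows how to check the alias and the assumed equivalent long form:

$ alias kco                                    # verify how kco is defined on your Bastion Host
$ kubectl -n occne-infra get prometheusrules   # assumed equivalent of 'kco get prometheusrules'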
Preserving Grafana Dashboards
This section provides the steps to back up user-specific Grafana dashboards to a local directory, so they can be restored after an upgrade. For more information about restoring Grafana dashboards using the backups, see Restoring Grafana Dashboards.
  1. Log in to Grafana GUI.
  2. Select the dashboard to save.
  3. Click Share Dashboard to save the dashboard.

    Figure 3-1 Grafana Dashboard

    Shared Dashboard Icon
  4. Navigate to the Export tab and click Save to file to save the file in the local repository.

    Figure 3-2 Saving the Dashboard in Local Repository

  5. Repeat steps 1 to 4 until you save all the required customer-specific dashboards.
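
If you prefer a scriptable alternative to the GUI export, Grafana's HTTP API can export dashboards as JSON. The following is a minimal sketch, assuming the Grafana base URL and credentials for your deployment and that jq is available on the Bastion Host; the GUI procedure above remains the documented method, and dashboards exported this way can be restored through the same Import dialog described in Restoring Grafana Dashboards.

$ GRAFANA_URL="http://<YOUR-GRAFANA-EXTERNAL-IP>/<cluster-name>/grafana"   # placeholder base URL
$ mkdir -p ~/backup-grafana-dashboards && cd ~/backup-grafana-dashboards
$ for uid in $(curl -s -u <user>:<password> "${GRAFANA_URL}/api/search?type=dash-db" | jq -r '.[].uid'); do
    curl -s -u <user>:<password> "${GRAFANA_URL}/api/dashboards/uid/${uid}" | jq '.dashboard' > "${uid}.json"
  done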
Renewing Kubernetes Certificate

In Oracle Cloud Native Environment (CNE), the Kubernetes control plane certificates (for example, for the API server, controller manager, and scheduler) are not automatically renewed on a regular schedule unless a Kubernetes version upgrade (uplift) occurs.

  • If the CNE upgrade includes a Kubernetes version uplift (for example, upgrading from Kubernetes 1.27.x to 1.28.x), the control plane certificates are automatically renewed as part of the upgrade process.
  • If the CNE upgrade does not include a Kubernetes version uplift (for example, upgrading from one CNE 24.2.x build to another, where the Kubernetes version stays the same), the Kubernetes control plane certificates are not automatically renewed during the upgrade process.

If certificate renewal is required but no Kubernetes version uplift is planned, the control plane certificate renewal must be performed manually by following the procedure provided in the "Renewing Kubernetes Certificates" section in Oracle Communications Cloud Native Core, Cloud Native Environment User Guide.
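
For reference, in kubeadm-based clusters the certificate expiry can be inspected and renewed with the kubeadm certs commands shown below. This is only a sketch, run on a controller node with root privileges; the procedure in the User Guide remains the authoritative reference.

$ sudo kubeadm certs check-expiration   # list expiration dates of the control plane certificates
$ sudo kubeadm certs renew all          # renew all control plane certificates
# After renewal, restart the control plane components so that they pick up the renewed certificates.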

Patching the Multus DaemonSet

Note:

This section is applicable only for CNLB enabled CNE deployment.

Multus is a container network interface (CNI) plugin for Kubernetes that enables attaching multiple network interfaces to pods. Multus can be deployed by simply applying the thick DaemonSet with kubectl.

This section provides steps to patch the init containers of Multus DaemonSet and perform a rollout restart on the pods.

Note:

This patch is applicable only if Multus-Thick Daemonset is already installed and its name is kube-multus-ds. If the DaemonSet has a different name (for example, kube-multus-ds-amd64 or similar), this patch cannot be applied.

From Bastion Host of CNE deployed environment, perform the following steps:

  1. Run the following command to check the name of the Multus DaemonSet:
    kubectl -n kube-system get daemonset | grep kube-multus
  2. If the name is kube-multus-ds, proceed with the following steps; otherwise, skip the patch.

    Run the following command to patch the daemonset to enable thick mode:

    kubectl patch daemonset kube-multus-ds -n kube-system --type='json' -p='[
      {
        "op": "replace",
        "path": "/spec/template/spec/initContainers/0/command",
        "value": ["/usr/src/multus-cni/bin/install_multus", "-d", "/host/opt/cni/bin", "-t", "thick"]
      }
    ]'
    If the above command reports that no changes were made, this message can be safely ignored.
  3. Run the following command to restart the DaemonSet to activate the patch.
    kubectl -n kube-system rollout restart ds kube-multus-ds
  4. (Optional step) Run the following command to monitor the rollout status:
    kubectl -n kube-system rollout status ds kube-multus-ds
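
To confirm that the patch took effect, you can inspect the init container command in the DaemonSet and verify that it contains the "-t", "thick" arguments set in step 2:

$ kubectl -n kube-system get ds kube-multus-ds -o jsonpath='{.spec.template.spec.initContainers[0].command}'; echo
Sample output:
["/usr/src/multus-cni/bin/install_multus","-d","/host/opt/cni/bin","-t","thick"]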
Updating OpenSearch Master Node Role

This section provides information about updating the OpenSearch master node role.

  1. Check if the current role is set to master,data:
    $ kubectl -n occne-infra get sts occne-opensearch-cluster-master -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="node.roles")]}'
    Sample output:
    {"name":"node.roles","value":"master,data"}
  2. If the current role is master,data, then exclude master nodes from the shard allocation.

    Run the following command to prevent shards from being allocated to the master nodes before changing their roles:

    $ kubectl -n occne-infra exec occne-opensearch-cluster-client-0 -c opensearch -- \
    curl -s -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
      "transient": {
        "cluster.routing.allocation.exclude._name": "occne-opensearch-cluster-master-0,occne-opensearch-cluster-master-1,occne-opensearch-cluster-master-2"
      }
    }'
  3. Remove the data role from the master nodes.

    Run the following command to update the StatefulSet to assign only the master role to each master node:

    $ kubectl -n occne-infra get sts occne-opensearch-cluster-master -o json | \
    jq '(.spec.template.spec.containers[0].env[] | select(.name == "node.roles")).value = "master"' | \
    kubectl apply -f -; echo
  4. Monitor the shard reallocation across the data nodes.

    Run the following command to monitor the shard distribution across the data nodes:

    $ kubectl -n occne-infra exec occne-opensearch-cluster-client-0 -c opensearch -- \
    curl -s localhost:9200/_cat/shards?v

    Sample output:

    index               shard prirep state   docs   store ip            node
    logstash-2025.01.16 0     p      STARTED 351195 80.9mb 10.233.109.81 occne-opensearch-cluster-data-0
    logstash-2025.01.16 0     r      STARTED 351195 80.9mb 10.233.93.77  occne-opensearch-cluster-data-2
  5. Rerun the following command until all the shards are in the STARTED state and are allocated on the data nodes:
    $ kubectl -n occne-infra exec occne-opensearch-cluster-client-0 -c opensearch -- \
    curl -s localhost:9200/_cat/shards?v
  6. Validate the health status of the OpenSearch cluster. Ensure the cluster health status is "green" before proceeding.

    Run the following command to validate the health status of the OpenSearch cluster:

    $ kubectl -n occne-infra exec occne-opensearch-cluster-client-0 -c opensearch -- \
    curl -s localhost:9200/_cluster/health?pretty

    Sample output:

    {
      "cluster_name" : "occne-opensearch-cluster",
      "status" : "green",
      "timed_out" : false,
      "number_of_nodes" : 9,
      "number_of_data_nodes" : 6,
      "discovered_master" : true,
      "discovered_cluster_manager" : true,
      "active_primary_shards" : 25,
      "active_shards" : 50,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch" : 0,
      "task_max_waiting_in_queue_millis" : 0,
      "active_shards_percent_as_number" : 100.0
    }
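
Note that step 2 sets a transient shard-allocation exclusion that the remaining steps do not remove. As an optional cleanup suggestion (not part of the documented procedure), once the role change is complete and the cluster health is green, the exclusion can be cleared by setting the value to null:

$ kubectl -n occne-infra exec occne-opensearch-cluster-client-0 -c opensearch -- \
curl -s -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
  "transient": {
    "cluster.routing.allocation.exclude._name": null
  }
}'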

Performing Preupgrade Health Checks

Perform the following steps to ensure that the cluster is in a healthy state.

Check drive space on Bastion Host

Before upgrading, ensure that there is sufficient drive space in the home directory of the user (usually admusr for Bare Metal and cloud-user for vCNE) where the upgrade runs.

The /home directory (check with df -h /home) must have at least 4 GB of free space, and the /var directory (df -h /var) must have at least 10 GB of free space for temporary staging and operation of the CNE containers while the upgrade procedure runs.

If there is insufficient space, free up some space. One common location to reclaim space is the podman image storage for local images. You can list local images using the podman image ls command and remove them using the podman image rm -f <image> command. You can reclaim additional podman space using the podman system prune -fa command to remove any unreferenced image layers. These commands are summarized below.
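
The following summary consolidates the space checks and cleanup commands described above; the image name is a placeholder:

$ df -h /home                  # must show at least 4 GB free
$ df -h /var                   # must show at least 10 GB free
$ podman image ls              # list local images
$ podman image rm -f <image>   # remove a local image that is no longer needed
$ podman system prune -fa      # remove any unreferenced image layers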

Note:

When the upgrade.sh script is run, it performs the space checks and exits the upgrade if there is insufficient space in the two directories.

Check OpenSearch pods disk space

For OpenSearch pods that are using PVCs, check available disk space and confirm that there is at least 1 GB of disk space available before running the upgrade.

kubectl -n occne-infra exec occne-opensearch-cluster-data-0 -c opensearch -- df -h /usr/share/opensearch/data
kubectl -n occne-infra exec occne-opensearch-cluster-data-1 -c opensearch -- df -h /usr/share/opensearch/data
kubectl -n occne-infra exec occne-opensearch-cluster-data-2 -c opensearch -- df -h /usr/share/opensearch/data
For example:
$ kubectl -n occne-infra exec occne-opensearch-cluster-data-0 -c opensearch -- df -h /usr/share/opensearch/data
Sample output:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        9.8G  5.3G  4.1G  57% /usr/share/opensearch/data

Upgrading Grafana

This section details the procedure to upgrade Grafana to a custom version.

Note:

  • This procedure is optional and can be run if you want to upgrade Grafana to a custom version.
  • This procedure is applicable to both BareMetal and vCNE deployments.

Limitations

  • This procedure is only used to upgrade from Grafana release 9.5.3 to 11.2.x.
  • Grafana version 11.2.x is not tested with CNE. If you are upgrading to Grafana 11.2.x, ensure that you manage, adapt, and maintain the version.
  • Plugin installation and Helm chart adaptation are not in the purview of this procedure.
  • Some versions of the Grafana image may try to pull certificates from the internet.

Prerequisites

Before running the procedure, ensure that you meet the following prerequisites:
  • The cluster must run a stable Grafana version. Most CNE clusters run version 9.5.3, which is acceptable.
  • This procedure must be run in the active Bastion Host.
  • The target version of Grafana must be available in the cluster. You can achieve this by pulling the required version from the desired repository.
  • Podman must be installed and you must be able to run the Podman commands.
  • Upgrade Helm to the minimum supported version (3.15.2 or later).
  • kubectl must be installed.

Procedure

  1. Log in to the active Bastion Host and run the following command to ensure that you are logged in to the active Bastion:
    $ is_active_bastion
    Sample output:
    IS active-bastion
  2. Ensure the desired Grafana image is present in the podman registry:
    $ podman image ls
    Sample output:
    REPOSITORY                       TAG                           IMAGE ID      CREATED      SIZE
    winterfell:5000/occne/provision  25.2.0-alpha.0-11-g647fa73e6  04e905388051  3 days ago   2.48 GB
    localhost/grafana/grafana        11.2.5                        37c12d738603  6 weeks ago  469 MB
  3. Tag and push the image to follow the CNE image naming convention. This ensures that the repository uses the correct naming convention after pulling the desired image version.
    $ podman tag <CURRENT_GRAFANA_IMAGE_NAME>:<CURRENT_TAG> occne-repo-host:5000/occne/<DESIRED_GRAFANA_NAME>:<CURRENT_TAG>
    $ podman push occne-repo-host:5000/occne/<DESIRED_GRAFANA_NAME>:<CURRENT_TAG>
    For example:
    $ podman tag localhost/grafana/grafana:11.2.5 occne-repo-host:5000/occne/grafana:11.2.5
    $ podman push occne-repo-host:5000/occne/grafana:11.2.5
  4. Review all the deployments on the cluster and search for the Grafana deployment:
    $ kubectl -n occne-infra get deploy
    Sample output:
    NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
    cnlb-app                                   4/4     4            4           6h59m
    cnlb-manager                               1/1     1            1           6h59m
    occne-alertmanager-snmp-notifier           1/1     1            1           6h54m
    occne-bastion-controller                   1/1     1            1           6h54m
    occne-kube-prom-stack-grafana              1/1     1            1           6h55m # HERE IS THE GRAFANA DEPLOYMENT
    occne-kube-prom-stack-kube-operator        1/1     1            1           6h55m
    occne-kube-prom-stack-kube-state-metrics   1/1     1            1           6h55m
    occne-metrics-server                       1/1     1            1           6h54m
    occne-opensearch-dashboards                1/1     1            1           6h55m
    occne-promxy                               1/1     1            1           6h54m
    occne-promxy-apigw-nginx                   2/2     2            2           6h54m
    occne-tracer-jaeger-collector              1/1     1            1           6h54m
    occne-tracer-jaeger-query                  1/1     1            1           6h54m
  5. Edit the occne-kube-prom-stack-grafana deployment. This opens an editable YAML file where you can locate the previous Grafana image.
    $ kubectl -n occne-infra edit deploy occne-kube-prom-stack-grafana
    Sample output:
    ...
            - name: GF_PATHS_DATA
              value: /var/lib/grafana/
            - name: GF_PATHS_LOGS
              value: /var/log/grafana
            - name: GF_PATHS_PLUGINS
              value: /var/lib/grafana/plugins
            - name: GF_PATHS_PROVISIONING
              value: /etc/grafana/provisioning
            image: occne-repo-host:5000/docker.io/grafana/grafana:9.5.3               # HERE IS THE IMAGE
            imagePullPolicy: IfNotPresent
            livenessProbe:
              failureThreshold: 10
    ...
  6. Replace the old image with the recently pushed image:
    For example:
    ...
            - name: GF_PATHS_DATA
              value: /var/lib/grafana/
            - name: GF_PATHS_LOGS
              value: /var/log/grafana
            - name: GF_PATHS_PLUGINS
              value: /var/lib/grafana/plugins
            - name: GF_PATHS_PROVISIONING
              value: /etc/grafana/provisioning
            image: occne-repo-host:5000/occne/grafana:11.2.5               # HERE IS THE IMAGE
            imagePullPolicy: IfNotPresent
            livenessProbe:
              failureThreshold: 10
    ...
  7. Run the following command to verify the pods' health. Ensure that all pods are in the healthy Running state with no restarts.
    $ kco get pods | grep grafana
    Sample output:
    occne-kube-prom-stack-grafana-7ccf687579-ns94w             3/3     Running     0              7h18m
  8. Run the following command to view the pod's logs. Use the pod name obtained in the previous step.
    $ kubectl -n occne-infra logs <YOUR_GRAFANA_POD>
    For example:
    $ kubectl -n occne-infra logs occne-kube-prom-stack-grafana-7ccf687579-ns94w
  9. Depending on the type of Load Balancer used, use one of the following steps to retrieve Grafana external IP:
    • If you are using LBVM, run the following command to extract the external IP:
      [cloud-user@occne1-<user-name>-bastion-1 ~]$ kubectl -n occne-infra get service -o wide | grep grafana
      Sample output:
      NAME                                             TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                                           AGE
      occne-kube-prom-stack-grafana                    LoadBalancer   10.233.42.123   10.75.200.32    80:30553/TCP                                      4d21h
    • If you are using CNLB, use the occne.ini file to extract the external IP:
      $ cat /var/occne/cluster/$OCCNE_CLUSTER/occne.ini | grep occne_graf_cnlb
      Sample output:
      occne_graf_cnlb = 10.75.200.32
      # In both of these examples, the external IP is 10.75.200.32
  10. Ensure that the Grafana dashboard is accessible by either pinging the Grafana external IP or accessing the dashboard in a browser.
    The following code block provides the command to ping the external IP:
    $ ping <YOUR-GRAFANA-EXTERNAL-IP>
    For example:
    $ ping 10.75.200.32
    Sample output:
    PING 10.75.200.32 (10.75.200.32) 56(84) bytes of data.
    64 bytes from 10.75.200.32: icmp_seq=1 ttl=62 time=3.04 ms
    64 bytes from 10.75.200.32: icmp_seq=2 ttl=62 time=1.63 ms
    64 bytes from 10.75.200.32: icmp_seq=3 ttl=62 time=1.24 ms
    The following code block provides the CURL command to access Grafana dashboard using the external IP:
    $ curl <YOUR-GRAFANA-EXTERNAL-IP>
    For example:
    $ curl 10.75.200.32
    Sample output:
    <a href="/occne1-<user-name>/grafana/login">Found</a>.

Checking Preupgrade Config Files

Check manual updates on the pod resources: Verify that any manual updates made to the Kubernetes cluster configuration (such as deployments and DaemonSets) after the initial deployment are configured in the proper occne.ini (vCNE) or hosts.ini (Bare Metal) file. For more information, see Preinstallation Tasks.

Configuring secrets.ini and occne.ini Files

This section explains how to create and configure the secrets.ini file and remove the corresponding variables from the occne.ini file.

The secrets.ini file contains the cluster credentials and cloud-specific account credentials (OpenStack, VMware, BareMetal) that are required for the cluster to run correctly.

Perform the following procedure to create a secrets.ini file and remove vars from the occne.ini file:

  1. Create a copy of the secrets.ini file from the secrets.ini.template file in the cluster directory.

    Note:

    If the secrets.ini.template file is not located in the cluster directory, use the templates given in step 2.
  2. Edit the copy of the secrets.ini file and fill out all the required parameters depending on the platform being upgraded.
    secrets.ini parameters

    Example for vCNE OpenStack:

    [occne:vars]
    # Set grub password
    occne_grub_password=
     
    [openstack:vars]
    # Specify the user/pass of the External OpenStack Controller/CSI Cinder plugin accounts needed
    # for deployment.
    external_openstack_username=
    external_openstack_password=
     
    cinder_username=
    cinder_password=

    Example for vCNE VCD:

    [occne:vars]
    # Set grub password
    occne_grub_password=
     
    [vcd:vars]
    ## Specify the vSphere information of the External vSphere Controller/CSI Cinder plugin accounts
    ## needed for deployment.
    external_vsphere_user =
    external_vsphere_password =
     
    vcd_user=
    vcd_passwd=

    Example for BareMetal:

    [occne:vars]
    # Set grub password
    occne_grub_password=
     
    ####
    # PXE Settings
    pxe_install_lights_out_usr=
    pxe_install_lights_out_passwd=
     
     
    ### ANY OTHER SENSITIVE CREDENTIAL.
    # ...
  3. Remove the variables from the occne.ini file for vCNE or the hosts.ini file for BareMetal deployments.
    • Edit the occne.ini file:

      For vCNE, run the following command:

      $ vi /var/occne/cluster/${OCCNE_CLUSTER}/occne.ini

      For BareMetal, run the following command:

      $ vi /var/occne/cluster/${OCCNE_CLUSTER}/hosts.ini
    • Remove the variables that were added to the secrets.ini file from the occne.ini or hosts.ini file. You can verify the removal as shown below.
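
As a quick check (a suggestion, using the vCNE OpenStack variable names from step 2 as examples), grep the .ini file for the variables that were moved. The command must return no output:

$ grep -E 'occne_grub_password|external_openstack_username|external_openstack_password|cinder_username|cinder_password' /var/occne/cluster/${OCCNE_CLUSTER}/occne.ini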

Checking GRUB Password

This section provides information about configuring Grand Unified Bootloader (GRUB) password before performing an upgrade.

It is mandatory to set the GRUB password before performing an upgrade from 25.1.2xx to 25.2.1xx.

You can set the GRUB password by adding the occne_grub_password=<password> variable under the occne:vars section header in occne.ini (for vCNE) or hosts.ini (for BareMetal).

Once the occne_grub_password variable is set, update the value only if you want to change the GRUB password. The upgrade script checks for this variable in the respective .ini file and fails if it is not set. For more information about configuring the GRUB password, see Configuring GRUB Password.
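
For example, the entry in the occne.ini or hosts.ini file looks as follows (the password value is a placeholder):

[occne:vars]
# Set grub password
occne_grub_password=<password>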

Performing a Podman System Reset

This section explains how to perform a podman system reset.

Note:

It is recommended to run the following command on the Bastion Host on which the upgrade is being run. This ensures that there is no leftover image corruption on the Bastion Host:
podman system reset

When prompted for confirmation, enter y to proceed.

Updating Network Attachment Definition Plugin

Note:

This step is applicable for vCNE CNLB OpenStack deployments only.

Before upgrading the cluster, make sure the external network attachment definition plugin is correctly set for this release.

  1. Open the cnlb.ini file.
    vi /var/occne/cluster/${OCCNE_CLUSTER}/cnlb.ini
  2. Set the ext_net_attach_def_plugin value to macvlan.

    Example:

    [cnlb]
    ...
    [cnlb:vars]
    ...
    ext_net_attach_def_plugin = "macvlan"
    ...
  3. Save and close the file.
  4. Verify the setting.
    grep ext_net_attach_def_plugin /var/occne/cluster/${OCCNE_CLUSTER}/cnlb.ini

    Sample output:

    ext_net_attach_def_plugin = "macvlan"

Note:

Ensure that you perform this step; otherwise, the upgrade.sh script fails with a validation error. You can resume the upgrade process once the value is correctly configured.

Ensuring Policies in Neutron Allow Port Update

Before upgrading the cluster, ensure that Neutron's policy settings allow port updates. In previous CNE versions, ports were created using security groups, and port security was enabled. To support new features (Conntrackd, regardless of whether it is enabled or disabled), this setting has changed. By default, Neutron's policies prevent OpenTofu from making the necessary changes, causing the upgrade to fail.

Contact the OpenStack administrator and ask to perform or confirm that one of the following scenarios is true:

  • Ensure that the OpenStack username used for the upgrade has the following policy assigned:
    • rule:update_port
    • rule:update_port:port_security_enabled
    • rule:update_port:allowed_address_pairs
  • Ensure that, at the project level, Neutron allows the following policies:
    • rule:update_port
    • rule:update_port:port_security_enabled
    • rule:update_port:allowed_address_pairs
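
One way to verify the permissions beforehand (an optional suggestion; the network name and IP address below are placeholders) is to attempt the same kind of port update that OpenTofu performs against a throwaway port:

$ openstack port create --network <network-name> cne-policy-test                # create a temporary test port
$ openstack port set --allowed-address ip-address=192.0.2.10 cne-policy-test   # exercises the update_port policies
$ openstack port delete cne-policy-test                                        # clean up the test port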

Note:

If this step is not completed, the upgrade.sh script will fail during the OpenTofu apply process. You can resume the upgrade after applying one of the above configurations.

Note:

The permissions mentioned above are the minimum required for OpenTofu to make the changes necessary for a successful update. Depending on the specific cloud configuration, additional rules or the removal of existing rules that restrict certain permissions may be necessary. Contact OpenStack administrator for specific advice.

Removing VCD Variables from the occne.ini File

VMware Cloud Director (VCD) controller and plugin versions are now managed exclusively by Kubespray. The following procedure removes these variables from the occne.ini file to avoid overwriting newer versions.

Note:

The following steps must be performed for VMware clusters. Failure to remove the VCD variables from the occne.ini file blocks the cluster from obtaining the latest vSphere CSI images or versions, causing potential compatibility issues.

Perform the following procedure to remove the VCD variables from the occne.ini file:

  1. Open the occne.ini file in a text editor.
    $ vi /var/occne/cluster/${OCCNE_CLUSTER}/occne.ini
  2. Remove only the following lines from the occne.ini file. These lines are at the end of the occne.ini file.

    The following are the custom CSI driver version variables for vSphere (VCD) to be removed:

    vsphere_csi_attacher_image_tag = v4.8.1
    vsphere_csi_resizer_tag = v1.12.0
    vsphere_csi_controller = v3.5.0
    vsphere_csi_driver_image_tag = v3.5.0
    vsphere_syncer_image_tag = v3.5.0
    vsphere_csi_liveness_probe_image_tag = v2.15.0
    vsphere_csi_provisioner_image_tag = v4.0.1
    vsphere_csi_node_driver_registrar_image_tag = v2.13.0
    vsphere_csi_snapshotter_image_tag = v8.2.0
    The following are the custom Cloud Controller Manager variables for vSphere (VCD) to be removed:
    external_vsphere_cloud_controller_image_tag = v1.33.0
  3. Save the file and close the editor.
  4. Verify that VCD variables are not present in the occne.ini file. The following commands must not print any output.
    $ cat /var/occne/cluster/${OCCNE_CLUSTER}/occne.ini | grep vsphere_csi
    $ cat /var/occne/cluster/${OCCNE_CLUSTER}/occne.ini | grep external_vsphere_cloud_controller_image_tag

Performing an Upgrade

This section describes the procedure to perform a standard upgrade, OS update, or both on a given CNE deployment (BareMetal or vCNE).

Note:

  • This upgrade is only used to upgrade from release 25.1.2xx to release 25.2.1xx.
  • Ensure that you complete all the preupgrade procedures before performing the upgrade.
  • It is suggested to run this procedure from a terminal multiplexer (such as tmux) on the machine used to sign in to the Bastion Host. This way, the Bastion Host bash shell continues to run even if the shell or VPN connection drops.
  • It is suggested to use a session capture program (such as script) on the Bastion Host to capture all input and output for diagnosing issues. This program must be rerun for each login.
  • Initiate the upgrade.sh script from the active Bastion Host. During most of an upgrade, there is no designated active Bastion Host because the system changes continuously. Therefore, ensure that you rerun the upgrade.sh script from the same Bastion Host that was used initially.
  • The upgrade procedure can take hours to complete and the total time depends on the configuration of the cluster.
  • Before performing a standard upgrade or OS update, verify the health of the cluster and the services related to CNE.
  • Performing a standard upgrade, an OS update, or both causes the current Bastion Host to reboot multiple times. Each time the upgrade.sh script terminates without indicating an error condition, rerun the upgrade.sh script using this procedure on the same Bastion Host after it reboots.
  • Before performing an upgrade, ensure that all CA certificates are up to date.

WARNING:

Refrain from performing a controlled abort (Ctrl-C) on the upgrade while it is in progress. Allow the upgrade to exit gracefully from an error condition or after a successful completion.

Log Files for Debugging Upgrade

The system generates many log files during the upgrade or OS update process. All the log files are suffixed with a date and timestamp. These files are maintained in the /var/occne/cluster/<cluster_name>/upgrade/logs directory and can be removed after the upgrade or OS update completes successfully. For any issues encountered during the upgrade, these files must be collected into a tar file and made available to the next level of support for debugging.
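
For example, a command such as the following (the archive name is illustrative) collects the log files into a tar file for support:

$ tar -czf ~/upgrade_logs_$(date +%Y%m%d_%H%M%S).tar.gz -C /var/occne/cluster/${OCCNE_CLUSTER}/upgrade logs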

Procedure

  1. Use SSH to log in to the active Bastion Host. You can determine the active Bastion Host by running the following command:
    $ is_active_bastion
    Sample output:
    IS active-bastion

    Note:

    If you are rerunning the upgrade script after an error or termination, log in to the same Bastion Host that was used during the initial run.
  2. Ensure that the new central repository OLx yum repository file is present in the /var/occne/yum.repos.d/ directory and that the file name follows the format <CENTRAL_REPO>-olx.repo, where x is the OS version number (for example, <CENTRAL_REPO>-ol9.repo):
    For example:
    $ curl http://<CENTRAL_REPO>/<path_to_file>/<CENTRAL_REPO>-ol9.repo -o /var/occne/yum.repos.d/<CENTRAL_REPO>-ol9.repo

    Note:

    Before continuing to the next step, ensure that the .repo file does not contain a 404 Not Found error. This error indicates that the data retrieval failed, typically because of an incorrect path, and can cause the upgrade to fail.
  3. Perform one of the following steps to initiate or reenter the upgrade or the OS update:

    Note:

    The upgrade runs an initial cluster test based on the current OCCNE_VERSION. If the initial upgrade cluster test fails, the upgrade.sh script terminates. However, at this point, the upgrade is not started. Therefore, after correcting the issues discovered, you can restart the upgrade using step a. This is not applicable to the usual expected exits in the upgrade.sh script.
    a. Run the following command to launch the upgrade script to perform an upgrade to the new version:
      $ OCCNE_NEW_VERSION=<new_version_tag> upgrade.sh
    b. Run the following command to launch the upgrade script to perform an OS update, and for subsequent runs of both the upgrade and the OS update:
      $ upgrade.sh
  4. When the upgrade process initiates a reboot of the hosting Bastion Host, the upgrade.sh script terminates gracefully with the following output, and the current Bastion Host reboots after a short period.

    Sample output:

    The current Bastion: ${OCCNE_HOSTNAME} will now be going into reboot after configuration changes.
    Wait until the reboot completes and follow the documentation to continue the upgrade.
    This may take a number of minutes, longer on a major OS upgrade.
    Once the Bastion recovers from the reboot, rerun upgrade.sh by running the command in Step 3b.

    Note:

    • Once the upgrade begins on each node (starting with the active Bastion Host), the login banner in the shell is updated with the following message. The banner is restored to the original message when the upgrade completes. For vCNE, this banner is not set on the currently active LBVMs while it is set on the other nodes, because the active LBVMs are not upgraded until the end of the procedure.
      ****************************************************************************
      |
      |   Date/Time: 2024-03-11 14:56:31.612728
      |
      |   OCCNE UPGRADE TO VERSION: 25.2.100 IN PROGRESS
      |   Started Date/Time: 2024-03-15 19:55:22.232178
      |
      |   Please discontinue login if not assisting in this maintenance activity.
      |
      ****************************************************************************
    • In some cases, you may see an "Ansible FAILED!" assertion message similar to the following. This is expected behavior: the system returns control to the shell when CNE detects that a reboot will interrupt processing.
      TASK [staged_reboot : Halt ansible for os_upgrade reboot on current bastion (or its kvm host).  After reboot, reconnect to same bastion, and relaunch upgrade.sh] ***
      fatal: [my-cluster-name-bastion-1]: FAILED! => {
          "assertion": false,
          "changed": false,
          "evaluated_to": false,
          "msg": "NOT AN ERROR: This is an EXPECTED assertion to flag self-reboot, and return shell control."
      }
       
      PLAY RECAP *********************************************************************
      my-cluster-name-bastion-1 : ok=70   changed=24   unreachable=0    failed=1    skipped=194  rescued=1    ignored=0
    • By default, during an upgrade and OS update, the upgrade.sh script exits before rebooting the ACTIVE LBVMs by displaying the following message:
      Skipping active LBVMs reboot since OCCNE_REBOOT_ACTIVE_LB is not set.
      The active LBVMs must be manually rebooted and the upgrade.sh script be run again.
      You must then manually reboot (or switch the activity of) each ACTIVE LBVM so that it becomes the STANDBY LBVM. For the procedure to perform a manual switchover of LBVMs, see the Performing Manual Switchover of LBVMs During Upgrade section. When the switchover of the LBVMs completes successfully, rerun the upgrade.sh script.
    • For vCNE upgrades and OS updates that include LBVMs, the LB Controller Health Check Monitor (HCM) is disabled during the upgrade process. Before the upgrade exits for a manual switchover on each LBVM pair, the HCM is restarted on the LB Controller. As a result, some LBVMs may go into a FAILED state depending on their actual status throughout the upgrade. The system then displays the following sample message and the upgrade exits with error 3. If you encounter this error, clean up the LBVMs in the FAILED state and resume the standard upgrade using this procedure.
      ISSUE: The current number of ACTIVE LBVMs: 2 must be the same as the original number of ACTIVE LBVMs: 3
             List of ACTIVE LBVMs at upgrade start: my_cluster_oam_lbvm1 my_cluster_sig_lbvm1 my_cluster_prov_lbvm1
             List of current ACTIVE LBVMs: my_cluster_oam_lbvm1 my_cluster_sig_lbvm1
             This issue must be resolved before the UPGRADE/OS Update can complete successfully.
  5. When the upgrade or OS update completes successfully, the system displays the following message:

    Message format for CNE upgrade:

    <date/time>******** Upgrade Complete ***********
    For example:
    03/11/2024 - March Monday 09:27:05 - *********** Upgrade Complete **************
    Message format for OS update:
    <date/time>******** OS Update Complete ***********
    For example:
    03/11/2024 - March Monday 09:27:05 - ******** OS Update Complete ***********

Performing Upgrade Across Multiple Maintenance Windows

This section describes the procedures to set up and perform an upgrade on CNE environment across multiple maintenance windows.

Note:

  • This procedure is applicable to standard upgrade and OS update.
  • CNLB and BareMetal deployments support upgrade or OS Update windows 1 and 2 only.
  • CNE doesn't support migration from vCNE LBVM to CNLB deployments during an upgrade.
  • CNE supports only patch level upgrades for CNLB (For example, 24.2.x to 24.2.y).
  • For a standard OS update on LBVM supported deployments, only two maintenance windows are applicable (windows 1 and 3).
Performing a Standard Upgrade or OS Update Across Multiple Windows
The standard upgrade or OS update can be divided across two or three maintenance windows:
  1. Perform the getdeps, Bastion Host setup, and the OS Upgrade on the Bastion Host.
  2. Perform the Kubernetes and Common Services upgrade (this does not apply to the OS update).
  3. Perform manual LBVM switchover and complete the final stages (POST) of the upgrade (vCNE only).
Prepare for a Standard Upgrade or OS Update

Ensure that all the preupgrade steps are performed.

Perform the Standard Upgrade/OS Update up to the first exit after the Active Bastion reboot (First Maintenance Window)

Perform the Performing an Upgrade procedure to start the standard upgrade from 25.1.2xx to 25.2.1xx or the OS update on 25.2.1xx.

Perform the Standard Upgrade after the reboot of the Active Bastion Host (Second Maintenance Window)

Perform the Performing an Upgrade procedure to complete the standard upgrade from 25.1.2xx to 25.2.1xx or the OS update on 25.2.1xx.

Perform the standard upgrade manual LBVM switchover (non-CNLB vCNE only) (third maintenance window for a standard upgrade and second window for an OS update)

For vCNE deployments, perform the Performing an Upgrade procedure to complete the standard upgrade from 25.1.2xx to 25.2.1xx or the OS update on 25.2.1xx.

Upgrading BareMetal CNE Deployed using Bare Minimum Servers

This section provides the prerequisites and the points to be considered while upgrading a BareMetal CNE that is deployed using bare minimum servers (three worker nodes).

Prerequisites

Before performing an upgrade on a bare minimum setup, ensure that you meet the following prerequisites:
  • This procedure must be used only when you want to upgrade a BareMetal CNE that is deployed using bare minimum servers (three worker nodes).
  • Ensure that you perform all the tasks mentioned in the Prerequisites and Preupgrade Tasks sections. However, ignore any prerequisite about the minimum required number of worker servers, as CNE supports upgrades for BareMetal deployments that use minimal resources (three worker nodes).
If you want to enable CNLB while performing an upgrade on a bare minimum setup, ensure that you meet the following requirements:
  • Ensure that all nodes (3 controller nodes and 3 worker nodes) are updated to the same level of operating system and Kubernetes components.
  • Verify that the network and CNLB annotations are defined correctly in the Helm templates. For more information, see Configuring Cloud Native Load Balancer (CNLB).
  • Use monitoring tools to track the network connectivity, load balancing, and cluster status throughout the upgrade process.
  • After an upgrade, perform the Postupgrade Tasks to ensure that all applications are functioning as expected with the CNLB setup.

After you ensure that you meet all the prerequisites and requirements, follow the Performing an Upgrade procedure to perform an upgrade. Skip the steps that are specific to vCNE deployments.

Upgrading CNE Up to Two Releases (N+2)

CNE clusters can be upgraded up to two full versions (N+2) in a single step, minimizing downtime compared to performing two consecutive upgrades.

Here, "N" represents the current version of the product, and "+2" means you can upgrade directly to the release that is two versions ahead of the current one. For example, a cluster running 25.1.100 can be upgraded directly to 25.2.100.

Prerequisites

  • N+2 upgrades are supported only on CNLB clusters.
  • N+2 upgrade functionality is introduced in version 25.2.100. Therefore, the cluster must be running the version 25.1.100 or later before initiating an N+2 upgrade.
  • Ensure that all the prerequisites given in the Upgrading CNE section are completed prior to starting this procedure.

N+2 Upgrade Policy

With the introduction of the N+2 upgrade feature in the 25.2.100 version, the upgrade script now includes a validator mechanism. This validator ensures that you are upgrading the cluster to a valid target version.

Table 3-2 N+2 Upgrade Use Cases

Actual Cluster Version   Update Cluster   Minor Upgrade   Standard Upgrade   N+2 Upgrade
25.1.100                 25.1.100         25.1.101        25.1.200           25.2.100
25.1.200                 25.1.200         25.1.201        25.2.100           25.2.200

Note:

Since the N+2 upgrade feature was introduced in 25.2.100, the cluster version prior to performing an N+2 upgrade must be 25.1.100 or later. Do not attempt to run an N+2 upgrade from versions earlier than 25.1.100.

Performing N+2 Upgrade

Perform the following steps to upgrade to an N+2 CNE version:

  1. Run the following command to check the current cluster version:
    $ env | grep OCCNE_VERSION
    Sample output:
    OCCNE_VERSION=25.1.100
  2. To perform an N+2 upgrade, follow the Performing an Upgrade section. When you run the upgrade, set the OCCNE_NEW_VERSION variable to a version that is up to two versions ahead of your current cluster version.
    $ OCCNE_NEW_VERSION=<Two versions above the actual cluster version> upgrade.sh

    Example:

    $ env | grep OCCNE_VERSION
    OCCNE_VERSION=25.1.100
    $ OCCNE_NEW_VERSION=25.2.100 upgrade.sh
  3. The N+2 upgrade process is otherwise the same as a regular upgrade. Run the script with the version you want to upgrade to, ensuring that it meets the N+2 version policy.

Postupgrade Tasks

This section describes the postupgrade tasks for CNE.

Restoring CNE Customizations

This section provides information about restoring CNE customizations. Ensure that you restore all the customizations applied to the CNE instance after completing the upgrade process.

Restoring Prometheus Alert Rules

Perform the following steps to restore Prometheus alert rules:

  1. Use SSH to log in to the active Bastion and run the following command to confirm if it is the active Bastion:
    $ is_active_bastion
    Sample output:
    IS active-bastion
  2. Navigate to the backup-alert-rules directory:
    $ cd ~/backup-alert-rules
  3. Run the following command to restore all the alert rules that were backed up previously:

    Note:

    This command can take several minutes to process depending on the number of alert rules to be restored.
    $ for promrule in *; do kco apply -f "$promrule"; done
    Sample output:
    prometheusrule.monitoring.coreos.com/alert created
    prometheusrule.monitoring.coreos.com/test created
    prometheusrule.monitoring.coreos.com/occne-example created
    prometheusrule.monitoring.coreos.com/occne-x created
    
  4. Run the following command to verify if the alert rules are restored:
    $ kco get prometheusrules
    Sample output:
    NAME                   AGE
    alert                  4m5s
    test                   3m44s
    occne-example          11m
    occne-alerting-rules   20h
    occne-x                3m4s
Restoring Grafana Dashboards

Perform the following steps to restore the Grafana dashboard:

  1. Load the previously installed Grafana dashboard.
  2. Click the + icon on the left panel and select Import.

    Figure 3-3 Load Grafana Dashboard



  3. In the new panel, click Upload JSON file and choose the locally saved dashboard file.

    Figure 3-4 Uploading the Dashboard

  4. Repeat the same steps for all the dashboards saved from the older version.

Activating Optional Features

This section provides information about activating optional features, such as Velero and Local DNS post upgrade.

Activating Velero

Velero is used for performing on-demand backups and restores of CNE cluster data. Velero is an optional feature and has an extra set of hardware and networking requirements. You can activate Velero after upgrading CNE. For more information about activating Velero, see Activating Velero.

Activating Local DNS

The Local DNS feature is a reconfiguration of core DNS (CoreDNS) to support external hostname resolution. When Local DNS is enabled, CNE routes the connection to external hosts through core DNS rather than the nameservers on the Bastion Hosts. For information about activating this feature, see the "Activating Local DNS" section in Oracle Communications Cloud Native Core, Cloud Native Environment User Guide.

To stop DNS forwarding to the Bastion DNS, you must define the DNS details through A records and SRV records. A records and SRV records are added to the CNE cluster using Local DNS API calls. For more information about adding and deleting DNS records, see the "Adding and Removing DNS Records" section in Oracle Communications Cloud Native Core, Cloud Native Environment User Guide.

Enabling or Disabling Floating IP in OpenStack

Floating IPs are additional public IP addresses that are associated with instances such as control nodes, worker nodes, Bastion Hosts, and LBVMs. Floating IPs can be quickly reassigned and switched from one instance to another using the API, thereby ensuring high availability and less maintenance. You can activate the Floating IP feature after upgrading CNE. For information about enabling or disabling Floating IP, see Enabling or Disabling Floating IP in OpenStack.

Updating Port Name for servicemonitors and podmonitors

The metric port name on which Prometheus scrapes metrics from 5G-CNC applications must be updated to "cnc-metrics".

To update the port name, perform the following steps:
  1. Run the following command to get the servicemonitor details:
    $ kubectl get servicemonitor -n occne-infra
    Sample output:
    NAME                         AGE
    occne-nf-cnc-servicemonitor  60m
  2. Run the following command to update the port name for servicemonitor:
    $ kubectl edit servicemonitor occne-nf-cnc-servicemonitor -n occne-infra
    # Edit the above servicemonitor and update the following port name by removing the "http-" prefix.
     existing port name -
       port: http-cnc-metrics
     updated port name -
       port: cnc-metrics
  3. Save the changes for servicemonitor.
  4. Run the following command to get the podmonitor details:
    $ kubectl get podmonitor -n occne-infra
    Sample output:
    NAME                     AGE
    occne-nf-cnc-podmonitor  60m
  5. Run the following command to update the port name for podmonitor:
    $ kubectl edit podmonitor occne-nf-cnc-podmonitor -n occne-infra
    # Edit the above podmonitor and update the following port name by removing the "http-" prefix.
      existing port name -
       port: http-cnc-metrics
      updated port name -
       port: cnc-metrics
  6. Save the changes for podmonitor.
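
As a non-interactive alternative to kubectl edit (a sketch that assumes the metric port is defined on the first endpoint entry of each resource), the same change can be applied with kubectl patch:

$ kubectl -n occne-infra patch servicemonitor occne-nf-cnc-servicemonitor --type='json' \
  -p='[{"op": "replace", "path": "/spec/endpoints/0/port", "value": "cnc-metrics"}]'
$ kubectl -n occne-infra patch podmonitor occne-nf-cnc-podmonitor --type='json' \
  -p='[{"op": "replace", "path": "/spec/podMetricsEndpoints/0/port", "value": "cnc-metrics"}]'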

Upgrading Grafana Post Upgrade

This section provides information about upgrading Grafana to a custom version after an upgrade.

After upgrading CNE, depending on your requirement, you can upgrade Grafana to a custom version (For example, 11.2.x). To do so, perform the procedure in the Upgrading Grafana section.