3 Upgrading CNE
Note:
Upgrading from 23.3.x to 23.4.x causes additional reboots of the CNE nodes as CNE 23.4.0 uses Oracle Linux 9 and CNE 23.3.x runs on Oracle Linux 8.
3.1 Supported Upgrade Paths
The following table lists the supported upgrade paths for CNE:
Table 3-1 Supported Upgrade Paths
Source Release | Target Release |
---|---|
23.4.x (where x = 1,4) | 23.4.6 |
23.3.x | 23.4.6 |
Note:
CNE supports performing an OS update on an existing system that runs version 23.4.1 or above on Oracle Linux 9.
3.2 Prerequisites
Ensure that the following prerequisites are met before upgrading CNE:
- The user's central repository is updated with the latest versions of RPMs, binaries, and CNE Images for 23.4.x. For more information on how to update RPMs, binaries, and CNE images, see Artifact Acquisition and Hosting.
- All Network Functions (NFs) are upgraded before performing a CNE upgrade. For more information about NF upgrade procedure, see Oracle Communications Cloud Native Core, Cloud Native Environment User Guide.
- The CNE instance that is upgraded has at least the minimum recommended node counts for Kubernetes (that is, three master nodes and six worker nodes).
Note:
Currently, CNE doesn't support rollback in any instance, such as:
- encountering an error after initiating an upgrade
- after a successful upgrade
Caution:
User, computer, application, and character encoding settings may cause issues when copying and pasting commands or any content from the PDF. The PDF reader version also affects the copy-paste functionality. It is recommended to verify the pasted content, especially when hyphens or other special characters are part of the copied content.
3.3 Common Services Release Information
The Kubernetes and common services release information for the current CNE release is available in the following files in the /var/occne/cluster/${OCCNE_CLUSTER}/artifacts directory:
- Kubernetes Release File: K8S_container_images.txt
- Common Services Release File: CFG_container_images.txt
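For example, you can view the versions recorded in these files directly from the Bastion Host (a quick check; the paths are the ones listed above):
$ cat /var/occne/cluster/${OCCNE_CLUSTER}/artifacts/K8S_container_images.txt
$ cat /var/occne/cluster/${OCCNE_CLUSTER}/artifacts/CFG_container_images.txt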
3.4 Preupgrade Tasks
Before upgrading CNE, perform the tasks described in this section.
3.4.1 Saving CNE Customizations
Before upgrading a CNE instance, you must save all the customizations applied to the CNE instance so that you can reapply them after the upgrade is complete.
3.4.1.1 Preserving Grafana Dashboards
Perform the following steps to save the Grafana dashboards:
- Log in to the Grafana GUI.
- Select the dashboard to save.
- Click Share Dashboard to save the dashboard.
Figure 3-1 Grafana Dashboard
- Navigate to the Export tab and click Save to file to save the file in the local repository.
Figure 3-2 Saving the Dashboard in Local Repository
- Repeat steps 1 to 4 until you save all the required customer-specific dashboards.
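If you prefer a scripted backup, each dashboard can also be exported through the Grafana HTTP API (a minimal sketch, assuming you have a Grafana API token and the dashboard UID; the host name and file name are placeholders):
$ curl -s -H "Authorization: Bearer <grafana_api_token>" http://<grafana_host>/api/dashboards/uid/<dashboard_uid> > <dashboard_name>.json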
3.4.2 Performing Preupgrade Health Checks
Perform the following steps to ensure that the cluster is in a healthy state.
Check drive space on Bastion Host
Before upgrading, ensure that there is sufficient drive space in the home directory of the user (usually admusr for Bare Metal and cloud-user for vCNE) where the upgrade runs. The df -h command must show at least 3.5 GB of free space in the /home directory and at least 10 GB in the /var directory for temporarily gathering files and for running the CNE containers when the upgrade procedure runs.
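For example, the following command reports the free space of both directories in one call:
$ df -h /home /var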
If there is insufficient space, then free up some space. One common location to reclaim space is the podman image storage for local images. You can find the local images using the podman image ls command and remove them using the podman image rm -f [image] command. You can reclaim additional podman space by using the podman system prune -fa command to remove any unreferenced image layers.
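The following sequence illustrates the cleanup commands described above (the image reference is a placeholder; remove only images that you no longer need):
$ podman image ls
$ podman image rm -f <unused_image>
$ podman system prune -fa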
Check OpenSearch pods disk space
For OpenSearch pods that are using PVCs, check the available disk space and confirm that there is at least 1 GB of disk space available before running the upgrade.
kubectl -n occne-infra exec occne-opensearch-cluster-data-0 -c opensearch -- df -h /usr/share/opensearch/data
kubectl -n occne-infra exec occne-opensearch-cluster-data-1 -c opensearch -- df -h /usr/share/opensearch/data
kubectl -n occne-infra exec occne-opensearch-cluster-data-2 -c opensearch -- df -h /usr/share/opensearch/data
Sample output:
$ kubectl -n occne-infra exec occne-opensearch-cluster-data-0 -c opensearch -- df -h /usr/share/opensearch/data
Filesystem Size Used Avail Use% Mounted on
/dev/sdb 9.8G 5.3G 4.1G 57% /usr/share/opensearch/data
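If you prefer to check all the data pods in one pass, a simple loop can be used (a sketch, assuming the default three data pods shown above):
$ for i in 0 1 2; do kubectl -n occne-infra exec occne-opensearch-cluster-data-$i -c opensearch -- df -h /usr/share/opensearch/data; done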
3.4.3 Checking Preupgrade Config Files
Check manual updates on the pod resources: Check that the manual updates made to the Kubernetes cluster configuration, such as deployments and daemonsets, after the initial deployment are configured in the proper occne.ini (vCNE) or hosts.ini (Bare Metal) file. For more information, see Preinstallation Tasks.
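For example, the following commands list the current deployments and daemonsets in the occne-infra namespace so that you can compare any manual changes against the values captured in the ini file (a minimal sketch; adjust the namespace if your customizations are elsewhere):
$ kubectl get deployments -n occne-infra -o wide
$ kubectl get daemonsets -n occne-infra -o wide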
3.5 Standard Upgrade
This section describes the procedure to perform a full upgrade, OS update, or both on a given CNE deployment (Bare Metal or vCNE).
Note:
- This upgrade is only used to upgrade from release 23.3.x to release 23.4.x.
- Ensure that you complete all the preupgrade procedures before performing the upgrade.
- It is suggested to use a terminal emulator (such as tmux) when running this procedure so that the Bastion Host bash shell continues to run even in case of shell and VPN disconnections. See the example after this note.
- It is suggested to use a session capture program (such as script) on the Bastion Host to capture all input and output for diagnosing issues. This program must be rerun for each login.
- Initiate the upgrade.sh script from the active Bastion Host. However, during most of the upgrades, there is no designated active Bastion Host as the system changes continuously. Therefore, ensure that you run the upgrade.sh script from the same Bastion Host that was used initially.
- The upgrade procedure can take hours to complete and the total time depends on the configuration of the cluster.
- Before performing an upgrade or OS update, verify the health of the cluster and the services related to CNE.
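For example, the following commands start a terminal emulator session and a session capture before launching the upgrade (a minimal sketch; the session name and log file name are illustrative):
$ tmux new -s cne-upgrade
$ script -a ~/upgrade_session_$(date +%Y%m%d%H%M%S).log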
WARNING:
Refrain from performing a controlled abort (ctrl-C) on the upgrade while it is in progress. Allow the upgrade to exit gracefully from an error condition or after a successful completion.
Log Files for Debugging Upgrade
The system generates many log files during the upgrade or OS update process. All the log files are suffixed with a date and timestamp. These files are maintained in the /var/occne/cluster/<cluster_name>/upgrade/logs directory and can be removed after the upgrade or OS update completes successfully. For any issues encountered during the upgrade, these files must be collected into a tar file and made available to the next level of support for debugging.
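For example, the log files can be collected into a single archive for support (a sketch; the archive name is illustrative):
$ cd /var/occne/cluster/<cluster_name>/upgrade
$ tar -czf upgrade_logs_$(date +%Y%m%d%H%M%S).tgz logs/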
3.5.1 Performing an Upgrade or OS Update
This section describes the procedure to perform an upgrade, an OS update, or both.
Note:
- The upgrade and OS update cause the current Bastion Host to reboot multiple times. Each time the upgrade.sh script terminates without indicating an error condition, rerun the upgrade.sh script using this procedure on the same Bastion Host after it reboots.
- Currently, any procedure that applies to VMware and uses Terraform doesn't operate successfully and cannot be used. These procedures include the following:
- Replacing a Failed vCNE LoadBalancer
- Replacing a Failed Kubernetes Worker Node
- Replacing a Failed Kubernetes Controller Node
- OpenStack maintenance procedures that utilize Terraform in any step to create a new VM will not work properly if the cluster.tfvars image field is not updated to "ol9u2" after the upgrade completes. For more information about downloading Oracle Linux, see Downloading Oracle Linux.
- Use SSH to log in to the active Bastion Host. You can determine the active Bastion Host by running the following command:
$ is_active_bastion
Sample output:
IS active-bastion
If you are rerunning the upgrade script after an error or termination, log in to the same Bastion Host that was used during the initial run.
- Ensure that the naming format of the existing OLX .repo file in the /var/occne/yum.repos.d/ directory is <CENTRAL_REPO>-olx.repo, where x is the version number (for example, <CENTRAL_REPO>-ol8.repo).
- Ensure that the new central repository OLX yum .repo file is present in the /var/occne/yum.repos.d/ directory and the format of the file name is <CENTRAL_REPO>-olx.repo, where x is the version number (for example, <CENTRAL_REPO>-ol9.repo). For example:
$ curl http://${CENTRAL_REPO}/<path_to_file>/${CENTRAL_REPO}-ol9.repo -o /var/occne/yum.repos.d/${CENTRAL_REPO}-ol9.repo
Note:
Ensure that the content of the new central repository OLX yum .repo file is correct.
- Perform one of the following steps to initiate or resume the upgrade or the OS update:
Note:
The upgrade runs an initial cluster test based on the current OCCNE_VERSION. If the initial upgrade cluster test fails, the upgrade.sh script terminates. However, at this point, the upgrade is not started. Therefore, after correcting the issues discovered, you can restart the upgrade using step a. This is not applicable to the usual expected exits in the upgrade.sh script.
- Run the following command to launch the upgrade script to perform an upgrade to the new version:
$ OCCNE_NEW_VERSION=<new_version_tag> upgrade.sh
- Run the following command to launch the upgrade script to perform an OS update and subsequent runs for both upgrade and OS update:
$ upgrade.sh
- When the upgrade process initiates a reboot of the hosting Bastion, the upgrade.sh script terminates gracefully with the following output and the current Bastion is rebooted after a short period. Sample output:
The current Bastion: ${OCCNE_HOSTNAME} will be going into reboot after configuration changes. Wait until the reboot completes and follow the documentation to continue the upgrade. This may take a number of minutes, longer on a major OS upgrade.
Once the Bastion recovers from the reboot, rerun upgrade.sh by running the command in Step 4b.
Note:
- Once the upgrade begins on each node (starting with the active Bastion Host), the login banner in the shell is updated to reflect the following message. This is restored back to the original message when the upgrade completes. This banner is not set in the current active LBVMs for vCNE when the banner is set in the other nodes. This is because the current active LBVMs are not upgraded until the end of the procedure.
****************************************************************************
|                                                                          |
| Date/Time: 2024-01-16 14:56:31.612728                                    |
| OCCNE UPGRADE TO VERSION: 23.4.1 IN PROGRESS                             |
| Please discontinue login if not assisting in this maintenance activity.  |
|                                                                          |
****************************************************************************
- In some cases, you may see an "Ansible FAILED!" assertion message after the following message. This is an expected behavior where the system tries to return the control to the shell when CNE detects that a reboot will interrupt processing.
TASK [staged_reboot : Halt ansible for os_upgrade reboot on current bastion (or its kvm host). After reboot, reconnect to same bastion, and relaunch upgrade.sh] ***
fatal: [my-cluster-name-bastion-1]: FAILED! => {
    "assertion": false,
    "changed": false,
    "evaluated_to": false,
    "msg": "NOT AN ERROR: This is an EXPECTED assertion to flag self-reboot, and return shell control."
}
PLAY RECAP *********************************************************************
my-cluster-name-bastion-1 : ok=70 changed=24 unreachable=0 failed=1 skipped=194 rescued=1 ignored=0
- By default, during an upgrade and OS update, the upgrade.sh script exits before rebooting the ACTIVE LBVMs by displaying the following message:
Skipping active LBVMs reboot since OCCNE_REBOOT_ACTIVE_LB is not set. The active LBVMs must be manually rebooted and the upgrade.sh script be run again.
You must manually reboot or switch the activity of each ACTIVE LBVM such that it becomes the STANDBY LBVM. For the procedure to perform a manual switchover of LBVMs, see the Performing Manual Switchover of LBVMs During Upgrade section. When the switchover of the LBVMs completes successfully, rerun the upgrade.sh script.
- When the upgrade or OS update is complete, the system displays the
following message:
Message format for CNE upgrade:
<date/time> ******** Upgrade Complete ***********
For example:
12/12/2023 - December Tuesday 01:09:57 - *********** Upgrade Complete **************
Message format for OS update:
<date/time> ******** OS Update Complete ***********
3.6 Postupgrade Tasks
This section describes the postupgrade tasks for CNE.
3.6.1 Restoring CNE Customizations
This section provides information about restoring CNE customizations. Ensure that you restore all the customizations applied to the CNE instance after completing the upgrade process.
3.6.1.1 Restoring Grafana Dashboards
Perform the following steps to restore the Grafana dashboard:
- Load the previously installed Grafana dashboard.
- Click the + icon on the left panel and select Import.
Figure 3-3 Load Grafana Dashboard
- Once in the new panel, click Upload JSON file and choose the locally saved dashboard file.
Figure 3-4 Uploading the Dashboard
- Repeat the same steps for all the dashboards saved from the older version.
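If you prefer a scripted restore, a saved dashboard JSON file can also be imported through the Grafana HTTP API (a minimal sketch, assuming a Grafana API token; the host name and file name are placeholders):
$ curl -s -X POST http://<grafana_host>/api/dashboards/db -H "Authorization: Bearer <grafana_api_token>" -H "Content-Type: application/json" -d "{\"dashboard\": $(cat <dashboard_name>.json), \"overwrite\": true}"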
3.6.2 Verifying Terraform Files in VMware Deployments
This section provides details about verifying the content of the compute/main.tf and compute-lbvm/main.tf files in a VMware deployment after performing an upgrade.
- Update the Linux image template from OL8 to OL9 by changing the template_name variable in the /var/occne/cluster/${OCCNE_CLUSTER}/${OCCNE_CLUSTER}/cluster.tfvars file:
$ vi /var/occne/cluster/${OCCNE_CLUSTER}/${OCCNE_CLUSTER}/cluster.tfvars
The following example shows the template_name variable that must be updated:
template_name = "<name of the OL9 template>"
- Verify the content of the compute/main.tf and compute-lbvm/main.tf files:
- Run the following command to verify the content of the compute/main.tf file:
$ cat /var/occne/cluster/${OCCNE_CLUSTER}/modules/compute/main.tf | grep 'ignore_changes\|override_template_disk' -C 2
Ensure that the content of the file exactly matches the following content:
}
override_template_disk {
  bus_type = "paravirtual"
  size_in_mb = var.disk
--
lifecycle {
  ignore_changes = [
    vapp_template_id,
    template_name,
    catalog_name,
    override_template_disk
  ]
}
--
}
override_template_disk {
  bus_type = "paravirtual"
  size_in_mb = var.disk
--
lifecycle {
  ignore_changes = [
    vapp_template_id,
    template_name,
    catalog_name,
    override_template_disk
  ]
}
- Run the following command to verify the content of the compute-lbvm/main.tf file:
$ cat /var/occne/cluster/${OCCNE_CLUSTER}/modules/compute-lbvm/main.tf | grep 'ignore_changes\|override_template_disk' -C 2
Ensure that the content of the file exactly matches the following content:
}
override_template_disk {
  bus_type = "paravirtual"
  size_in_mb = var.disk
--
lifecycle {
  ignore_changes = [
    vapp_template_id,
    template_name,
    catalog_name,
    override_template_disk
  ]
}
--
}
override_template_disk {
  bus_type = "paravirtual"
  size_in_mb = var.disk
--
lifecycle {
  ignore_changes = [
    vapp_template_id,
    template_name,
    catalog_name,
    override_template_disk
  ]
}
- If the files don't contain the ignore_changes argument, then edit the files and add the argument to each of the "vcd_vapp_vm" resources:
- Run the following command to edit the compute/main.tf file:
$ vi /var/occne/cluster/${OCCNE_CLUSTER}/modules/compute/main.tf
- Add the following content between each override_template_disk code block and the metadata = var.metadata line for each "vcd_vapp_vm" resource:
lifecycle {
  ignore_changes = [
    vapp_template_id,
    template_name,
    catalog_name,
    override_template_disk
  ]
}
- Save the compute/main.tf file.
- Run the following command to edit the compute-lbvm/main.tf file:
$ vi /var/occne/cluster/${OCCNE_CLUSTER}/modules/compute-lbvm/main.tf
- Add the following content between each override_template_disk code block and the metadata = var.metadata line for each "vcd_vapp_vm" resource:
lifecycle {
  ignore_changes = [
    vapp_template_id,
    template_name,
    catalog_name,
    override_template_disk
  ]
}
- Save the compute-lbvm/main.tf file.
- Repeat step 2 to ensure that the content of the files matches the content provided in that step.
3.6.3 Activating Optional Features
This section provides information about activating optional features, such as Velero and Local DNS post upgrade.
3.6.3.1 Activating Velero Post Upgrade
This section provides information about the Velero activation procedure.
Velero is used for performing on-demand backups and restores of CNE cluster data. Velero is an optional feature and has an extra set of hardware and networking requirements. You can activate Velero after a CNE installation or upgrade. For more information about activating Velero, see Activating Velero.
3.6.3.2 Activating Local DNS
This section provides information about activating Local DNS post upgrade.
The Local DNS feature is a reconfiguration of core DNS (CoreDNS) to support external hostname resolution. When Local DNS is enabled, CNE routes the connection to external hosts through core DNS rather than the nameservers on the Bastion Hosts. For information about activating this feature post upgrade, see the "Activating Local DNS" section in Oracle Communications Cloud Native Core, Cloud Native Environment User Guide.
To stop DNS forwarding to Bastion DNS, you must define the DNS details through 'A' records and SRV records. A records and SRV records are added to CNE cluster using Local DNS API calls. For more information about adding and deleting DNS records, see the "Adding and Removing DNS Records" section in Oracle Communications Cloud Native Core, Cloud Native Environment User Guide.
3.6.4 Updating Port Name for servicemonitors and podmonitors
The metric port name on which Prometheus extracts metrics from 5G-CNC applications
must be updated to "cnc-metrics
".
- Run the following command to get the servicemonitor
details:
$ kubectl get servicemonitor -n occne-infra
Sample output:NAME AGE occne-nf-cnc-servicemonitor 60m
- Run the following command to update the port name for
servicemonitor:
$ kubectl edit servicemonitor occne-nf-cnc-servicemonitor -n occne-infra
# Edit the above servicemonitor and update the following port name by removing the "http" prefix.
# existing port name
- port: http-cnc-metrics
# updated port name
- port: cnc-metrics
- Save the changes for servicemonitor.
- Run the following command to get the podmonitor
details:
$ kubectl get podmonitor -n occne-infra
Sample output:NAME AGE occne-nf-cnc-podmonitor 60m
- Run the following command to update the port name for
podmonitor:
$ kubectl edit podmonitor occne-nf-cnc-podmonitor -n occne-infra
# existing port name
- port: http-cnc-metrics
# updated port name
- port: cnc-metrics
- Save the changes for podmonitor.
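If you prefer a non-interactive update, the same port rename can be applied with kubectl patch (a sketch, assuming the metric port is the first entry in the endpoints list of each monitor; verify the index in your cluster before applying):
$ kubectl -n occne-infra patch servicemonitor occne-nf-cnc-servicemonitor --type=json -p='[{"op":"replace","path":"/spec/endpoints/0/port","value":"cnc-metrics"}]'
$ kubectl -n occne-infra patch podmonitor occne-nf-cnc-podmonitor --type=json -p='[{"op":"replace","path":"/spec/podMetricsEndpoints/0/port","value":"cnc-metrics"}]'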