7 Fault Recovery
This chapter describes the fault recovery procedures for an Oracle Communications Cloud Native Core, Network Slice Selection Function (NSSF) deployment.
7.1 Overview
Note:
This document describes recovery procedures to restore NSSF completely or partially.
7.2 Impacted Areas
The following table provides information about the impacted areas during NSSF fault recovery:
Table 7-1 Impacted Areas
Scenario | Requires Fault Recovery or Re-install of CNE | Requires Fault Recovery or Re-install of cnDBTier | Requires Fault Recovery or Re-install of NSSF |
---|---|---|---|
Scenario 1: Database Migration to a New Cluster | Yes, if the cluster is not present. | Yes, if cnDBTier is not present. | Yes |
Scenario 2: Deployment Failure | No | No | Yes |
Scenario 3: cnDBTier Corruption | No | Yes. Restoring cnDBTier from an older backup is the only way to return to the restore point. | Yes, only if the cnDBTier credentials are changed. |
Scenario 4: Site Failure | Yes | Yes | Yes |
Note:
All sites require fault recovery.
7.3 Prerequisites
- cnDBTier must be in a healthy state and available on multiple sites along with NSSF. To check the cnDBTier status, perform the following steps:
  - Run the following command to ensure that all the nodes are connected:
    ndb_mgm> show
  - Run the following command to check the pod status:
    kubectl get pods -n <namespace>
    If the pod status is Running, then cnDBTier is in a healthy state.
  - Run the following command to check if the replication is up:
    mysql> show slave status\G
    In case there is any error, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
  - Run the following command to check which cnDBTier has ACTIVE replication, so that the backup can be taken from it:
    select * from replication_info.DBTIER_REPLICATION_CHANNEL_INFO;
- Automatic backup must be enabled on cnDBTier. Enabling automatic backup helps in:
  - restoring a stable version of the NSSF database
  - minimizing significant loss of data due to upgrade or rollback failures
  - minimizing loss of data due to system failure
  - minimizing loss of data due to data corruption or deletion caused by external input
  - migrating database information from one site to another
- The following files must be available for fault recovery:
  - Custom values file (ocnssf_custom_values_24.1.1.yaml) used at the time of network function deployment
  - Helm charts used at the time of network function deployment
  - Secrets and certificates
  - RBAC resources
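The pod status and replication checks in the prerequisites above can be automated by parsing the command output. The following Python sketch is illustrative only and is not part of the product tooling. It assumes the default column layout of `kubectl get pods` and the vertical (`\G`) layout of `show slave status`, and it accepts both the `Slave_*` and the newer `Replica_*` field names. The function names are hypothetical.

```python
from typing import Dict, List


def pods_not_running(kubectl_output: str) -> List[str]:
    """Return the names of pods whose STATUS column is not 'Running'.

    Expects the text printed by `kubectl get pods -n <namespace>`,
    including its header line.
    """
    lines = [ln for ln in kubectl_output.splitlines() if ln.strip()]
    if not lines:
        return []
    # Locate the STATUS column from the header instead of hard-coding it.
    status_idx = lines[0].split().index("STATUS")
    unhealthy = []
    for line in lines[1:]:
        cols = line.split()
        if len(cols) <= status_idx or cols[status_idx] != "Running":
            unhealthy.append(cols[0])
    return unhealthy


def replication_is_up(status_text: str) -> bool:
    """Return True when both replication threads report 'Yes'.

    Expects the vertical output of `show slave status\\G`.
    """
    fields: Dict[str, str] = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    io_state = fields.get("Slave_IO_Running", fields.get("Replica_IO_Running"))
    sql_state = fields.get("Slave_SQL_Running", fields.get("Replica_SQL_Running"))
    return io_state == "Yes" and sql_state == "Yes"
```

An empty list from `pods_not_running` together with `True` from `replication_is_up` corresponds to the healthy state described above. Pods that legitimately finish, such as completed jobs, would be reported as not running, so treat the result as a prompt for inspection rather than a verdict.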
7.4 Fault Recovery Scenarios
This section describes the fault recovery procedures for various scenarios.
7.4.1 Database Migration to a New Cluster
This section describes how to migrate the NSSF database from an existing cluster to a new cluster. This scenario applies whether you want to migrate only configuration data, or both configuration and state data, to the new cluster.
To migrate the database to a new cluster:
- Shut down the older site. For information about shutting down a site, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
- (Optional) Take a data backup from the older site, restore it to the new site, and perform the cnDBTier fault recovery procedure. For more information about cnDBTier backup, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
- Install NSSF using the Helm charts. For more information about installing NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
Note:
You can also refer to the ocnssf_custom_values_24.1.1.yaml file used at the time of NSSF installation for Helm charts installation.
7.4.2 Deployment Failure
This section describes how to recover NSSF when its deployment becomes corrupted.
- Uninstall NSSF using the Helm release name. For information about uninstalling NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
- Install NSSF using the Helm charts. For more information about installing NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
Note:
You can also refer to the ocnssf_custom_values_24.1.1.yaml file used at the time of NSSF installation.
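The uninstall-then-install sequence above can be sketched as a pair of Helm 3 commands. The following Python sketch only builds and prints the commands and does not run them. The release name, chart path, and namespace are hypothetical placeholders; take the real values from your own deployment and from the installation guide referenced above.

```python
import shlex
from typing import List


def redeploy_commands(
    release: str,
    chart: str,
    namespace: str,
    values_file: str = "ocnssf_custom_values_24.1.1.yaml",
) -> List[List[str]]:
    """Build the Helm commands for a clean NSSF redeployment.

    Returns the uninstall command followed by the install command,
    each as an argument list suitable for subprocess.run().
    """
    uninstall = ["helm", "uninstall", release, "--namespace", namespace]
    install = [
        "helm", "install", release, chart,
        "--namespace", namespace,
        "--values", values_file,  # custom values file saved at deployment time
    ]
    return [uninstall, install]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for command in redeploy_commands("my-nssf", "./nssf-chart", "my-namespace"):
        print(shlex.join(command))
```

Returning argument lists rather than a single shell string keeps quoting unambiguous and lets the caller review each command before running it.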
7.4.3 cnDBTier Corruption
This section describes how to recover cnDBTier when its database is corrupted.
When the configuration data becomes corrupted, the databases on the other sites may also become corrupted through data replication. The outcome depends on the replication status after the corruption occurs. If data replication stops because of the corruption, cnDBTier fails on one or more sites, but not on all of them. If data replication continues, the corruption propagates to every cnDBTier site, and cnDBTier fails on all sites.
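The distinction between the two outcomes decides which backup is used for the restore: sections 7.4.3.1 and 7.4.3.2 use an on-demand backup from a healthy mated site and the auto-data backup file, respectively. A minimal Python sketch of that decision follows. The enum, the function name, and the site names in the usage note are hypothetical, and the per-site failure flags are assumed to come from your own health checks.

```python
from enum import Enum
from typing import Dict, Optional


class RestoreSource(Enum):
    """Backup used to restore cnDBTier; the value is the matching section."""
    ON_DEMAND_FROM_MATED_SITE = "7.4.3.1"
    AUTO_DATA_BACKUP = "7.4.3.2"


def choose_restore_source(cndbtier_failed: Dict[str, bool]) -> Optional[RestoreSource]:
    """Pick the restore source from per-site cnDBTier failure flags.

    Returns None when no site has failed, so no restore is needed.
    """
    if not cndbtier_failed:
        raise ValueError("at least one site is required")
    failed_count = sum(1 for failed in cndbtier_failed.values() if failed)
    if failed_count == 0:
        return None
    if failed_count == len(cndbtier_failed):
        # Corruption replicated everywhere: no healthy mated site remains.
        return RestoreSource.AUTO_DATA_BACKUP
    # At least one healthy site can supply an on-demand backup.
    return RestoreSource.ON_DEMAND_FROM_MATED_SITE
```

For example, `choose_restore_source({"site1": True, "site2": False})` selects the on-demand backup path, because `site2` is still healthy.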
For more information, see the following sub-scenarios:
7.4.3.1 When cnDBTier Failed in Single or Multiple Sites
This section describes how to recover configuration data when the data replication stops due to database corruption and cnDBTier has failed in either single or multiple sites (but not all sites).
To recover configuration data:
- Uninstall NSSF using the Helm release name. For information about uninstalling NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
- Perform the cnDBTier fault recovery procedure:
  - Create an on-demand backup from the mated site that has a healthy replication with the failed site(s). For more information about cnDBTier backup, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
  - Use the backup data from the mated site for restore. For more information about cnDBTier restore, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
- Install NSSF using the Helm charts. For more information about installing NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
Note:
You can also refer to the ocnssf_custom_values_24.1.1.yaml file used at the time of NSSF installation for Helm charts installation.
7.4.3.2 When cnDBTier Failed in All Sites
This section describes how to recover configuration data when successful data replication corrupts all the cnDBTier sites.
To recover configuration data:
- Uninstall NSSF using the Helm release name. For more information about uninstalling NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide available on My Oracle Support.
- Perform the cnDBTier fault recovery procedure:
  - Perform the restore procedure using the auto-data backup file. For more information, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
    Note:
    The auto-data backup file is built from the scheduled automatic backup. cnDBTier Backup Manager Service ensures auto-data backup as per the predefined configuration. For more information about cnDBTier Backup Manager Service, see Oracle Communications Cloud Native Core, cnDBTier User Guide.
- Install NSSF using the Helm charts. For more information about installing NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
Note:
You can also refer to the ocnssf_custom_values_24.1.1.yaml file used at the time of NSSF installation.
7.4.4 Site Failure
This section describes how to perform fault recovery when one, several, or all of the sites have a software failure.
7.4.4.1 Single or Multiple Site Failure
This section describes how to recover the failed sites when cnDBTier and NSSF are installed on multiple sites with automatic data replication and backup enabled, and one or more sites (but not all of them) have failed.
To recover the failed sites:
- Install Oracle Communications Cloud Native Environment (CNE) on the new cluster. For more information about installing CNE, see Oracle Communications Cloud Native Core, Cloud Native Environment Installation, Upgrade, and Fault Recovery Guide.
- For cnDBTier fault recovery:
  - Take an on-demand backup from the mated site that has a healthy replication with the failed site(s). For more information about on-demand backup, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
  - Use the backup data from the mated site for database restore. For more information about database restore, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
- Install NSSF using the Helm charts. For more information about installing NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
7.4.4.2 Complete Site Outage
This section describes how to recover all the sites when cnDBTier and NSSF are installed on multiple sites with automatic data replication and backup enabled, and all the sites have failed.
To recover all the failed sites:
- Install Oracle Communications Cloud Native Environment (CNE) on the new cluster. For more information about installing CNE, see Oracle Communications Cloud Native Core, Cloud Native Environment Installation, Upgrade, and Fault Recovery Guide.
- Use the auto-data backup file for database restore. For more information about database restore, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
Note:
The auto-data backup file is built from the scheduled automatic backup. cnDBTier Backup Manager Service ensures auto-data backup as per the predefined configuration. For more information about cnDBTier Backup Manager Service, see Oracle Communications Cloud Native Core, cnDBTier User Guide.
- Install NSSF using the Helm charts. For more information about installing NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
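Both site failure procedures follow the same order (CNE, then cnDBTier, then NSSF) and differ only in where the database backup comes from. The following Python sketch expresses that ordering as a per-site checklist. It is a planning aid under stated assumptions, not an executable recovery tool: the function name and site names are hypothetical, and each step still has to be carried out with the guides referenced above.

```python
from typing import Dict, List


def site_recovery_plan(site_failed: Dict[str, bool]) -> Dict[str, List[str]]:
    """Return the ordered recovery steps for every failed site.

    Healthy sites are omitted from the result.
    """
    failed_sites = [site for site, failed in site_failed.items() if failed]
    all_failed = bool(failed_sites) and len(failed_sites) == len(site_failed)

    if all_failed:
        # Complete site outage: only the scheduled backup is available.
        database_steps = ["Restore cnDBTier from the auto-data backup file"]
    else:
        # Single or multiple site failure: a healthy mated site exists.
        database_steps = [
            "Take an on-demand backup from a healthy mated site",
            "Restore cnDBTier from that backup",
        ]

    steps = (
        ["Install CNE on the new cluster"]
        + database_steps
        + ["Install NSSF using the Helm charts"]
    )
    # Give each site its own copy so callers can tick off steps independently.
    return {site: list(steps) for site in failed_sites}
```

Calling `site_recovery_plan({"site1": True, "site2": False})` yields a four-step plan for `site1` only, while marking every site as failed switches the database step to the auto-data backup.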