7 Fault Recovery

This chapter describes the procedures to perform fault recovery for an Oracle Communications Cloud Native Core, Network Slice Selection Function (NSSF) deployment.

7.1 Overview

To perform fault recovery, you must take a backup of the database and restore it on either the same cluster or a different cluster. The commands and instructions in this chapter operate on the NSSF database.

Note:

This document describes recovery procedures to restore NSSF completely or partially.

7.2 Impacted Areas

The following table provides information about the impacted areas during NSSF fault recovery:

Table 7-1 Impacted Areas

Scenario | Requires Fault Recovery or Re-install of CNE | Requires Fault Recovery or Re-install of cnDBTier | Requires Fault Recovery or Re-install of NSSF
Scenario 1: Database Migration to a New Cluster | Yes, if the cluster is not present. | Yes, if cnDBTier is not present. | Yes
Scenario 2: Deployment Failure | No | No | Yes
Scenario 3: cnDBTier Corruption | No | Yes. Restoring cnDBTier from an older backup is the only way to return to the restore point. | Yes, only if the cnDBTier credentials have changed.
Scenario 4: Site Failure | Yes | Yes | Yes

Note:

All sites require fault recovery.

7.3 Prerequisites

Before performing any fault recovery procedure, ensure that the following prerequisites are met:
  1. cnDBTier must be in a healthy state and available on multiple sites along with NSSF. To check the cnDBTier status, perform the following steps:
    1. Run the following command to ensure that all the nodes are connected:
      ndb_mgm> show
    2. Run the following command to check the pod status:
      kubectl get pods -n <namespace>

      If all pods are in the Running state, cnDBTier is in a healthy state.

    3. Run the following command to check if the replication is up:
      mysql> show slave status\G

      If the output shows any error, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.

    4. Run the following query to identify the cnDBTier site that has ACTIVE replication, from which you take the backup:
      select * from replication_info.DBTIER_REPLICATION_CHANNEL_INFO;
  2. Automatic backup must be enabled on cnDBTier. Enabling automatic backup helps in:
    • restoring a stable version of the NSSF database
    • minimizing significant data loss due to upgrade or rollback failures
    • minimizing data loss due to system failure
    • minimizing data loss due to data corruption or deletion caused by external input
    • migrating database information from one site to another
    • migrating database information from one site to another
  3. The following files must be available for fault recovery:
    • Custom values file (ocnssf_custom_values_23.4.0.yaml) used at the time of network function deployment
    • Helm charts used at the time of network function deployment
    • Secrets and certificates
    • RBAC resources
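The pod-status check in prerequisite 1 can be scripted. The following is a minimal Bash sketch, assuming kubectl is configured for the cnDBTier cluster; the function names unhealthy_pods and check_cndbtier_pods are hypothetical, not part of cnDBTier or NSSF.

```shell
#!/usr/bin/env bash
# Sketch: report cnDBTier pods that are not Running or not fully ready.

# Reads `kubectl get pods --no-headers` output (NAME READY STATUS RESTARTS AGE)
# on stdin and prints "NAME READY STATUS" for every unhealthy pod.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")   # READY column, for example "2/2"
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Returns 0 only if the namespace has pods and all of them are healthy.
check_cndbtier_pods() {
  local ns="$1" pods bad
  pods=$(kubectl get pods -n "$ns" --no-headers) || return 1
  if [ -z "$pods" ]; then
    echo "no pods found in namespace $ns" >&2
    return 1
  fi
  bad=$(printf '%s\n' "$pods" | unhealthy_pods)
  if [ -n "$bad" ]; then
    echo "unhealthy pods in $ns:"
    printf '%s\n' "$bad"
    return 1
  fi
  echo "all pods in $ns are Running and ready"
}
```

For example, run check_cndbtier_pods <namespace> together with the ndb_mgm check, which you may be able to run non-interactively with kubectl exec -n <namespace> ndbmgmd-0 -- ndb_mgm -e show (the management pod name ndbmgmd-0 is an assumption about your deployment). This sketch also reports pods of completed jobs as unhealthy, so review its output rather than treating it as a final verdict.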

7.4 Fault Recovery Scenarios

This section describes the fault recovery procedures for various scenarios.

7.4.1 Database Migration to a New Cluster

This section describes how to migrate the NSSF database from an existing cluster to a new cluster. This scenario applies whether you migrate only the configuration data or both the configuration and state data to the new cluster.

To migrate the database to a new cluster:

  1. Shut down the old site. For information about shutting down a site, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
  2. (Optional) Take a data backup from the old site, restore it on the new site, and perform the cnDBTier fault recovery procedure. For more information about cnDBTier backup, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
  3. Install NSSF using the Helm charts. For more information about installing NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.

    Note:

    You can also refer to the ocnssf_custom_values_23.4.0.yaml file used at the time of NSSF installation for the Helm chart installation.
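When the new cluster has its own kubeconfig context, the Helm installation in step 3 can target it explicitly so that the command cannot run against the old cluster by accident. The following Bash sketch assumes Helm 3; the release name, chart reference, namespace, and context name are placeholders, and install_nssf_on_cluster is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: install NSSF on the new cluster, reusing the custom values file
# from the original deployment. All arguments are placeholders you supply.
install_nssf_on_cluster() {
  local context="$1" release="$2" chart="$3" ns="$4"
  local values="${5:-ocnssf_custom_values_23.4.0.yaml}"

  # Refuse to proceed without the values file used at original deployment time.
  if [ ! -f "$values" ]; then
    echo "missing values file: $values" >&2
    return 1
  fi

  # --kube-context pins the command to the new cluster.
  helm install "$release" "$chart" \
    --kube-context "$context" \
    --namespace "$ns" \
    -f "$values"
}
```

Usage: install_nssf_on_cluster <new-cluster-context> <release-name> <chart> <namespace> [values-file].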

7.4.2 Deployment Failure

This section describes how to recover NSSF when its deployment is corrupted.

  1. Uninstall NSSF using the Helm release name. For information about uninstalling NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
  2. Install NSSF using the Helm charts. For more information about installing NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.

    Note:

    You can also refer to the ocnssf_custom_values_23.4.0.yaml file used at the time of NSSF installation.
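The uninstall and reinstall steps can be combined into one helper. The following is a minimal Bash sketch, assuming Helm 3 and the same namespace for the old and new release; redeploy_nssf and the snapshot file name are hypothetical. Saving the values of the failed release with helm get values is an extra, optional safeguard that is not part of the documented procedure.

```shell
#!/usr/bin/env bash
# Sketch: redeploy a failed NSSF release on the same cluster (Helm 3).
# Arguments: release name, chart reference, namespace, values file.
redeploy_nssf() {
  local release="$1" chart="$2" ns="$3"
  local values="${4:-ocnssf_custom_values_23.4.0.yaml}"
  local snapshot="${release}-values-before-uninstall.yaml"

  if [ ! -f "$values" ]; then
    echo "missing values file: $values" >&2
    return 1
  fi

  # Best effort: keep the user-supplied values of the failed release for comparison.
  helm get values "$release" -n "$ns" -o yaml > "$snapshot" \
    || echo "warning: could not read values of $release" >&2

  # Step 1: uninstall the failed release.
  helm uninstall "$release" -n "$ns" || return 1

  # Step 2: reinstall with the values file used at installation time.
  helm install "$release" "$chart" -n "$ns" -f "$values"
}
```

Comparing the saved snapshot with ocnssf_custom_values_23.4.0.yaml can help you spot configuration drift that may have contributed to the failure.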

7.4.3 cnDBTier Corruption

This section describes how to recover when the cnDBTier database is corrupted.

When the configuration data is corrupted, the database on all other sites may also be corrupted through data replication, depending on the replication status after the corruption occurs. If data replication stops because of the database corruption, cnDBTier fails in one or more sites, but not in all sites. If data replication succeeds, the corruption replicates to all cnDBTier sites, and cnDBTier fails in all sites.
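The choice between the two subscenarios depends only on whether the corruption reached every site. A minimal Bash sketch of that decision, assuming you have already counted the failed cnDBTier sites using the health checks in Prerequisites; the function name is hypothetical, and it returns section numbers from this chapter.

```shell
#!/usr/bin/env bash
# Sketch: map the number of failed cnDBTier sites to the matching subsection.
# Arguments: number of failed sites, total number of sites.
cndbtier_recovery_section() {
  local failed="$1" total="$2"
  if [ "$failed" -lt 1 ] || [ "$failed" -gt "$total" ]; then
    echo "invalid site counts: $failed of $total" >&2
    return 1
  fi
  if [ "$failed" -lt "$total" ]; then
    echo "7.4.3.1"   # replication stopped: restore failed sites from a healthy mated site
  else
    echo "7.4.3.2"   # corruption replicated everywhere: restore from the auto-data backup
  fi
}
```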

For more information, see the following subscenarios:

7.4.3.1 When cnDBTier Failed in Single or Multiple Sites

This section describes how to recover configuration data when the data replication stops due to database corruption and cnDBTier has failed in either single or multiple sites (but not all sites).

To recover configuration data:

  1. Uninstall NSSF using the Helm release name. For information about uninstalling NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
  2. Perform cnDBTier fault recovery procedure:
    1. Create an on-demand backup from the mated site that has a healthy replication with the failed site(s). For more information about cnDBTier backup, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
    2. Use the backup data from mated site for restore. For more information about cnDBTier restore, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
  3. Install NSSF using the Helm charts. For more information about installing NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.

    Note:

    You can also refer to the ocnssf_custom_values_23.4.0.yaml file used at the time of NSSF installation for the Helm chart installation.

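To identify the mated site for the on-demand backup in step 2.1, you can filter the output of the replication channel query from Prerequisites for rows that contain ACTIVE. The following Bash sketch assumes the query is run with the mysql client in batch mode (-N -B), which prints tab-separated rows without a header. Because the column layout of DBTIER_REPLICATION_CHANNEL_INFO is not described here, the filter matches any field equal to ACTIVE. The function names, the SQL pod name, and the MYSQL_USER and MYSQL_PASSWORD variables are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find replication channels that are ACTIVE.

# Prints tab-separated rows that have a field exactly equal to "ACTIVE"
# (so values such as "INACTIVE" do not match).
active_channels() {
  awk -F'\t' '{
    for (i = 1; i <= NF; i++)
      if ($i == "ACTIVE") { print; next }
  }'
}

# Hypothetical invocation: the SQL pod name and credential variables
# depend on your cnDBTier deployment.
query_channel_info() {
  local ns="$1" pod="$2"
  kubectl exec -n "$ns" "$pod" -- \
    mysql -N -B -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" \
    -e 'select * from replication_info.DBTIER_REPLICATION_CHANNEL_INFO;'
}
```

Usage: query_channel_info <namespace> <sql-pod> | active_channels. Passing the password on the command line can expose it in the process list, so adapt the credential handling to your site's security policy.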
7.4.3.2 When cnDBTier Failed in All Sites

This section describes how to recover configuration data when data replication succeeds and propagates the corruption to all cnDBTier sites.

To recover configuration data:

  1. Uninstall NSSF using the Helm release name. For more information about uninstalling NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide available on My Oracle Support.
  2. Perform cnDBTier fault recovery procedure:
    1. Perform the restore procedure using the auto-data backup file. For more information, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.

      Note:

      The auto-data backup file is created by the scheduled automatic backup. cnDBTier Backup Manager Service performs auto-data backups according to the predefined configuration. For more information about cnDBTier Backup Manager Service, see Oracle Communications Cloud Native Core, cnDBTier User Guide.
  3. Install NSSF using the Helm charts. For more information about installing NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.

    Note:

    You can also refer to the ocnssf_custom_values_23.4.0.yaml file used at the time of NSSF installation.

7.4.4 Site Failure

This section describes how to perform fault recovery when one, multiple, or all sites have a software failure.

7.4.4.1 Single or Multiple Site Failure

This section describes how to recover sites when cnDBTier and NSSF are installed on multiple sites with automatic data replication and backup enabled, and one or more sites (but not all of them) have failed and require fault recovery.

To recover the failed sites:

  1. Install Oracle Communications Cloud Native Environment (CNE) on the new cluster. For more information about installing CNE, see Oracle Communications Cloud Native Core, Cloud Native Environment Installation, Upgrade, and Fault Recovery Guide.
  2. For cnDBTier fault recovery:
    1. Take an on-demand backup from the mated site that has a healthy replication with the failed site(s). For more information about on-demand backup, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
    2. Use the backup data from the mated site for database restore. For more information about database restore, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
  3. Install NSSF using the Helm charts. For more information about installing NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
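After installing CNE in step 1 and before starting the cnDBTier restore in step 2, it can help to confirm that every Kubernetes node of the new cluster reports Ready. A minimal Bash sketch, assuming kubectl points at the new cluster; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify that all nodes of the new CNE cluster are Ready.

# Reads `kubectl get nodes --no-headers` output (NAME STATUS ROLES AGE VERSION)
# and prints "NAME STATUS" for nodes whose status is not exactly "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") are reported too.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1, $2 }'
}

cluster_ready() {
  local nodes bad
  nodes=$(kubectl get nodes --no-headers) || return 1
  [ -n "$nodes" ] || { echo "no nodes found" >&2; return 1; }
  bad=$(printf '%s\n' "$nodes" | not_ready_nodes)
  if [ -n "$bad" ]; then
    echo "nodes not Ready:"
    printf '%s\n' "$bad"
    return 1
  fi
  echo "all nodes Ready"
}
```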
7.4.4.2 Complete Site Outage

This section describes how to recover the sites when cnDBTier and NSSF are installed on multiple sites with automatic data replication and backup enabled, and all the sites have failed and require fault recovery.

To recover all the failed sites:

  1. Install Oracle Communications Cloud Native Environment (CNE) on the new cluster. For more information about installing CNE, see Oracle Communications Cloud Native Core, Cloud Native Environment Installation, Upgrade, and Fault Recovery Guide.
  2. Use auto-data backup file for database restore. For more information about database restore, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.

    Note:

    The auto-data backup file is created by the scheduled automatic backup. cnDBTier Backup Manager Service performs auto-data backups according to the predefined configuration. For more information about cnDBTier Backup Manager Service, see Oracle Communications Cloud Native Core, cnDBTier User Guide.
  3. Install NSSF using the Helm charts. For more information about installing NSSF, see Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.