8 Fault Recovery

This chapter provides information about fault recovery for Cloud Native Configuration Console deployment.

8.1 Overview

Fault recovery relies on database backups: you must take a backup of the CNC Console (CNCC) database and restore it on either the same cluster or a different cluster. The commands and instructions in this chapter operate on the CNCC database.

Note:

This chapter describes fault recovery scenarios of CNC Console and how to recover from those scenarios.

Note:

The Fault Recovery procedure exclusively applies to CNC Console and does not encompass any aspects related to OCI Resources.

8.2 Impacted Area

The following table describes the areas impacted in each CNC Console fault recovery scenario:

Table 8-1 Fault Recovery Scenarios Impact Information

| Scenario | Requires Fault Recovery or Re-install of CNE? | Requires Fault Recovery or Re-install of cnDBTier? | Requires Fault Recovery or Re-install of CNC Console? | Comments |
| --- | --- | --- | --- | --- |
| Scenario 1: Complete Site Failure | Yes | Yes | Yes | |
| Scenario 2: cnDBTier Corruption | No | Yes | No | cnDBTier must be restored from backup. Re-install of cnDBTier can be considered if restoring from backup is not possible. Use helm upgrade if the DB configuration is changed. |
| Scenario 3: Console Configuration Database Corruption | No | No | No | Backup and restore of the configuration database is required on the impacted site. This needs periodic backup. (Applicable for M-CNCC IAM.) Note: Not applicable for OCI deployment. |
| Scenario 4: Deployment Failure | No | No | Yes | CNC Console DB is not restored. Only helm uninstall/install is done. (Applicable for M-CNCC IAM, M-CNCC Core, and A-CNCC Core.) Note: For OCI deployment, only M-CNCC Core and A-CNCC Core are applicable. |
| Scenario 5: NF Instance Failure | No | No | Yes | The NF endpoints must be updated in the custom values.yaml file and helm upgrade must be performed to incorporate the latest endpoints. (Applicable at M-CNCC Core and A-CNCC Core.) |
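
For automation or runbook tooling, the Yes/No columns of Table 8-1 can be captured as a small lookup. The following is a minimal sketch, not part of the product: the dictionary keys, the Impact class, and the function name are hypothetical labels chosen for illustration, and only the Yes/No values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impact:
    """Which layers need fault recovery or re-install for a scenario."""
    cne: bool
    cndbtier: bool
    cnc_console: bool


# Yes/No values transcribed from Table 8-1.
# The dictionary keys are hypothetical labels chosen for this sketch.
IMPACT_BY_SCENARIO = {
    "complete_site_failure": Impact(cne=True, cndbtier=True, cnc_console=True),
    "cndbtier_corruption": Impact(cne=False, cndbtier=True, cnc_console=False),
    "console_config_db_corruption": Impact(cne=False, cndbtier=False, cnc_console=False),
    "deployment_failure": Impact(cne=False, cndbtier=False, cnc_console=True),
    "nf_instance_failure": Impact(cne=False, cndbtier=False, cnc_console=True),
}


def components_to_recover(scenario):
    """Return the layers that need fault recovery or re-install, bottom-up."""
    impact = IMPACT_BY_SCENARIO[scenario]
    flags = [
        ("CNE", impact.cne),
        ("cnDBTier", impact.cndbtier),
        ("CNC Console", impact.cnc_console),
    ]
    return [name for name, needed in flags if needed]


print(components_to_recover("deployment_failure"))  # ['CNC Console']
```

Scenario 3 returns an empty list because no layer is re-installed there; the configuration database is restored from backup instead, as noted in the Comments column.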

8.3 Prerequisites

Before you run any fault recovery procedure, ensure that the following prerequisites are met:

  • cnDBTier should be in a healthy state and available on multiple sites along with CNC Console.
  • On-demand backup should be enabled on cnDBTier. Scheduled regular backups help to:
    • Restore a stable version of the CNC Console configuration database.
    • Minimize significant loss of data due to upgrade or rollback failures.
    • Minimize loss of data due to system failure.
    • Minimize loss of data due to data corruption or deletion due to external input.
    • Migrate Console database information from one site to another.
  • The custom values file used at the time of Console deployment must be retained. If the custom_values.yaml file is not retained, it must be regenerated manually, which increases the overall fault recovery time.
  • Docker images used during the last installation or upgrade must be retained in the external data source.

    For the CNC Console database backup and restore prerequisites, see the Fault Recovery Procedures - DB Backup and Restore section.

8.4 Fault Recovery Scenarios

This section describes the fault recovery procedures for various scenarios.

8.4.1 Scenario 1: Complete Site Failure

This section describes how to perform fault recovery when one, several, or all of the sites have a software failure.

The following are site failure scenarios:

8.4.1.1 Scenario 1A: Single or Multiple Site Failure

This scenario applies when one or more sites, but not all sites, have failed and there is a requirement to perform fault recovery. It is assumed that the user has cnDBTier and Oracle Communications CNC Console installed on multiple sites with automatic data replication and backup enabled.

To recover the failed sites:

  1. Run the Cloud Native Environment (CNE) installation procedure to install a new cluster. For more information, see Oracle Communications Cloud Native Environment (OCCNE) Installation Guide.
  2. For cnDBTier fault recovery:
    1. Take an on-demand backup from the mate site that has healthy replication with the failed site or sites.
    2. Use the backup data from the mate site to restore the database.
  3. Install CNC Console helm chart. For more information about installing CNC Console, see the Installing CNC Console chapter in the Oracle Communications Cloud Native Configuration Console Installation and Upgrade Guide.

8.4.1.2 Scenario 1B: All Sites Failure

This scenario applies when all sites have failed and there is a requirement to perform fault recovery. It is assumed that the user has cnDBTier and Oracle Communications CNC Console installed on multiple sites with automatic data replication and backup enabled.

To recover all the failed sites:

  1. Run the Cloud Native Environment (CNE) installation procedure to install a new cluster. For more information, see Oracle Communications Cloud Native Environment (OCCNE) Installation Guide.
  2. Use an on-demand backup file to restore the database from the previous data backup.

    Note:

    • An auto-data backup file is a backup file created by a scheduled automatic backup.
    • The Restore Georeplication Failure chapter contains a procedure for two sites where one of the clusters has a fatal error. You can perform the same procedure for all the sites in a multiple site setup.
  3. Install CNC Console helm chart. For more information about installing CNC Console, see the Installing CNC Console chapter in the Oracle Communications Cloud Native Configuration Console Installation and Upgrade Guide.

8.4.2 Scenario 2: cnDBTier Corruption

This section describes how to recover the database when database corruption causes cnDBTier to fail in a single site, multiple sites, or all sites.

When the database on one site gets corrupted, the databases on the other sites can also get corrupted through data replication, depending on the replication status after the corruption occurs. If data replication fails because of the corruption, cnDBTier fails in one or more sites, but not in all sites. If data replication succeeds, the corruption replicates to all the cnDBTier sites and cnDBTier fails in all sites.

The following are cnDBTier failure scenarios:

If the corrupted database is replicated to the mated sites, follow Scenario 2B (section 8.4.2.2).

If the corrupted database causes a replication failure and the corruption is therefore local to a site, follow Scenario 2A (section 8.4.2.1).
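
The choice between the two sub-scenarios can be sketched as a small decision function. This is an illustrative sketch only: the function name and the site identifiers are hypothetical, and how corrupted sites are detected is outside its scope. It only encodes the rule stated above: corruption on every site maps to Scenario 2B, corruption on a strict subset of sites maps to Scenario 2A.

```python
def select_cndbtier_scenario(corrupted_sites, all_sites):
    """Pick the cnDBTier recovery sub-scenario from the set of corrupted sites.

    Returns "2A" when one or more sites, but not all, are corrupted
    (replication failed, so the corruption stayed local), and "2B" when
    every site is corrupted (replication succeeded and spread it).
    """
    corrupted = set(corrupted_sites)
    sites = set(all_sites)
    if not corrupted:
        raise ValueError("no corrupted site reported")
    unknown = corrupted - sites
    if unknown:
        raise ValueError(f"unknown site(s): {sorted(unknown)}")
    return "2B" if corrupted == sites else "2A"


# Hypothetical site names used only for illustration.
print(select_cndbtier_scenario(["site1"], ["site1", "site2", "site3"]))  # 2A
print(select_cndbtier_scenario(["site1", "site2"], ["site1", "site2"]))  # 2B
```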

8.4.2.1 Scenario 2A: When cnDBTier Fails in Single or Multiple (but Not All) Sites

This section describes how to recover the database when data replication has failed due to database corruption and cnDBTier has failed in either a single site or multiple sites (not all sites).

To recover the database:

  1. Uninstall CNC Console helm chart. For information about uninstalling CNC Console, see the Uninstalling CNC Console chapter in the Oracle Communications Cloud Native Configuration Console Installation and Upgrade Guide.
  2. For cnDBTier fault recovery:
    1. Create an on-demand backup from the mated site that has healthy replication with the failed site.
    2. Use the backup data from the mate site for restoration.
  3. Install CNC Console helm chart. For more information about installing CNC Console, see the Installing CNC Console chapter in the Oracle Communications Cloud Native Configuration Console Installation and Upgrade Guide.

8.4.2.2 Scenario 2B: When cnDBTier Fails in All Sites

This section describes how to recover the database when successful data replication corrupts all the cnDBTier sites.

To recover the database:

  1. Uninstall CNC Console helm charts. For more information about uninstalling CNC Console, see the Uninstalling CNC Console chapter in the Oracle Communications Cloud Native Configuration Console Installation and Upgrade Guide.
  2. For cnDBTier fault recovery:
    1. Use an on-demand backup file to restore the database from the previous data backup.

      Note:

      The Restore Georeplication Failure chapter has a procedure for two sites where one of the clusters has a fatal error. You can perform that procedure for all the sites in a multiple site setup.
  3. Install CNC Console helm charts. For more information about installing CNC Console, see the Installing CNC Console chapter in the Oracle Communications Cloud Native Configuration Console Installation and Upgrade Guide.
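
Scenarios 2A and 2B share the same ordering: uninstall CNC Console, restore cnDBTier, then install CNC Console again. The sketch below only assembles the helm command lines for that order and prints them; it does not run anything. The release name, namespace, and chart path are hypothetical placeholders, and the cnDBTier restore is left as a manual step because its procedure is documented separately. Treat it as an outline under these assumptions, and follow the Installation and Upgrade Guide for the exact commands.

```python
import shlex

# Hypothetical placeholders for this sketch; substitute the values
# used in the actual deployment.
RELEASE = "cncc"
NAMESPACE = "cncc"
CHART = "./occncc-chart"
VALUES_FILE = "occncc_custom_values_<version>.yaml"  # <version> left as-is


def scenario2_plan(release, chart, namespace, values_file):
    """Return the ordered steps as (description, helm argv or None)."""
    return [
        ("Uninstall CNC Console",
         ["helm", "uninstall", release, "--namespace", namespace]),
        # No command here: the restore follows the cnDBTier procedure.
        ("Restore cnDBTier from backup", None),
        ("Install CNC Console",
         ["helm", "install", release, chart,
          "--namespace", namespace, "--values", values_file]),
    ]


for number, (description, command) in enumerate(
        scenario2_plan(RELEASE, CHART, NAMESPACE, VALUES_FILE), start=1):
    detail = shlex.join(command) if command else "manual step"
    print(f"{number}. {description}: {detail}")
```

Building the commands as argument lists rather than strings avoids shell quoting problems if the same lists are later passed to a process launcher.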

8.4.3 Scenario 3: Console Configuration Database Corruption

Note:

Not applicable for OCI deployment

This section describes how to recover when the CNC Console configuration database is corrupted.

For the recovery and restore procedure, see the Fault Recovery Procedures - DB Backup and Restore section.

8.4.4 Scenario 4: Deployment Failure

Note:

For OCI deployment, only M-CNCC Core and A-CNCC Core are applicable.

This section describes how to recover CNC Console when its deployment fails.

For the recovery and restore procedure, see Restoring CNC Console.

8.4.5 Scenario 5: NF Instance Failure

Perform this procedure to recover from NF instance failure.

  1. Refer to the procedures in the Disaster Recovery Guide of the specific NF to determine the required actions.
  2. Check whether the NF endpoints are the same.
    1. If the NF endpoints are the same, no change is required on the CNC Console side.
    2. If the NF endpoints are different, update the NF IP and port in the occncc_custom_values_<version>.yaml file and perform a helm upgrade operation to incorporate the new NF URL.
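
The endpoint check in step 2 can be sketched as follows. This is a minimal illustration, not a supported tool: the Endpoint type, the function name, the example addresses, the release name, the namespace, and the chart path are all hypothetical. The sketch compares the endpoint currently configured in CNC Console with the endpoint of the recovered NF and returns a helm upgrade command line only when they differ. Editing the custom values file itself remains a manual step.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    """An NF endpoint as an IP (or host name) and port pair."""
    ip: str
    port: int


def helm_upgrade_if_endpoint_changed(configured, recovered,
                                     release, chart, namespace, values_file):
    """Return the helm upgrade argv if the NF endpoint changed, else None.

    The values file is assumed to have been updated by hand with the
    new NF IP and port before the returned command is run.
    """
    if configured == recovered:
        return None  # same endpoint: no change needed on the CNC Console side
    return ["helm", "upgrade", release, chart,
            "--namespace", namespace, "--values", values_file]


# Hypothetical values used only for illustration.
old = Endpoint("10.0.0.10", 8080)
new = Endpoint("10.0.0.20", 8080)
command = helm_upgrade_if_endpoint_changed(
    old, new, "cncc", "./occncc-chart", "cncc",
    "occncc_custom_values_<version>.yaml")
print(command if command else "No helm upgrade required")
```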