7.4 Fault Recovery Scenarios

This section describes the fault recovery procedures for various scenarios.

7.4.1 Scenario 1: Deployment Failure

This scenario describes how to recover SEPP when the deployment is corrupted.

To recover SEPP:

  1. Run the following command to uninstall SEPP:
    helm uninstall <release_name> --namespace <namespace>
     
    Example:
    helm uninstall ocsepp --namespace seppsvc
    For more information about uninstalling SEPP, see the Uninstalling SEPP section.
  2. Install SEPP as described in the Installing SEPP section. Use the backup of the custom values file to reinstall SEPP.

Restore SEPP, cnDBTier, and SEPP database (DB) as described in Restoring SEPP and cnDBTier.
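
The uninstall and reinstall steps above can be sketched as a small shell helper. This is a minimal sketch, not part of the SEPP tooling: the function name sepp_redeploy_cmds, the chart reference ocsepp-chart.tgz, and the values file name ocsepp_custom_values.yaml are hypothetical placeholders. The helper only prints the Helm commands so that they can be reviewed before they are run.

```shell
#!/bin/sh
# Sketch only: prints the Helm commands for a SEPP redeploy so they can be
# reviewed before execution. The chart reference and values file passed in
# are placeholders (assumptions), not names defined by this guide.
sepp_redeploy_cmds() {
    release="$1"; namespace="$2"; chart="$3"; values="$4"
    if [ -z "$release" ] || [ -z "$namespace" ] || [ -z "$chart" ] || [ -z "$values" ]; then
        echo "usage: sepp_redeploy_cmds <release_name> <namespace> <chart> <custom_values_file>" >&2
        return 1
    fi
    # Step 1: remove the corrupted deployment.
    echo "helm uninstall $release --namespace $namespace"
    # Step 2: reinstall using the backed-up custom values file.
    echo "helm install $release $chart --namespace $namespace -f $values"
}

# Example with the release name and namespace used above:
sepp_redeploy_cmds ocsepp seppsvc ocsepp-chart.tgz ocsepp_custom_values.yaml
```

Printing the commands instead of running them keeps the sketch safe to try on any host; the printed lines can be run manually after review.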

7.4.2 Scenario 2: cnDBTier Corruption

This section describes how to recover the database when cnDBTier has failed in a single site, multiple sites, or all sites because of database corruption.

When the database on one site is corrupted, the databases on the other sites may also become corrupted through data replication. The outcome depends on the replication status after the corruption occurs. If the corruption breaks data replication, then cnDBTier fails in a single site or in multiple sites, but not in all sites. If data replication continues successfully, then the corruption is replicated to all the cnDBTier sites and cnDBTier fails in all sites.

The following are cnDBTier failure scenarios:

  • If the corrupted database causes a replication failure, so that the corruption is local to a site, follow When cnDBTier failed in Single or Multiple (but not all) Sites.

  • If the corrupted database is replicated to the mated sites, follow When cnDBTier failed in all Sites.

    Note:

    This scenario impacts all the NFs using the corrupted cnDBTier. All the NFs sharing the cnDBTier need to perform fault recovery because the cnDBTier is corrupted.
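
The choice between the two procedures depends only on whether the corruption reached every site. The following is a minimal sketch of that decision, assuming the operator already knows how many sites have failed. The function name select_procedure and its two arguments are hypothetical and are not part of cnDBTier or SEPP.

```shell
#!/bin/sh
# Sketch only: map the number of failed cnDBTier sites to the recovery
# procedure in this section. select_procedure is a hypothetical helper.
select_procedure() {
    failed="$1"; total="$2"
    if [ "$failed" -le 0 ] || [ "$failed" -gt "$total" ]; then
        echo "invalid site counts: failed=$failed total=$total" >&2
        return 1
    fi
    if [ "$failed" -eq "$total" ]; then
        # Replication succeeded, so the corruption reached every site.
        echo "When cnDBTier failed in all Sites"
    else
        # Replication broke, so at least one healthy mate site remains.
        echo "When cnDBTier failed in Single or Multiple (but not all) Sites"
    fi
}

# Example: one failed site in a three-site setup.
select_procedure 1 3
```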

7.4.2.1 When cnDBTier failed in Single or Multiple (but not all) Sites

This section describes how to recover the database when data replication is broken due to database corruption and cnDBTier has failed in a single site or in multiple sites (not all sites).

To recover the database:

  1. Uninstall SEPP Helm chart. For information about uninstalling SEPP, see the Uninstalling SEPP section.
  2. For cnDBTier fault recovery:
    1. Create an on-demand backup from a mate site that has healthy replication with the failed site. For more information about cnDBTier backup, see the "Create On-demand Database Backup" chapter in the Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
    2. Use the backup data from the mate site to restore the database. For more information about cnDBTier restore, see the "Restore Georeplication Failure" chapter in the Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.

      Note:

      The "Restore Georeplication Failure" chapter in the Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide has a procedure for two sites where one of the clusters has a fatal error. You can perform that procedure for all the sites in a multiple site setup.
  3. Install SEPP Helm chart. For more information about installing SEPP, see the Installing SEPP section.

7.4.2.2 When cnDBTier failed in all Sites

This section describes how to recover the database when successful data replication has propagated the corruption to all the cnDBTier sites.

To recover the database:

  1. Uninstall SEPP Helm chart. For more information about uninstalling SEPP, see the Uninstalling SEPP section.
  2. For cnDBTier fault recovery:
    1. Use the on-demand backup file to restore the database from the previous data backup. For more information about cnDBTier restore, see the "Restore Georeplication Failure" chapter in the Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.

      Note:

      The "Restore Georeplication Failure" chapter has a procedure for two sites where one of the clusters has a fatal error. You can perform that procedure for all the sites in a multiple site setup.
  3. Install SEPP Helm chart. For more information about installing SEPP, see the Installing SEPP section.

7.4.3 Scenario 3: Configuration Database Corruption

This scenario describes how to recover SEPP when its configuration database is corrupted.

The configuration database is stored, along with its tables, in a site-exclusive database. Thus, corruption of the configuration database impacts only that particular site.

To recover the SEPP configuration database, restore the database (DB) backup:
  1. Transfer the <backup_filename>.sql.gz file to the SQL node where you want to restore it.
  2. Log in to the MySQL NDB Cluster's SQL node on the new DB cluster and create a new database into which the backup is to be restored.
  3. For details on creating the database and user and adding permissions, see the Configuring Database, Creating Users, and Granting Permissions section.

    Note:

    The database name created in the above step should be the same as the database name used in the following step. The Kubernetes secret must be the same as in the custom_values.yaml file used later for installing SEPP.
  4. Run the following command to restore the backup to the new database:
    gunzip < <backup_filename>.sql.gz | mysql -h127.0.0.1 -u <username> -p <backup-database-name>

    Enter the password when prompted.

    Example:

    gunzip < SEPPdbBackup.sql.gz | mysql -h127.0.0.1 -u dbuser -p seppdb
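
Before piping the backup into MySQL, it can help to confirm that the archive is intact, because a truncated transfer in step 1 would otherwise surface only partway through the restore. The following is a minimal sketch, assuming the standard gzip -t integrity test is available on the SQL node. The function name check_backup is hypothetical and is not part of SEPP or cnDBTier.

```shell
#!/bin/sh
# Sketch only: check that a .sql.gz backup is readable before restoring it.
# check_backup is a hypothetical helper name, not part of SEPP or cnDBTier.
check_backup() {
    backup_file="$1"
    if [ ! -s "$backup_file" ]; then
        echo "backup file missing or empty: $backup_file" >&2
        return 1
    fi
    # gzip -t tests the integrity of the compressed data without extracting it.
    if ! gzip -t "$backup_file" 2>/dev/null; then
        echo "backup file is not a valid gzip archive: $backup_file" >&2
        return 1
    fi
    echo "backup file looks intact: $backup_file"
}

# Usage (restore only if the check passes):
# check_backup SEPPdbBackup.sql.gz && \
#   gunzip < SEPPdbBackup.sql.gz | mysql -h127.0.0.1 -u dbuser -p seppdb
```

The check validates only the gzip container. It does not confirm that the SQL inside matches the expected schema.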

7.4.4 Scenario 4: Site Failure

This section describes how to perform fault recovery when one, multiple, or all sites have a software failure. The following are site failure scenarios:

7.4.4.1 Single or Multiple Site Failure

This scenario applies when one or more sites, and not all sites, have failed and there is a requirement to perform fault recovery. It is assumed that the user has cnDBTier and SEPP installed on multiple sites with automatic data replication and backup enabled.

To recover the failed sites:

  1. Run the Cloud Native Environment (CNE) installation procedure to install a new cluster. For more information, see Oracle Communications Cloud Native Core, Cloud Native Environment Installation Guide.
  2. For cnDBTier fault recovery:
    1. Take an on-demand backup from the mate site that has healthy replication with the failed site or sites. For more information about on-demand backup, see the "Create On-demand Database Backup" chapter in the Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
    2. Use the backup data from the mate site to restore the database. For more information about database restore, see the "Restore Georeplication Failure" chapter in the Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
  3. Install SEPP Helm chart. For more information about installing SEPP, see the Installing SEPP section.

7.4.4.2 All Sites Failure

This scenario applies when all sites have failed, and there is a requirement to perform fault recovery. It is assumed that the user has cnDBTier and SEPP installed on multiple sites with automatic data replication and backup enabled.

To recover all the failed sites:

  1. Run the Cloud Native Environment (CNE) installation procedure to install a new cluster. For more information, see Oracle Communications Cloud Native Core, Cloud Native Environment Installation, Upgrade, and Fault Recovery Guide.
  2. Use the on-demand backup file to restore the database from the previous data backup. For more information about database restore, see the "Restore Georeplication Failure" chapter in the Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.

    Note:

    • The auto-data backup file is the file created by the scheduled automatic backup.
    • The "Restore Georeplication Failure" chapter contains a procedure for two sites where one of the clusters has a fatal error. You can perform the same procedure for all the sites in a multiple site setup.
  3. Install SEPP Helm chart. For more information about installing SEPP, see the Installing SEPP section.