7 Fault Recovery

This chapter describes the procedures to perform fault recovery for Oracle Communications Cloud Native Core, Unified Data Repository (UDR) deployment.

7.1 Overview

UDR operators can take database backups and restore them on either the same cluster or a different cluster. The commands and instructions in this chapter are run against the UDR database (MySQL NDB Cluster).

Note:

This guide describes recovery procedures to restore the UDR database only. To restore all the databases that are part of cnDBTier, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide, available on My Oracle Support (MOS).

Database Model of UDR

The UDR database consists of configuration data and subscriber data. The details are as follows:
  • Configuration Data: The configuration data is exclusive to a given site. Thus, an exclusive logical database is created and used by each site to store its configuration data. Using CNC Console and the Configuration Management service, an operator can configure data in the respective site only.
  • Subscriber Data: The subscriber data is shared across sites. Thus, a common logical database is created and used by all sites. The data is replicated across sites to preserve and share sessions with mated sites. In case of cross-site messaging or a site failure, the shared session data helps maintain continuity of service.

The following image shows the UDR database model in three different sites.

Figure 7-1 Database Model


7.2 Impacted Areas

The following table provides information about the impacted areas during UDR fault recovery:

Table 7-1 Impacted Areas

| Scenario | Requires Fault Recovery or Re-install of CNE? | Requires Fault Recovery or Re-install of cnDBTier? | Requires Fault Recovery or Re-install of UDR? | Other |
|---|---|---|---|---|
| Scenario 1: Database Migration to a New Cluster | Yes, if cluster is not present | Yes, if cnDBTier is not present | Yes | |
| Scenario 2: Deployment Failure | No | No | Yes | |
| Scenario 3: Database Corruption | | | | |
| Scenario 3A: Configuration Database Corruption | No | No | No | Configuration database is present in the site exclusive database. UDR backup and restore of configuration database is required on the impacted site. |
| Scenario 3B: Subscriber Database Corruption | No | Yes. Restoring DBTier from an older backup is the only way to return to the restore point. | No (only if cnDBTier credentials are changed) | All sites require fault recovery. |
| Scenario 4: Site Failure | Yes | Yes | Yes | |
| Manual Database Backup and Restore | No | No | Yes | |

7.3 Prerequisites

Before you run any fault recovery procedure, ensure that the following prerequisites are met:

  • DBTier is in a healthy state and available on multiple sites along with UDR.
  • Automatic backup is enabled on DBTier. Scheduled regular backups help to:
    • Restore a stable version of the UDR database
    • Minimize significant loss of data due to upgrade or rollback failures
    • Minimize loss of data due to system failure
    • Minimize loss of data due to data corruption or deletion caused by external input
    • Migrate UDR database information from one site to another
  • The custom values file used at the time of UDR deployment is retained.

7.4 Fault Recovery Scenarios

This section describes the fault recovery procedures for various scenarios.

7.4.1 Scenario 1: Database Migration to a New Cluster

This section describes how to migrate UDR from an existing cluster to a new cluster. Some of the reasons for migrating UDR are as follows:
  • Database failure
  • Moving UDR to a bigger cnDBTier
  • Moving UDR to a new (freshly installed) cnDBTier where upgrade is not supported
This scenario includes the following sub-scenarios.

7.4.1.1 Scenario 1A: Configuration Database Migration

A scenario where you want to migrate only the configuration database to a new cluster.

To migrate configuration database:

  1. Export configuration data from the older site to a file. For information about exporting data, see Oracle Communications Cloud Native Core, Unified Data Repository User Guide.
  2. (Optional) Install a Kubernetes cluster on the new site. For information about installing a Kubernetes cluster, see Oracle Communications Cloud Native Core, Cloud Native Environment Installation, Upgrade, and Fault Recovery Guide.
  3. (Optional) Install cnDBTier on the new site. For information about installing cnDBTier, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
  4. Install UDR. For more information about UDR installation, see Installing UDR Package.
  5. Import the configuration data from the file that you exported in Step 1. For more information about importing data, see Oracle Communications Cloud Native Core, Unified Data Repository User Guide.

Note:

Common services do not support export and import of their configuration data. To reconfigure common services on a new site, rerun their configuration APIs.

7.4.1.2 Scenario 1B: Configuration and Subscriber Database Migration

A scenario where you want to migrate the configuration as well as subscriber database to a new cluster.

To migrate the configuration and subscriber database to a new cluster:

  1. Shut down the older site. For information about shutting down a site, see Uninstalling UDR.
  2. (Optional) Install a Kubernetes cluster on the new site. For information about installing a Kubernetes cluster, see Oracle Communications Cloud Native Core, Cloud Native Environment Installation, Upgrade, and Fault Recovery Guide.
  3. (Optional) To take a data backup from the older site and restore it to the new site, perform the DBTier fault recovery procedure. For more information about DBTier backup, see the Create On-demand Database Backup chapter, and to restore the database to a new site, see the Restore DB with Backup chapter. Both chapters are in Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
  4. Install UDR. For more information, see Installing UDR Package.

Note:

This procedure migrates both the configuration data and the subscriber data to a new cluster.

7.4.2 Scenario 2: Deployment Failure

This scenario describes how to recover UDR when its deployment is corrupted.

To recover UDR:

  1. Uninstall UDR. For more information, see Uninstalling UDR.
  2. Install UDR. For more information, see Installing UDR Package.
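
The two steps above can be sketched as a small shell helper. This is a minimal sketch, not the documented installation procedure: the function name reinstall_udr and the chart argument are hypothetical, and it assumes that the custom values file retained from the original deployment is sufficient to reinstall. Follow Installing UDR Package for the complete procedure.

```shell
# Reinstall a UDR Helm release using the retained custom values file.
# $1 = release name, $2 = namespace,
# $3 = chart (hypothetical path or name), $4 = custom values file
reinstall_udr() {
  # Refuse to uninstall if the values file needed for reinstall is missing.
  if [ ! -f "$4" ]; then
    echo "custom values file not found: $4" >&2
    return 1
  fi
  helm3 uninstall "$1" -n "$2" || return 1
  helm3 install "$1" "$3" -n "$2" -f "$4"
}
```

The values file is checked first so that a missing file is detected before the existing release is removed.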

7.4.3 Scenario 3: Database Corruption

This scenario describes how to recover UDR when its database (configuration database or subscriber database) is corrupted.

7.4.3.1 Scenario 3A: Configuration Database Corruption

This section describes how to recover the UDR configuration database when it is corrupted. The configuration database is stored in a site exclusive database along with its tables. Thus, corruption of the configuration database impacts only that particular site.

To recover the UDR configuration database:
  1. Export the configuration data using either Cloud Native Core Console or the REST API. For more information about exporting the configuration database, see Oracle Communications Cloud Native Core, Unified Data Repository User Guide.
  2. Import the configuration data on the new site. For more information about importing data, see Oracle Communications Cloud Native Core, Unified Data Repository User Guide.

Note:

Common services do not support export and import of their configuration data. To reconfigure common services on a new site, rerun their configuration APIs.

7.4.3.2 Scenario 3B: Subscriber Database Corruption

This section describes how to recover UDR when its subscriber database is corrupted.

When the subscriber database is corrupted, the database on all other sites may also become corrupted due to data replication. This depends on the replication status after the corruption has occurred. If data replication is broken due to the database corruption, DBTier fails in one or more sites, but not in all sites. If data replication is successful, the corruption is replicated to all the DBTier sites and DBTier fails in all sites.

The fault recovery procedure covers the following sub-scenarios:

7.4.3.2.1 When cnDBTier Failed in Single or Multiple Sites

This section describes how to recover subscriber database when the data replication is broken due to database corruption and DBTier has failed in either single or multiple sites (not all sites).

To recover subscriber database:

  1. Uninstall UDR. For more information, see Uninstalling UDR.
  2. For cnDBTier fault recovery:
    1. To recover a single node failure in cnDBTier, follow the Restoring Single Node Failure chapter in Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
    2. If a standalone cnDBTier is down, follow the Restoring Database From Backup chapter in Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide to restore the cluster.
    3. If there is a georeplication failure between multiple cnDBTier clusters in replication, follow the Restoring Georeplication Failure chapter in Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide to restore the data.

    Note:

    The Restoring Georeplication Failure chapter has a procedure for two sites where one of the clusters has a fatal error. You can perform that procedure for all the sites in a multiple site setup.
  3. After the fault recovery procedure is complete, run the following curl command to check whether the site replication is UP:
    curl http://mysql-cluster-db-monitor-svc.occne-cndbtier:8080/db-tier/status/replication/realtime
  4. Run the following curl commands on both sites to check whether grstatus is COMPLETED:
    curl -X GET http://<svcname of replication svc>.<ns>/db-tier/gr-recovery/site/cluster1/status
    curl -X GET http://<svcname of replication svc>.<ns>/db-tier/gr-recovery/site/cluster2/status
  5. Install UDR. For more information, see Installing UDR Package.
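
The status checks in steps 3 and 4 can be wrapped in small shell functions so that the service name, namespace, and site name are supplied as arguments. This is a minimal sketch: the function names and the MONITOR_SVC variable are hypothetical, and the default host and port are taken from the example URL in step 3.

```shell
# Assumed default, copied from the example URL in step 3; adjust per deployment.
MONITOR_SVC="${MONITOR_SVC:-mysql-cluster-db-monitor-svc.occne-cndbtier:8080}"

# Query the real-time replication status (step 3).
check_replication() {
  curl -sS "http://${MONITOR_SVC}/db-tier/status/replication/realtime"
}

# Query the georeplication recovery status of one site (step 4).
# $1 = replication service name, $2 = namespace, $3 = site name (for example, cluster1)
check_gr_status() {
  curl -sS -X GET "http://$1.$2/db-tier/gr-recovery/site/$3/status"
}
```

For example, run check_replication, and then run check_gr_status once per site with the replication service name and namespace of your deployment.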

7.4.3.2.2 When cnDBTier Failed in All Sites

This section describes how to recover subscriber database when successful data replication corrupts all the DBTier sites.

To recover subscriber database:

  1. Uninstall UDR on all sites. For more information, see Uninstalling UDR.
  2. For cnDBTier fault recovery:
    1. To recover a single node failure in cnDBTier, follow the Restoring Single Node Failure chapter in Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
    2. If a standalone cnDBTier is down, follow the Restoring Database From Backup chapter in Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide to restore the cluster.
    3. If there is a georeplication failure between multiple cnDBTier clusters in replication, follow the Restoring Georeplication Failure chapter in Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide to restore the data.

    Note:

    The Restoring Georeplication Failure chapter has a procedure for two sites where one of the clusters has a fatal error. You can perform that procedure for all the sites in a multi-site setup.
  3. After the fault recovery procedure is complete, run the following curl command to check whether the site replication is UP:
    curl http://mysql-cluster-db-monitor-svc.occne-cndbtier:8080/db-tier/status/replication/realtime
  4. Run the following curl commands on both sites to check whether grstatus is COMPLETED:
    curl -X GET http://<svcname of replication svc>.<ns>/db-tier/gr-recovery/site/cluster1/status
    curl -X GET http://<svcname of replication svc>.<ns>/db-tier/gr-recovery/site/cluster2/status
  5. Install UDR. For more information, see Installing UDR Package.
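
Because recovery across all sites can take a while, the grstatus check in step 4 can be repeated until it reports COMPLETED. This is a minimal polling sketch: the function name wait_for_gr_completed is hypothetical, and it assumes that the response body contains the literal string COMPLETED once recovery is done. The exact response format is not described in this chapter.

```shell
# Poll the georeplication recovery status until the response contains COMPLETED.
# $1 = status URL, $2 = maximum attempts (default 30),
# $3 = seconds to wait between attempts (default 10)
wait_for_gr_completed() {
  url="$1"; attempts="${2:-30}"; delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Assumption: the body contains the literal string COMPLETED when done.
    if curl -sS -X GET "$url" | grep -q "COMPLETED"; then
      echo "grstatus is COMPLETED (attempt $i)"
      return 0
    fi
    # Wait before the next attempt, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "grstatus did not reach COMPLETED after $attempts attempts" >&2
  return 1
}
```

Call the function once per site with the status URL from step 4, and install UDR only after it returns successfully for every site.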

7.4.4 Scenario 4: Site Failure

This section describes how to perform fault recovery when either one, multiple, or all sites have a software failure. The following are site failure scenarios:

7.4.4.1 Scenario 4A: Single or Multiple Site Failure

This scenario applies when one or more sites, and not all sites, have failed and there is a requirement to perform fault recovery. It is assumed that the user has cnDBTier and Oracle Communications UDR installed on multiple sites with automatic data replication and backup enabled.

To recover the failed sites:

  1. Run the Cloud Native Environment (CNE) installation procedure to install a new cluster. For more information, see Oracle Communications Cloud Native Core, Cloud Native Environment Installation, Upgrade, and Fault Recovery Guide.
  2. For cnDBTier fault recovery:
    1. To recover a single node failure in cnDBTier, follow the Restoring Single Node Failure chapter in Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
    2. If a standalone cnDBTier is down, follow the Restoring Database From Backup chapter in Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide to restore the cluster.
    3. If there is a georeplication failure between multiple cnDBTier clusters in replication, follow the Restoring Georeplication Failure chapter in Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide to restore the data.
  3. After the fault recovery procedure is complete, run the following curl command to check whether the site replication is UP:
    curl http://mysql-cluster-db-monitor-svc.occne-cndbtier:8080/db-tier/status/replication/realtime
  4. Run the following curl commands on both sites to check whether grstatus is COMPLETED:
    curl -X GET http://<svcname of replication svc>.<ns>/db-tier/gr-recovery/site/cluster1/status
    curl -X GET http://<svcname of replication svc>.<ns>/db-tier/gr-recovery/site/cluster2/status
  5. Install UDR. For more information, see Installing UDR Package.
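
When several sites were recovered, the check in step 4 can be looped over a list of site names. This is a minimal sketch with a hypothetical function name, check_sites. It only reports whether each status request succeeded at the HTTP level and prints the response body as returned. Reading the grstatus value from the body is left to the operator.

```shell
# Request the georeplication recovery status of every listed site.
# $1 = replication service name, $2 = namespace, remaining arguments = site names
# Returns 0 only if every request succeeded.
check_sites() {
  svc="$1"; ns="$2"; shift 2
  failed=0
  for site in "$@"; do
    # --fail makes curl exit with a non-zero code on HTTP errors (4xx/5xx).
    if body=$(curl --fail -sS -X GET "http://${svc}.${ns}/db-tier/gr-recovery/site/${site}/status"); then
      echo "${site}: ${body}"
    else
      echo "${site}: status request failed" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

For example, check_sites followed by the replication service name, the namespace, and the site names cluster1 and cluster2 runs both commands from step 4 in one call.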

7.4.4.2 Scenario 4B: All Sites Failure

This scenario applies when all sites have failed and there is a requirement to perform fault recovery. It is assumed that the user has DBTier and Oracle Communications UDR installed on multiple sites with automatic data replication and backup enabled.

To recover all the failed sites:

  1. Run the Cloud Native Environment (CNE) installation procedure to install a new cluster. For more information, see Oracle Communications Cloud Native Core, Cloud Native Environment Installation, Upgrade, and Fault Recovery Guide.
  2. For cnDBTier fault recovery:
    1. To recover a single node failure in cnDBTier, follow the Restoring Single Node Failure chapter in Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
    2. If a standalone cnDBTier is down, follow the Restoring Database From Backup chapter in Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide to restore the cluster.
    3. If there is a georeplication failure between multiple cnDBTier clusters in replication, follow the Restoring Georeplication Failure chapter in Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide to restore the data.

    Note:

    • The auto-data backup file is the one created by the scheduled automatic backup.
    • The Restoring Georeplication Failure chapter contains a procedure for two sites where one of the clusters has a fatal error. You can perform the same procedure for all the sites in a multiple site setup.
  3. Install UDR. For more information, see Installing UDR Package.

7.5 Manual Database Backup and Restore

This section describes how to back up and restore the UDR database manually. It includes the following procedures:

7.5.1 Backing up the UDR Database

The UDR database holds both configuration data and provisioned subscriber data in a single database. If there are multiple databases, perform the following backup procedure for each database.

Note:

  • The backup procedure may take a long time, depending on the size of the database. Hence, the restored data may be stale. The user must consider this before performing these actions.
  • The commands used to take database backup and restore are provided by the MySQL cluster of DBTier.
  1. (Optional) Run the following command to ensure that all the database transactions are committed:
    helm3 uninstall <deploymentName> -n <Namespace>

    where,

    <deploymentName> is the Helm release name that was given when running the helm3 install command.

    <Namespace> is the namespace used to create the UDR Kubernetes objects. All the UDR microservices are deployed in this Kubernetes namespace.

    Example: helm3 uninstall ocudr -n myudr

    Note:

    Running the uninstall command ensures that there are no pending updates on the UDR database. Alternatively, stop (disable) the provisioning updates on UDR and redirect the signalling traffic to another UDR database so that this particular UDR does not receive newer updates at the time of data backup.

  2. Log in to the respective SQL node (or API node) and run the following commands to back up the UDR database and the UDR configuration database:
    mysqldump --quick -h127.0.0.1 -u <username> -p <udr databasename> | gzip > <udr data backup_filename>.sql.gz
    mysqldump --quick -h127.0.0.1 -u <username> -p <udr configuration databasename> | gzip > <udr configuration data backup_filename>.sql.gz

    where,

    <username> is the MySQL login username.

    <databasename> is the name of the database that UDR is currently using.

    <backup_filename> is the user defined backup file name.

    Example:
    mysqldump --quick -h127.0.0.1 -uudruser -p udrdb | gzip > udrdbBackup.sql.gz
    mysqldump --quick -h127.0.0.1 -uudruser -p udrConfigDb | gzip > udrConfigDbBackup.sql.gz

    Note:

    Use the UDR database name in the command and enter the password when prompted.

    Note:

    There should be enough storage space in the current directory to save the backup file. Depending on the subscriber count, the backup file may be large and the backup may take some time.

    The backup file, <backup_filename>.sql.gz, is created. This file is used in Restoring the UDR Database.

  3. Install UDR again (if required). For more information, see Installing UDR Package.
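
The backup command in step 2 can be wrapped so that a failed dump or a damaged archive is detected immediately. This is a minimal sketch: the function name backup_udr_db is hypothetical, and the integrity check with gzip -t is an addition that is not part of the documented procedure.

```shell
# Dump one database to a gzip file and verify the archive afterwards.
# $1 = MySQL user, $2 = database name,
# $3 = backup file name without the .sql.gz suffix
backup_udr_db() {
  out="$3.sql.gz"
  status_file=$(mktemp) || return 1
  # Record mysqldump's exit code, because a pipeline reports only gzip's.
  # mysqldump prompts for the password because -p is given without a value.
  { mysqldump --quick -h127.0.0.1 -u "$1" -p "$2"; echo "$?" > "$status_file"; } | gzip > "$out"
  dump_status=$(cat "$status_file")
  rm -f "$status_file"
  if [ "$dump_status" -ne 0 ]; then
    echo "mysqldump failed with exit code $dump_status" >&2
    return 1
  fi
  # gzip -t tests the integrity of the compressed file.
  gzip -t "$out" && echo "backup written to $out"
}
```

With the names from the example in step 2, backup_udr_db udruser udrdb udrdbBackup writes udrdbBackup.sql.gz in the current directory.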

7.5.2 Restoring the UDR Database

To restore the UDR database:
  1. On the new database cluster (DBTier), log in to MySQL NDB Cluster's SQL node and create a new UDR database where the backed up database needs to be restored.

    Note:

    For more details about creating the database, see PreInstallation Tasks.
  2. Copy the <backup_filename>.sql.gz file to the SQL node of the new database cluster.

    Note:

    If the backup file is corrupted, take the UDR database backup again by following the Backing up the UDR Database procedure in the previous section.
  3. Run the following commands to restore the database to the new UDR database, and enter the password when prompted:
    gunzip < <udrdb backup_filename>.sql.gz | mysql -h127.0.0.1 -u <username> -p <databaseName>
    gunzip < <udr configuration db backup_filename>.sql.gz | mysql -h127.0.0.1 -u <username> -p <databaseName>

    where,

    <backup_filename> is the user-defined backup file name.

    <username> is the MySQL login username.

    <databaseName> is the name of the new database into which the backup is restored.

    Example:
    gunzip < udrdbBackup.sql.gz | mysql -h127.0.0.1 -uudruser -p newUdrdb
    gunzip < udrConfigDbBackup.sql.gz | mysql -h127.0.0.1 -uudruser -p newUdrConfigurationDb
  4. Install UDR and connect it to the newly created database.
  5. Verify the sanity of UDR after connecting with the new database. For more information, see Post Installation Task.
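
The restore command in step 3 can be guarded by an integrity check on the backup file, so that a corrupted file is rejected before any data is loaded. This is a minimal sketch: the function name restore_udr_db is hypothetical, and the gzip -t check is an addition to the documented procedure.

```shell
# Verify the backup archive, then load it into the target database.
# $1 = backup file (.sql.gz), $2 = MySQL user, $3 = target database name
restore_udr_db() {
  # gzip -t fails if the file is missing or is not a valid gzip archive.
  if ! gzip -t "$1" 2>/dev/null; then
    echo "backup file is missing or corrupted: $1" >&2
    return 1
  fi
  # mysql prompts for the password because -p is given without a value.
  gunzip < "$1" | mysql -h127.0.0.1 -u "$2" -p "$3"
}
```

With the names from the example in step 3, restore_udr_db udrdbBackup.sql.gz udruser newUdrdb restores the subscriber database backup into newUdrdb.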