8 Fault Recovery

This chapter provides information about fault recovery for Oracle Communications Cloud Native Core, Converged Policy (Policy) deployment.

8.1 Overview

Fault recovery involves taking a backup of the Policy database and restoring it on either the same cluster or a different cluster. The commands and instructions in this chapter run against the Policy database (MySQL NDB Cluster).

Database Model of CNC Policy

Policy database consists of the following two data types:
  • Configuration Data: The configuration data is exclusive to a given site. Thus, an exclusive logical database is created and used by each site to store its configuration data. Using CNC Console and the Configuration Management service, you can configure the data in the respective site only.
  • Session Data: The session data is shared across sites. Thus, a common logical database is created and used by all sites. The data is replicated across sites to preserve and share sessions with mated sites. In case of cross-site messaging or a site failure, shared session data helps maintain continuity of service.

The following image shows the Policy database model in three different sites:

Figure 8-1 Database Model


8.2 Impacted Areas

The following table shares information about the impacted areas during Policy fault recovery:

| Scenario | Requires Fault Recovery or re-install of CNE? | Requires Fault Recovery or re-install of cnDBTier? | Requires Fault Recovery or re-install of Policy? | Other |
| Session Database Corruption | No | Yes. Restoring cnDBTier from an older backup is the only way to return to the restore point. | No. Only if cnDBTier credentials are changed. | All sites require Fault Recovery. |
| Site Failure | Yes | Yes | Yes | NA |

8.3 Prerequisites

Before performing any fault recovery procedure, ensure that the following prerequisites are met:

  • cnDBTier must be in a healthy state and available on multiple sites along with Policy. To check the cnDBTier status, perform the following steps:
    1. Run the following command to ensure that all the nodes are connected:

      ndb_mgm> show

    2. Run the following command to check the pod status:

      kubectl get pods -n <namespace>

      If the status of every pod is Running, then cnDBTier is in a healthy state.

    3. Run the following command to check if the replication is up:

      mysql> show slave status\G

      If there is any error, see the Fault Recovery chapter in Oracle Communications Cloud Native Core, cnDBTier Installation and Upgrade Guide.

    4. Run the following command to identify which cnDBTier site has ACTIVE replication, so that the backup can be taken from that site:

      select * from replication_info.DBTIER_REPLICATION_CHANNEL_INFO;

  • Automatic backup must be enabled on cnDBTier. Enabling automatic backup helps in achieving the following:
    • Restore a stable version of the network function database
    • Minimize significant loss of data due to upgrade or rollback failures
    • Minimize loss of data due to system failure
    • Minimize loss of data due to data corruption or deletion due to external input
    • Migrate database information for a network function from one site to another
  • The following files must be available for fault recovery:
    • Custom values file (occnp-custom-values-<release_number>)
    • Helm charts (occnp-<release_number>.tgz)
    • Secrets and Certificates
    • RBAC resources
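
The pod and replication checks from the cnDBTier status steps above can be scripted. The following is a minimal sketch, not part of the product: the function names not_running_pods and replication_up are hypothetical, and both functions only parse text piped to them, so how you reach kubectl or the mysql client is left to your deployment. It assumes the default column layout of kubectl get pods and the Slave_IO_Running and Slave_SQL_Running fields printed by show slave status\G.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical. Both functions read command
# output from stdin and do not contact the cluster themselves.

# Prints the name of every pod whose STATUS column is not "Running".
# Returns 0 only if at least one pod is listed and all of them are Running.
# Feed it: kubectl get pods -n <namespace> --no-headers
not_running_pods() {
  awk '
    NR == 1 && $1 == "NAME" { next }   # skip the header row if present
    { seen++ }
    $3 != "Running" { print $1; bad = 1 }
    END { exit ((bad || seen == 0) ? 1 : 0) }
  '
}

# Returns 0 only if both replication threads report "Yes".
# Feed it the output of: show slave status\G
replication_up() {
  awk -F': *' '
    { gsub(/^[ \t]+/, "", $1); gsub(/[ \t\r]+$/, "", $2) }
    $1 == "Slave_IO_Running" || $1 == "Slave_SQL_Running" {
      seen++
      if ($2 != "Yes") bad = 1
    }
    END { exit ((seen >= 2 && !bad) ? 0 : 1) }
  '
}

# Example usage (not executed here):
#   kubectl get pods -n <namespace> --no-headers | not_running_pods
#   replication_up < saved_slave_status.txt
```

Reading from stdin keeps the sketch independent of credentials and pod names. An empty pod list is treated as a failure, on the assumption that a healthy cnDBTier namespace always contains pods.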

Note:

For details on enabling automatic backup, see the Fault Recovery section in Oracle Communications Cloud Native Core, cnDBTier Installation Guide.

8.4 Fault Recovery Scenarios

This section describes the fault recovery procedures for various scenarios.

Note:

This chapter describes scenario-based procedures to restore Policy databases only. To restore all the databases that are part of cnDBTier, see the Fault Recovery chapter in Oracle Communications Cloud Native Core, cnDBTier Installation and Upgrade Guide available on My Oracle Support (MOS).

8.4.1 Scenario: Session Database Corruption

This section describes how to recover Policy when its session database becomes corrupted.

When the session database on one site becomes corrupted, the database on the other sites can also become corrupted through data replication. The outcome depends on the replication status after the corruption occurs. If data replication breaks because of the corruption, cnDBTier fails in one or more sites, but not in all sites. If data replication continues successfully, the corruption is replicated to all the cnDBTier sites and cnDBTier fails in all sites.

The fault recovery procedure covers the following sub-scenario:

8.4.1.1 When cnDBTier Fails in All Sites

This section describes how to recover the session database when successful data replication has corrupted all the cnDBTier sites.

To recover the session database, perform the following steps:

  1. Uninstall Policy Helm charts on all sites. For more information about uninstalling Helm charts, see Oracle Communications Cloud Native Core, Converged Policy Installation and Upgrade Guide available on MOS.
  2. Perform the cnDBTier fault recovery procedure:
    1. Use the automatic backup file for the restore procedure. For more information about cnDBTier restore, see the Fault Recovery chapter in Oracle Communications Cloud Native Core, cnDBTier Installation and Upgrade Guide available on MOS.
  3. Install Policy Helm charts. For more information about installing Helm charts, see Oracle Communications Cloud Native Core, Converged Policy Installation, Upgrade and Fault Recovery Guide available on MOS.

    Note:

    You can also refer to the custom-values.yaml file used at the time of Policy installation for Helm charts installation.
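
The uninstall and install steps above can be sketched as two small wrappers around Helm. This is an illustrative sketch only, assuming Helm 3 command syntax. The function names, the run helper, the DRY_RUN variable, and all argument values are hypothetical placeholders. The authoritative commands and options are in the Policy installation guide referenced in the steps. By default the sketch prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only, assuming Helm 3. All names below are hypothetical.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: remove the Policy release from a site.
# usage: uninstall_policy <release> <namespace>
uninstall_policy() {
  run helm uninstall "$1" --namespace "$2"
}

# Step 3: reinstall Policy from the saved chart and custom values file.
# usage: install_policy <release> <namespace> <chart.tgz> <custom-values-file>
install_policy() {
  run helm install "$1" "$3" --namespace "$2" --values "$4"
}

# Example usage (placeholders, not real values):
#   uninstall_policy <release_name> <namespace>
#   install_policy <release_name> <namespace> occnp-<release_number>.tgz <custom-values-file>
```

Keeping the dry-run default makes it possible to review the exact commands for every site before anything is removed. The cnDBTier restore in step 2 is deliberately not scripted here because it follows the separate cnDBTier procedure.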

8.4.2 Scenario: Site Failure

This section describes how to perform fault recovery when one or more of your sites have a software failure.

This section consists of the following:

8.4.2.1 Single or Multiple Site Failure

This scenario applies when one or more sites, but not all sites, have failed and fault recovery is required. It is assumed that you have cnDBTier and Policy installed on multiple sites with automatic data replication and backup enabled.

Note:

It is assumed that at least one cnDBTier site is in a healthy state.

To recover the failed sites, perform the following steps:

Note:

Ensure that all the prerequisites listed in section 8.3 are met.

  1. Uninstall Policy. For more information, see the Uninstalling CNC Policy section in Oracle Communications Cloud Native Core, Converged Policy Installation, Upgrade and Fault Recovery Guide.
  2. Install a new cluster by performing the Cloud Native Environment (CNE) installation procedure. For more information, see Oracle Communications Cloud Native Core, Cloud Native Environment (CNE) Installation and Upgrade Guide available on My Oracle Support.
  3. Install cnDBTier if replication is down or cnDBTier pods are not up and running. For information about installing cnDBTier, see Oracle Communications Cloud Native Core, cnDBTier Installation and Upgrade Guide.
  4. Perform the cnDBTier fault recovery procedure:
    1. Take a backup from an existing healthy site by following the Create On-demand Database Backup procedure in Oracle Communications Cloud Native Core, cnDBTier Installation and Upgrade Guide.
    2. Restore the database to the new site by following the Restore Database with Backup procedure in Oracle Communications Cloud Native Core, cnDBTier Installation and Upgrade Guide.
  5. Install Policy Helm charts. For more information on installing Helm charts, see Oracle Communications Cloud Native Core, Converged Policy Installation, Upgrade and Fault Recovery Guide.
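
The order of the steps above, including the conditional cnDBTier installation in step 3, can be summarized as a small planning helper. This is a sketch for illustration only: the function name site_recovery_plan and its yes/no arguments are hypothetical, and the function only prints the sequence. It does not run any recovery action.

```shell
#!/bin/sh
# Sketch only: prints the recovery steps for one failed site, in order.
# $1 = "yes" if all cnDBTier pods are Running on the site
# $2 = "yes" if replication is up on the site
site_recovery_plan() {
  echo "uninstall Policy"
  echo "install CNE"
  # Step 3 applies only when pods or replication are not healthy.
  if [ "$1" != "yes" ] || [ "$2" != "yes" ]; then
    echo "install cnDBTier"
  fi
  echo "create on-demand backup on a healthy site"
  echo "restore backup on the new site"
  echo "install Policy"
}

# Example usage:
#   site_recovery_plan yes no
```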