6 Fault Recovery

This chapter describes the fault recovery procedures for an Oracle Communications Cloud Native Core, Network Exposure Function (NEF) deployment.

6.1 Impacted Areas

The following table provides information about the impacted areas during NEF fault recovery:

Table 6-1 Impacted Areas

Scenario                 | Requires Fault Recovery or Re-install of CNE | Requires Fault Recovery or Re-install of cnDBTier | Requires Fault Recovery or Re-install of NEF
Site Failure             | Yes                                          | Yes                                               | Yes
Migration to New Cluster | Yes, if cluster is not present.              | Yes, if cnDBTier is not present.                  | Yes
Database Corruption      | Yes, in case complete site is down.          | Yes, in case replication is not up.               | Yes
Deployment Corruption    | No                                           | No                                                | Yes

6.2 Prerequisites

Before performing any fault recovery procedure, ensure that the following prerequisites are met:

  • Perform the following steps to ensure that cnDBTier is in a healthy state:
    1. Run the following command to check if all the cnDBTier nodes are connected:
      ndb_mgm> show

      Example:

      ndb_mgm> show
      Connected to Management Server at: localhost:1186

      Verify in the output that all the nodes are connected.

    2. Run the following command to check the pod status:
      kubectl get pods -n <namespace>

      All the pods must be in Running state.

    3. Run the following command to check if the DB replication is up and running:
      mysql> show replica status\G

      The values for the Replica_IO_Running and Replica_SQL_Running fields must be "Yes".

      In case of any errors, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.

  • Enable automatic backup on cnDBTier by scheduling regular backups. Regular backups help in fault recovery with the following tasks:
    • Restore a stable version of the network function database
    • Minimize significant loss of data due to upgrade or rollback failures
    • Minimize loss of data due to system failure
    • Minimize loss of data due to data corruption or deletion due to external input
    • Migrate network function database information from one site to another
  • Ensure that the following NF-specific files are available for fault recovery:
    • Custom values file (ocnef-custom-values-<release_number>)
    • Helm charts (ocnef-<release_number>.tgz)
    • Secrets and certificates
    • RBAC resources
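
The prerequisite checks above can be partly scripted. The following is a minimal sketch, not part of the product: the function names (pods_not_running, replication_healthy, files_present) are hypothetical, the pod check assumes the default kubectl column layout in which STATUS is the third column, and the file paths are whatever the operator passes in. Each function reads command output on standard input, so it can be tried without a live cluster.

```shell
# Sketch of the prerequisite checks. All function names are hypothetical.

# Reads `kubectl get pods -n <namespace>` output on stdin.
# Prints the name of every pod whose STATUS (assumed to be column 3) is not
# "Running" and returns 1 if at least one such pod is found.
pods_not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1; bad = 1 } END { exit bad }'
}

# Reads `show replica status\G` output on stdin.
# Succeeds only if Replica_IO_Running and Replica_SQL_Running are both "Yes".
replication_healthy() {
  awk -F': *' '
    { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }
    key == "Replica_IO_Running"  { io  = $2 }
    key == "Replica_SQL_Running" { sql = $2 }
    END { exit !(io == "Yes" && sql == "Yes") }
  '
}

# Takes the paths of the files needed for recovery as arguments.
# Prints each missing path and returns 1 if any path does not exist.
files_present() {
  missing=0
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, piping the output of kubectl get pods -n <namespace> into pods_not_running lists any pod that is not in Running state, and piping the output of show replica status\G into replication_healthy returns a non-zero status when either replication thread is not running. The cnDBTier node connectivity check with ndb_mgm is not covered by this sketch.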