7 Fault Recovery
Note:
The fault recovery procedures can be used by Oracle customers as long as Oracle Customer Service personnel are involved or consulted.
7.1 Restoring Single Node Failure
This section provides fault recovery procedures for restoring a single MySQL Network Database (NDB) cluster node failure (Management node, Data node, appSQL (non-georeplication) node, or SQL (georeplication) node) in a Cloud Native Database Tier (cnDBTier) environment.
Prerequisites
- The fault recovery procedures described in this section can be applied to one of the following scenarios. For example, consider a MySQL NDB cluster with 3 Management nodes, 4 Data nodes (2 data node groups, each with 2 data nodes), and 2 SQL nodes:
- One of the management nodes fails, while others are all in a running state.
- Two management nodes fail, while others are all in a running state.
- One of the data nodes fails, while others are all in a running state.
- One of the SQL (georeplication) or appSQL (non-georeplication) nodes fails, while others are all in a running state.
- One management node, one data node, one SQL (georeplication) node, and one appSQL (non-georeplication) node fails, while others are all in a running state.
- One management node, one Data node in each of the Data node group, and one SQL (georeplication) or appSQL (non-georeplication) node fails, while others are all in a running state.
- Ensure that the Oracle Communications Cloud Native Environment (OCCNE) cluster has a running Bastion Host for MySQL NDB cluster restoration.
7.1.1 Restoring Single Database Node Failure
Following is the fault recovery procedure for restoring a single Cloud Native Database Tier (cnDBTier) cluster node failure (Management node, Data node, SQL (georeplication) node, or appSQL (non-georeplication) node) in the cnDBTier environment.
- Log in to the ndbmgmd-0 in
Cluster1:
$ kubectl -n Cluster1 exec -ti pod/ndbmgmd-0 -c mysqlndbcluster /bin/bash
- Log in to the NDB cluster management
client:
[mysql@ndbmgmd-0 ~]$ ndb_mgm
- Run the following command to verify that all nodes are listed with
no error
message:
-- NDB Cluster -- Management Client -- ndb_mgm> show
Sample output:
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=1    @10.233.116.101  (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0, *)
id=2    @10.233.70.106  (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0)
id=3    @10.233.96.102  (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1)
id=4    @10.233.119.205  (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1)

[ndb_mgmd(MGM)] 2 node(s)
id=49   @10.233.77.54  (mysql-8.4.2 ndb-8.4.2)
id=50   @10.233.112.126  (mysql-8.4.2 ndb-8.4.2)

[mysqld(API)]   8 node(s)
id=70   @10.233.90.189  (mysql-8.4.2 ndb-8.4.2)
id=71   @10.233.71.30  (mysql-8.4.2 ndb-8.4.2)
id=72   @10.233.118.58  (mysql-8.4.2 ndb-8.4.2)
id=73   @10.233.100.65  (mysql-8.4.2 ndb-8.4.2)
id=222 (not connected, accepting connect from any host)
id=223 (not connected, accepting connect from any host)
id=224 (not connected, accepting connect from any host)
id=225 (not connected, accepting connect from any host)
Note:
Node IDs 222 to 225 in the sample output are shown as "not connected" as these are added as empty slot IDs that are used for georeplication recovery. You can ignore these node IDs.
Note:
For appSQL nodes that are not in georeplication, follow the procedure only up to Step 3.
The following recovery scenario is an example explaining a data node pod failure and its corrupted PVC:
- Delete the corrupted PVC of the node that must be restored. For example, if the corrupted PVC belongs to the second data node pod:
Note:
Since the pod is not yet deleted, deleting the PVC using the following command freezes the session. To avoid this, open another terminal and continue with the following steps.
$ kubectl -n <namespace> delete pvc pvc-ndbmtd-ndbmtd-<id, could be 0, 1, 2, 3>
Where <namespace> is the namespace of the failed data node.
For example:
$ kubectl -n Cluster1 delete pvc pvc-ndbmtd-ndbmtd-1
Sample output:
persistentvolumeclaim "pvc-ndbmtd-ndbmtd-1" deleted
- Delete the corrupted pod of the node that must be restored, for example,
the second data node
pod:
$ kubectl -n <namespace> delete pod/ndbmtd-<id, could be 0, 1, 2, 3>
For example:$ kubectl -n Cluster1 delete pod/ndbmtd-1
Sample output:pod "ndbmtd-1" deleted
This step may take some time to complete. Once the process exits with no errors, the pod is in the Pending state and its PVC is not created automatically. This is because the HPA tries to create the pod without recreating the PVC.
$ kubectl get pod -n Cluster1
Sample output:NAME READY STATUS RESTARTS AGE pod/mysql-cluster-Cluster1-Cluster2-replication-svc-7b5cb67c9fqd7b4 1/1 Running 0 36m pod/mysql-cluster-Cluster1-Cluster3-replication-svc-86445d8cb42pz5x 1/1 Running 0 2m9s pod/mysql-cluster-db-monitor-svc-57d688947-bbqp7 1/1 Running 1 69m pod/ndbmgmd-0 2/2 Running 0 69m pod/ndbmgmd-1 2/2 Running 0 68m pod/ndbmtd-0 3/3 Running 0 69m pod/ndbmtd-1 3/3 Running 0 68m pod/ndbmysqld-0 3/3 Running 0 69m pod/ndbmysqld-1 3/3 Running 0 2m9s pod/ndbmysqld-2 3/3 Running 0 68m pod/ndbmysqld-3 3/3 Running 0 68m
- Delete the pod that you want to restore. Once you delete the pod,
Kubernetes brings it up
successfully:
$ kubectl -n <namespace> delete pod/ndbmtd-<id, could be 0, 1, 2, 3>
For example:$ kubectl -n Cluster1 delete pod/ndbmtd-1
Sample output:pod "ndbmtd-1" deleted
- Wait until the pod is up and running and verify the status of the pod:
$ kubectl get pod -n Cluster1
Sample output:NAME READY STATUS RESTARTS AGE pod/mysql-cluster-Cluster1-Cluster2-replication-svc-7b5cb67c9fqd7b4 1/1 Running 0 36m pod/mysql-cluster-Cluster1-Cluster3-replication-svc-86445d8cb42pz5x 1/1 Running 0 2m9s pod/mysql-cluster-db-monitor-svc-57d688947-bbqp7 1/1 Running 1 69m pod/ndbmgmd-0 2/2 Running 0 69m pod/ndbmgmd-1 2/2 Running 0 68m pod/ndbmtd-0 3/3 Running 0 69m pod/ndbmtd-1 3/3 Running 0 68m pod/ndbmysqld-0 3/3 Running 0 69m pod/ndbmysqld-1 3/3 Running 0 2m9s pod/ndbmysqld-2 3/3 Running 0 68m pod/ndbmysqld-3 3/3 Running 0 68m
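The readiness check can also be scripted. The following is a minimal bash sketch, assuming the example pod ndbmtd-1, namespace Cluster1, and container mysqlndbcluster used above; adjust the names and timeout for your deployment:
# Wait (up to 10 minutes) for the restored data node pod to become Ready.
kubectl -n Cluster1 wait --for=condition=Ready pod/ndbmtd-1 --timeout=600s

# Confirm the data node has rejoined the NDB cluster: its node ID should be
# listed with an IP address and node group, not as "not connected".
kubectl -n Cluster1 exec ndbmgmd-0 -c mysqlndbcluster -- ndb_mgm -e show | grep -A 5 'ndbd(NDB)'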
- Perform the following to restore a SQL node after failure:
If the restored SQL node was involved in active replication, then try switching from the original standby SQL node to a newly restored SQL node that is on standby now, and verify that the replication channel can be successfully established.
- Stop replication on original standby SQL node so that switch
over of active replication channel happens to newly restored SQL
node:
$ kubectl -n Cluster1 exec -it ndbmysqld-1 -- mysql -h 127.0.0.1 -u<username> -p<password> -e "STOP REPLICA;"
For example:kubectl -n Cluster1 exec -it ndbmysqld-1 -- mysql -h 127.0.0.1 -uroot -pNextGenCne -e "STOP REPLICA;"
- Verify active replication on the newly restored SQL
node:
$ kubectl -n Cluster1 exec -it ndbmysqld-0 -- mysql -h127.0.0.1 -u<username> -p<password> -e "SHOW REPLICA STATUS\G"
Example:kubectl -n Cluster1 exec -it ndbmysqld-0 -- mysql -h127.0.0.1 -uroot -pNextGenCne -e "SHOW REPLICA STATUS\G"
Sample output:Defaulted container "mysqlndbcluster" out of: mysqlndbcluster, init-sidecar mysql: [Warning] Using a password on the command line interface can be insecure. *************************** 1. row *************************** Replica_IO_State: Waiting for source to send event Source_Host: 10.75.180.77 Source_User: occnerepluser Source_Port: 3306 Connect_Retry: 60 Source_Log_File: mysql-bin.000005 Read_Source_Log_Pos: 62625 Relay_Log_File: mysql-relay-bin.000002 Relay_Log_Pos: 2191 Relay_Source_Log_File: mysql-bin.000005 Replica_IO_Running: Yes Replica_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Source_Log_Pos: 62625 Relay_Log_Space: 2401 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Source_SSL_Allowed: No Source_SSL_CA_File: Source_SSL_CA_Path: Source_SSL_Cert: Source_SSL_Cipher: Source_SSL_Key: Seconds_Behind_Source: 0 Source_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Source_Server_Id: 2000 Source_UUID: 89219509-8fce-11ec-89b7-56d8e6d44947 Source_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates Source_Retry_Count: 86400 Source_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Source_SSL_Crl: Source_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 Replicate_Rewrite_DB: Channel_Name: Source_TLS_Version: Source_public_key_path: Get_Source_public_key: 0 Network_Namespace:
Note:
For management nodes or SQL nodes, follow this procedure by replacing every occurrence of 'ndbmtd' in the commands with 'ndbmgmd' or 'ndbmysqld' respectively.
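If you only need a quick pass or fail check of the replication channel on the restored SQL node, the following sketch can be used instead of reading the full SHOW REPLICA STATUS output. It assumes the same ndbmysqld-0 pod, Cluster1 namespace, mysqlndbcluster container, and credentials as in the example above:
# Print only the replica thread states and lag; both *_Running fields must be "Yes".
kubectl -n Cluster1 exec ndbmysqld-0 -c mysqlndbcluster -- \
  mysql -h127.0.0.1 -u<username> -p<password> -e "SHOW REPLICA STATUS\G" \
  | grep -E 'Replica_IO_Running|Replica_SQL_Running|Seconds_Behind_Source'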
7.1.2 Recovering a Single Node Replication Service
This section provides the procedure to recover a single node replication service in case of PVC corruption.
- Perform the following steps to scale down the replication service pod:
- Run the following command to get the list of deployments in
cluster1:
$ kubectl get deployment --namespace=cluster1
Sample output:NAME READY UP-TO-DATE AVAILABLE AGE mysql-cluster-cluster1-cluster2-replication-svc 1/1 1 1 11m mysql-cluster-db-backup-manager-svc 1/1 1 1 11m mysql-cluster-db-monitor-svc 1/1 1 1 11m
- Scale down the replication service of cluster1 with respect to
cluster2:
$ kubectl scale deployment mysql-cluster-cluster1-cluster2-replication-svc --namespace=cluster1 --replicas=0
Sample output:deployment.apps/mysql-cluster-cluster1-cluster2-replication-svc scaled
- Perform the following steps to delete the corrupted PVC of the db-replication-svc
pod:
- Run the following command to get the corrupted
PVC:
$ kubectl get pvc -n cluster1
Sample output:NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE pvc-cluster1-cluster2-replication-svc Bound pvc-4f1e7afa-724d-470a-b1a5-dbe1257e7a48 8Gi RWO occne-dbtier-sc 12m pvc-ndbappmysqld-ndbappmysqld-0 Bound pvc-307f3d29-01d8-48a8-b580-15db8edc1121 2Gi RWO occne-dbtier-sc 12m pvc-ndbappmysqld-ndbappmysqld-1 Bound pvc-358b8d50-b53c-427f-87a3-ac9c139581b6 2Gi RWO occne-dbtier-sc 10m pvc-ndbmgmd-ndbmgmd-0 Bound pvc-965061dd-3d88-40fe-bf25-4260a61d0fa4 1Gi RWO occne-dbtier-sc 11m pvc-ndbmgmd-ndbmgmd-1 Bound pvc-8601a9b3-769c-41ca-aeb8-c5b5725693e3 1Gi RWO occne-dbtier-sc 11m pvc-ndbmtd-ndbmtd-0 Bound pvc-8bfab5e0-4a24-432f-98a4-a64080ccb33c 3Gi RWO occne-dbtier-sc 11m pvc-ndbmtd-ndbmtd-1 Bound pvc-196f7e87-824c-46d0-b0c9-8aad56806a34 3Gi RWO occne-dbtier-sc 11m pvc-ndbmysqld-ndbmysqld-0 Bound pvc-6ea3cd11-8cbe-4d2a-864d-baf7d22f4295 2Gi RWO occne-dbtier-sc 11m pvc-ndbmysqld-ndbmysqld-1 Bound pvc-c978f6de-0477-4a68-8ebe-3bcb899d27bb 2Gi RWO occne-dbtier-sc 10m
- Run the following command to delete the corrupted
PVC:
$ kubectl delete pvc pvc-cluster1-cluster2-replication-svc -n cluster1
Sample output:persistentvolumeclaim "pvc-cluster1-cluster2-replication-svc" deleted
- Perform the following steps to check if the corrupted PVC is deleted:
- Get the list of PVCs in cluster1 and verify that the deleted PVC is not present in the output:
$ kubectl get pvc -n cluster1
Sample output:NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE pvc-ndbappmysqld-ndbappmysqld-0 Bound pvc-307f3d29-01d8-48a8-b580-15db8edc1121 2Gi RWO occne-dbtier-sc 12m pvc-ndbappmysqld-ndbappmysqld-1 Bound pvc-358b8d50-b53c-427f-87a3-ac9c139581b6 2Gi RWO occne-dbtier-sc 11m pvc-ndbmgmd-ndbmgmd-0 Bound pvc-965061dd-3d88-40fe-bf25-4260a61d0fa4 1Gi RWO occne-dbtier-sc 12m pvc-ndbmgmd-ndbmgmd-1 Bound pvc-8601a9b3-769c-41ca-aeb8-c5b5725693e3 1Gi RWO occne-dbtier-sc 12m pvc-ndbmtd-ndbmtd-0 Bound pvc-8bfab5e0-4a24-432f-98a4-a64080ccb33c 3Gi RWO occne-dbtier-sc 12m pvc-ndbmtd-ndbmtd-1 Bound pvc-196f7e87-824c-46d0-b0c9-8aad56806a34 3Gi RWO occne-dbtier-sc 12m pvc-ndbmysqld-ndbmysqld-0 Bound pvc-6ea3cd11-8cbe-4d2a-864d-baf7d22f4295 2Gi RWO occne-dbtier-sc 12m pvc-ndbmysqld-ndbmysqld-1 Bound pvc-c978f6de-0477-4a68-8ebe-3bcb899d27bb 2Gi RWO occne-dbtier-sc 10m
- Run the following command to get the PV details and verify that the PV
associated with the corrupted PVC is not present in the
output:
$ kubectl get pv |grep cluster1 | grep repl
Sample output:pvc-ecc6d691-ca31-41c3-930c-c092d73452e8 8Gi RWO Delete Bound cluster2/pvc-cluster2-cluster1-replication-svc occne-dbtier-sc 10m
- Run the following command to upgrade cnDBTier with the modified custom_values.yaml file:
$ helm upgrade mysql-cluster occndbtier -f occndbtier/custom_values.yaml -n cluster1
When the upgrade is complete, the new db-replication-svc pod comes up with the new PVC.
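To confirm that the upgrade recreated the replication service with a fresh PVC, a quick check such as the following sketch can be run; the deployment and PVC names are taken from the cluster1/cluster2 example used in this procedure:
# The replication service deployment should report 1/1 READY again.
kubectl -n cluster1 get deployment mysql-cluster-cluster1-cluster2-replication-svc

# The replication service PVC should be recreated and in the Bound state.
kubectl -n cluster1 get pvc pvc-cluster1-cluster2-replication-svc

# The new db-replication-svc pod should be Running.
kubectl -n cluster1 get pods | grep replication-svc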
- Perform a Helm test to ensure that all the cnDBTier services are
running
smoothly:
$ helm test mysql-cluster -n ${OCCNE_NAMESPACE}
Sample output:NAME: mysql-cluster LAST DEPLOYED: Mon May 20 09:07:56 2024 NAMESPACE: cluster1 STATUS: deployed REVISION: 2 TEST SUITE: mysql-cluster-node-connection-test Last Started: Mon May 20 09:15:39 2024 Last Completed: Mon May 20 09:16:16 2024 Phase: Succeeded
Check the status of cnDBTier Cluster1 by running the following command:
$ kubectl -n ${OCCNE_NAMESPACE} exec -it ndbmgmd-0 -- ndb_mgm -e show
Sample output:Defaulted container "mysqlndbcluster" out of: mysqlndbcluster, db-infra-monitor-svc Connected to Management Server at: localhost:1186 Cluster Configuration --------------------- [ndbd(NDB)] 2 node(s) id=1 @10.233.74.160 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0, *) id=2 @10.233.79.154 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0) [ndb_mgmd(MGM)] 2 node(s) id=49 @10.233.78.169 (mysql-8.4.2 ndb-8.4.2) id=50 @10.233.102.220 (mysql-8.4.2 ndb-8.4.2) [mysqld(API)] 8 node(s) id=56 @10.233.72.151 (mysql-8.4.2 ndb-8.4.2) id=57 @10.233.84.206 (mysql-8.4.2 ndb-8.4.2) id=70 @10.233.79.153 (mysql-8.4.2 ndb-8.4.2) id=71 @10.233.73.138 (mysql-8.4.2 ndb-8.4.2) id=222 (not connected, accepting connect from any host) id=223 (not connected, accepting connect from any host) id=224 (not connected, accepting connect from any host) id=225 (not connected, accepting connect from any host)
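The post-recovery checks can be combined into one short sketch. It assumes the ${OCCNE_NAMESPACE} variable and pod names used above; the empty georeplication slot IDs (for example, 222 to 225 in the samples) are expected to remain "not connected" and can be ignored:
# Run the cnDBTier Helm test suite for the release.
helm test mysql-cluster -n "${OCCNE_NAMESPACE}"

# List any nodes that are not connected; only the empty georeplication
# slot IDs should appear in this output.
kubectl -n "${OCCNE_NAMESPACE}" exec ndbmgmd-0 -c mysqlndbcluster -- ndb_mgm -e show | grep 'not connected'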
7.2 Restoring Database From Backup
This section provides the procedure to restore the database from a backup using the MySQL ndb_restore utility.
7.2.1 Restoring Database from Backup with ndb_restore
This procedure restores the database nodes of a cnDBTier cluster from a backup using the MySQL ndb_restore utility.
Note:
- To restore a single site cnDBTier cluster deployment, download the NDB backup to the Bastion Host. If the backup is already downloaded to the Bastion Host, you can use the same backup to restore the cnDBTier cluster in case of fatal errors.
- The cnDBTier backup that is used for the restore must be taken using the same cnDBTier version as that of the cnDBTier site that is being restored.
- For restoring the failed cnDBTier clusters in two site, three site, and four site cnDBTier deployment models, perform the procedures as described in Restoring Georeplication (GR) Failure section.
- db-backup-manager-svc is designed to automatically restart in case of errors. Therefore, when the backup-manager-svc encounters a temporary error during the georeplication recovery process, it may undergo several restarts. When cnDBTier reaches a stable state, the db-backup-manager-svc pod operates normally without any further restarts.
- You can locate the scripts used in this section in the following location: <path where CSAR package of cnDBTier is extracted>/Artifacts/Scripts/dr-procedure.
Downloading the Latest DB Backup Before Restoration
Create a database backup of cnDBTier Cluster1. If the cnDBTier is UP, copy the backup directories and files to a local directory in your Bastion Host. For more information on how to create a backup, see Creating On-demand Database Backup.
- Check if the backup is already existing in the cnDBTier cluster that needs
to be downloaded to the Bastion
Host:
$ kubectl -n cluster1 exec -it ndbmtd-0 -- ls -lrt /var/ndbbackup/dbback/BACKUP
Example:$ kubectl -n cluster1 exec -it ndbmtd-0 -- ls -lrt /var/ndbbackup/dbback/BACKUP
Sample output:Defaulting container name to mysqlndbcluster. Use 'kubectl describe pod/ndbmtd-0 -n cluster1' to see all of the containers in this pod. total 8 drwxr-sr-x. 6 mysql mysql 4096 Feb 19 17:50 BACKUP-217221233 drwxr-s---. 6 mysql mysql 4096 Feb 19 18:33 BACKUP-217221332
- Download the backup from the cnDBTier cluster data nodes to the Bastion
Host.
- Run the following command to download the backup files to the Bastion
Host and compress the files as a single tar ball file. Select the backup ID from step
1 that needs to be
downloaded.
$ SKIP_NDB_APPLY_STATUS=1 CNDBTIER_NAMESPACE=<namespace of cndbtier cluster on which backup is created> BACKUP_DIR=<path where backup files are copied> DATA_NODE_COUNT=<ndb data node count> BACKUP_ID=<backup id> BACKUP_ENCRYPTION_ENABLE=false ./download_backup.sh <backup tar ball path>
Example:$ SKIP_NDB_APPLY_STATUS=1 CNDBTIER_NAMESPACE=cluster1 BACKUP_DIR=/var/ndbbackup DATA_NODE_COUNT=4 BACKUP_ID=217221233 BACKUP_ENCRYPTION_ENABLE=false ./download_backup.sh backup_217221233.tar.gz
Sample output:Defaulting container name to mysqlndbcluster. tar: Removing leading `/' from member names Defaulting container name to mysqlndbcluster. tar: Removing leading `/' from member names Defaulting container name to mysqlndbcluster. tar: Removing leading `/' from member names Defaulting container name to mysqlndbcluster. tar: Removing leading `/' from member names ./ ./ndbmtd-0/ ./ndbmtd-0/BACKUP-217221233/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.1.log ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233-0.1.Data ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.1.ctl ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.1.log ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233-0.1.Data ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.1.ctl ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.1.log ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233-0.1.Data ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.1.ctl ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.1.log ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233-0.1.Data ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.1.ctl ./ndbmtd-1/ ./ndbmtd-1/BACKUP-217221233/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.2.log ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233-0.2.Data ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.2.ctl ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.2.log ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233-0.2.Data ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.2.ctl ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.2.log ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233-0.2.Data ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.2.ctl ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.2.log ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233-0.2.Data ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.2.ctl ./ndbmtd-2/ ./ndbmtd-2/BACKUP-217221233/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.3.ctl ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.3.log ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233-0.3.Data ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.3.ctl ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.3.log ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233-0.3.Data 
./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.3.ctl ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.3.log ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233-0.3.Data ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.3.ctl ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.3.log ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233-0.3.Data ./ndbmtd-3/ ./ndbmtd-3/BACKUP-217221233/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233-0.4.Data ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.4.log ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.4.ctl ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233-0.4.Data ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.4.log ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.4.ctl ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233-0.4.Data ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.4.log ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.4.ctl ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233-0.4.Data ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.4.log ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.4.ctl Cluster backup 217221233 downloaded and compressed to backup_217221233.tar.gz successfully.
- Note down the location of the backup tar file and backup ID. This path is required later for database restoration.
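Before moving on, you can sanity check the downloaded tar ball. The following sketch, using the backup_217221233.tar.gz file from the example above, lists the per data node folders inside the archive and records a checksum for later verification (sha256sum is assumed to be available on the Bastion Host):
# List the top-level per data node folders contained in the backup tar ball.
tar -tzf backup_217221233.tar.gz | grep -E '^\./ndbmtd-[0-9]+/$'

# Optionally record a checksum of the tar ball to verify it later.
sha256sum backup_217221233.tar.gz > backup_217221233.tar.gz.sha256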
Restoring Database Schema and Tables
Note:
- Ensure that the cnDBTier backup that is used for the restore is taken using the same cnDBTier version as that of the cnDBTier site that is being restored.
- You can ignore the temporary errors observed in the NDB database that is restored if the restore completes successfully.
- Log in to the Bastion Host of the cnDBTier cluster to restore the database. Check the backup tar file downloaded in the previous section; the backup file must be available before restoring the database.
- If the backup is fetched from a remote server, then perform the following steps to
convert the backup file from zip format to the format required by the restore script:
- Copy the backup zip file to the folder containing the restore script.
- Run the following command to unzip the backup
file:
$ unzip backup_<backup_id>_<Encrypted/Unencrypted>.zip
For example:$ unzip backup_708241812_Encrypted.zip
Sample output:Archive: backup_708241812_Encrypted.zip inflating: backup_708241812_dn_1.tar.gz inflating: backup_708241812_dn_1.tar.gz.sha256 inflating: backup_708241812_dn_2.tar.gz inflating: backup_708241812_dn_2.tar.gz.sha256
- Untar the tar file that belongs to the first data node (the file name containing
dn_1
is the file for the first data node):$ tar -xzvf backup_<backup_id>_dn_1.tar.gz -C ./
For example:$ tar -xzvf backup_708241812_dn_1.tar.gz -C ./
Sample output:BACKUP-708241812/ BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/ BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812.1.ctl BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812-0.1.Data BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812.1.log BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/ BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812.1.ctl BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812-0.1.Data BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812.1.log BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/ BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812.1.ctl BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812-0.1.Data BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812.1.log BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/ BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812.1.ctl BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812-0.1.Data BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812.1.log
- Create a directory named
ndbmtd-0
and move the folder extracted in the previous step to thendbmtd-0
folder:$ mkdir ndbmtd-0 $ mv BACKUP-<backup_id> ./ndbmtd-0/
For example:$ mkdir ndbmtd-0 $ mv BACKUP-708241812 ./ndbmtd-0/
- Untar the tar file that belongs to the second data node (the file name containing
dn_2
is the file for the second data node):$ tar -xzvf backup_<backup_id>_dn_2.tar.gz -C ./
For example:$ tar -xzvf backup_708241812_dn_2.tar.gz -C ./
Sample output:BACKUP-708241812/ BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/ BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812.2.log BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812.2.ctl BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812-0.2.Data BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/ BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812.2.log BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812.2.ctl BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812-0.2.Data BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/ BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812.2.log BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812.2.ctl BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812-0.2.Data BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/ BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812.2.log BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812.2.ctl BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812-0.2.Data
- Create a directory named
ndbmtd-1
and move the folder extracted in the previous step to thendbmtd-1
folder:$ mkdir ndbmtd-1 $ mv BACKUP-<backup_id> ./ndbmtd-1/
For example:$ mkdir ndbmtd-1 $ mv BACKUP-708241812 ./ndbmtd-1/
- Repeat steps e and f for the remaining tar files of respective data nodes.
- Create a tar file to contain all the folders created in the previous
steps:
$ tar -czvf backup_<backup_id>.tar.gz ndbmtd-*
For example:tar -czvf backup_708241812.tar.gz ndbmtd-*
Sample output:ndbmtd-0/ ndbmtd-0/BACKUP-708241812/ ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/ ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812.1.ctl ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812-0.1.Data ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812.1.log ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/ ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812.1.ctl ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812-0.1.Data ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812.1.log ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/ ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812.1.ctl ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812-0.1.Data ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812.1.log ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/ ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812.1.ctl ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812-0.1.Data ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812.1.log ndbmtd-1/ ndbmtd-1/BACKUP-708241812/ ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/ ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812.2.log ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812.2.ctl ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812-0.2.Data ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/ ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812.2.log ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812.2.ctl ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812-0.2.Data ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/ ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812.2.log ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812.2.ctl ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812-0.2.Data ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/ ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812.2.log ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812.2.ctl ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812-0.2.Data
- Use the created file (
backup_<backup_id>.tar.gz
) in the following steps to perform a restore.
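The repackaging steps above (unzip, untar each data node archive, move it under an ndbmtd-<n> folder, and re-tar) can also be scripted. The following is a minimal bash sketch, assuming the file naming shown in the examples (backup_<backup_id>_Encrypted.zip and backup_<backup_id>_dn_<n>.tar.gz); adjust the backup ID, zip file name, and data node count for your deployment:
BACKUP_ID=708241812          # backup ID from the zip file name
DN_COUNT=2                   # number of data node archives inside the zip

unzip backup_${BACKUP_ID}_Encrypted.zip

# Unpack each data node archive into its own ndbmtd-<n> folder
# (dn_1 maps to ndbmtd-0, dn_2 maps to ndbmtd-1, and so on).
for i in $(seq 1 "${DN_COUNT}"); do
  tar -xzvf backup_${BACKUP_ID}_dn_${i}.tar.gz -C ./
  mkdir -p ndbmtd-$((i-1))
  mv BACKUP-${BACKUP_ID} ./ndbmtd-$((i-1))/
done

# Repack all per data node folders into the tar ball used by the restore script.
tar -czvf backup_${BACKUP_ID}.tar.gz ndbmtd-*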
- Reinstall the cnDBTier cluster, if the cluster fails due to fatal errors. For more information about installing the cnDBTier cluster, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
- Disable the cnDBTier replication services on the cnDBTier cluster, if they exist. For a single site deployment, this service is disabled.
Log in to the Bastion Host of the cnDBTier cluster and scale down the DB replication service deployment:
$ kubectl -n <namespace of cnDBTier cluster> get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n <namespace of cnDBTier cluster> scale deployment --replicas=0
Example:$ kubectl -n cluster1 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster1 scale deployment --replicas=0
Sample output:deployment.apps/mysql-cluster-cluster1-cluster2-replication-svc scaled
- Run the following command to disable DB backup manager service in cnDBTier
cluster:
Log in to the Bastion Host of cnDBTier Cluster and scale down the DB backup manager service deployment.
$ kubectl -n <namespace of cnDBTier cluster> get deployments | egrep 'db-backup-manager-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n <namespace of cnDBTier cluster> scale deployment --replicas=0
Example:$ kubectl -n cluster1 get deployments | egrep 'db-backup-manager-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster1 scale deployment --replicas=0
Sample output:deployment.apps/mysql-cluster-db-backup-manager-svc scaled
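Before proceeding, you can confirm that both services are scaled down. A minimal check, using the same cluster1 namespace as in the examples:
# Both deployments should show 0/0 READY after the scale down.
kubectl -n cluster1 get deployments | egrep 'replication|db-backup-manager-svc'

# No replication or backup manager pods should remain.
kubectl -n cluster1 get pods | egrep 'replication-svc|db-backup-manager-svc' || echo "No replication or backup manager pods running"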
- Wait until cnDBTier cluster is up and running. Check the status of the
cnDBTier cluster by running the following
commands:
$ kubectl -n <namespace of cnDBTier Cluster> get pods
$ kubectl -n <namespace of cnDBTier Cluster> exec -it ndbmgmd-0 -- ndb_mgm -e show
Example: Checking the cluster status by accessing the pods running in the cluster$ kubectl -n cluster1 get pods
Sample output:NAME READY STATUS RESTARTS AGE mysql-cluster-db-monitor-svc-777dc5d7d7-pmzxz 1/1 Running 0 6h19m ndbappmysqld-0 2/2 Running 0 6h19m ndbappmysqld-1 2/2 Running 0 6h14m ndbmgmd-0 2/2 Running 0 6h19m ndbmgmd-1 2/2 Running 0 6h18m ndbmtd-0 3/3 Running 0 6h19m ndbmtd-1 3/3 Running 0 6h17m ndbmtd-2 3/3 Running 0 6h17m ndbmtd-3 3/3 Running 0 6h16m ndbmysqld-0 3/3 Running 0 6h19m ndbmysqld-1 3/3 Running 0 6h13m
Example: Checking the status of cluster from the management pod$ kubectl -n cluster1 exec -it ndbmgmd-0 -- ndb_mgm -e show
Sample output:Connected to Management Server at: localhost:1186 Cluster Configuration --------------------- [ndbd(NDB)] 4 node(s) id=1 @10.233.116.101 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0, *) id=2 @10.233.70.106 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0) id=3 @10.233.96.102 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) id=4 @10.233.119.205 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) [ndb_mgmd(MGM)] 2 node(s) id=49 @10.233.77.54 (mysql-8.4.2 ndb-8.4.2) id=50 @10.233.112.126 (mysql-8.4.2 ndb-8.4.2) [mysqld(API)] 8 node(s) id=56 @10.233.90.189 (mysql-8.4.2 ndb-8.4.2) id=57 @10.233.71.30 (mysql-8.4.2 ndb-8.4.2) id=71 @10.233.118.58 (mysql-8.4.2 ndb-8.4.2) id=72 @10.233.100.65 (mysql-8.4.2 ndb-8.4.2) id=222 (not connected, accepting connect from any host) id=223 (not connected, accepting connect from any host) id=224 (not connected, accepting connect from any host) id=225 (not connected, accepting connect from any host)
Note:
Node IDs 222 to 225 in the sample output are shown as "not connected" as these are added as empty slot IDs that are used for georeplication recovery. You can ignore these node IDs.
- Perform the following steps to restore the NDB database automatically:
- Run the following command to pick a backup to restore the NDB
cluster:
$ SKIP_NDB_APPLY_STATUS=1 CNDBTIER_NAMESPACE=<namespace> BACKUP_DIR=<backup_DIR> BACKUP_ID=<backup_ID> BACKUP_ENCRYPTION_ENABLE=<true/false> BACKUP_ENCRYPTION_PASSWORD=<backup encryption password> ./cndbtier_restore.sh <backup_tar_path>
where:
- <namespace>: is the namespace of the cnDBTier cluster to restore
- <backup_DIR>: is the path where backup files are copied
- <backup_ID>: is the backup ID obtained in Downloading Latest DB Backup Before Restoration
- <backup encryption password>: if BACKUP_ENCRYPTION_ENABLE is set to true, then this variable gives the backup encryption password
- <backup_tar_path>: is the backup tar ball path generated in Downloading Latest DB Backup Before Restoration
For example:
$ SKIP_NDB_APPLY_STATUS=1 CNDBTIER_NAMESPACE=cluster1 BACKUP_DIR="/var/ndbbackup" BACKUP_ID=217221233 BACKUP_ENCRYPTION_ENABLE=true BACKUP_ENCRYPTION_PASSWORD="NextGenCne" ./cndbtier_restore.sh backup_217221233.tar.gz
Sample output:Extracting backup part files to /tmp/.cndbtier-backup-MVkgJJperv ... ./ ./ndbmtd-0/ ./ndbmtd-0/BACKUP-217221233/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.1.log ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233-0.1.Data ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.1.ctl ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.1.log ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233-0.1.Data ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.1.ctl ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.1.log ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233-0.1.Data ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.1.ctl ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.1.log ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233-0.1.Data ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.1.ctl ./ndbmtd-1/ ./ndbmtd-1/BACKUP-217221233/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.2.log ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233-0.2.Data ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.2.ctl ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.2.log ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233-0.2.Data ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.2.ctl ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.2.log ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233-0.2.Data ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.2.ctl ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.2.log ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233-0.2.Data ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.2.ctl ./ndbmtd-2/ ./ndbmtd-2/BACKUP-217221233/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.3.ctl ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.3.log ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233-0.3.Data ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.3.ctl ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.3.log ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233-0.3.Data ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.3.ctl ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.3.log ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233-0.3.Data 
./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.3.ctl ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.3.log ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233-0.3.Data ./ndbmtd-3/ ./ndbmtd-3/BACKUP-217221233/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233-0.4.Data ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.4.log ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.4.ctl ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233-0.4.Data ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.4.log ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.4.ctl ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233-0.4.Data ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.4.log ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.4.ctl ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233-0.4.Data ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.4.log ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.4.ctl Defaulting container name to mysqlndbcluster. Defaulting container name to mysqlndbcluster. Defaulting container name to mysqlndbcluster. Defaulting container name to mysqlndbcluster. mysql: [Warning] Using a password on the command line interface can be insecure. mysql: [Warning] Using a password on the command line interface can be insecure. Nodeid = 1 Backup Id = 217221233 backup path = /var/ndbbackup/dbback/BACKUP/BACKUP-217221233 Found backup 217221233 with 4 backup parts ................ ................ ................ ................ ................ ................ 2023-11-07 18:04:06 [rebuild_indexes] Rebuilding indexes 2023-11-07 18:04:06 [rebuild_indexes] Rebuilding indexes 2023-11-07 18:04:06 [rebuild_indexes] Rebuilding indexes 2023-11-07 18:04:06 [rebuild_indexes] Rebuilding indexes Rebuilding index `PRIMARY` on table `DBTIER_REPLICATION_CHANNEL_INFO` ... OK (0 s) Rebuilding index `PRIMARY` on table `DBTIER_BACKUP_COMMAND_QUEUE` ... OK (0 s) Rebuilding index `PRIMARY` on table `DBTIER_BACKUP_INFO` ... OK (0 s) Rebuilding index `PRIMARY` on table `DBTIER_REPL_SITE_INFO` ... OK (0 s) Rebuilding index `PRIMARY` on table `DBTIER_SITE_INFO` ... OK (0 s) Rebuilding index `PRIMARY` on table `DBTIER_INITIAL_BINLOG_POSTION` ... OK (0 s) Rebuilding index `PRIMARY` on table `DBTIER_BACKUP_TRANSFER_INFO` ... OK (0 s) Rebuilding index `PRIMARY` on table `City` ... OK (0 s) Create foreign keys Create foreign keys done [INFO]: Value of ndbmysqldcount: 2. statefulset.apps/ndbmysqld scaled [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndb pods are up. 
[INFO]: Scaled up all replication ndbmysqld pods. pod "ndbappmysqld-0" deleted pod "ndbappmysqld-1" deleted Cluster restore completed.
- Run the following command to pick a backup to restore the NDB
cluster:
- If user accounts do not exist, create the required NF-specific user accounts
and grants to match NF users and grants of the good site in the reinstalled cnDBTier. For
sample procedure, see Creating NF Users.
Note:
For more details about creating NF-specific user accounts and grants, refer to the NF-specific fault recovery guide.
- Clean up the backup_info database tables and the mysql.ndb_apply_status table once the restore is complete:
$ kubectl -n <namespace of cnDBTier Cluster> exec -it ndbmysqld-0 -- mysql -h 127.0.0.1 -uroot -p
Sample output:Enter Password:
$ mysql> DELETE FROM backup_info.DBTIER_BACKUP_COMMAND_QUEUE;
$ mysql> DELETE FROM backup_info.DBTIER_BACKUP_INFO;
$ mysql> DELETE FROM backup_info.DBTIER_BACKUP_TRANSFER_INFO;
$ mysql> DELETE FROM mysql.ndb_apply_status;
$ mysql> DELETE FROM replication_info.DBTIER_INITIAL_BINLOG_POSTION;
When georeplication failure occurs on all sites, also clean up the replication_info database tables, as the backup is taken from other clusters and can be either a multi-channel or single-channel backup:
$ mysql> DELETE FROM replication_info.DBTIER_REPLICATION_CHANNEL_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_REPL_ERROR_SKIP_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_REPL_EVENT_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_REPL_SITE_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_REPL_SVC_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_SITE_INFO;
Example to remove all the entries from the backup_info.DBTIER_BACKUP_INFO table and the mysql.ndb_apply_status table:
$ kubectl -n cluster1 exec -it ndbmysqld-0 -- mysql -h 127.0.0.1 -uroot -p
Enter Password:
$ mysql> DELETE FROM backup_info.DBTIER_BACKUP_COMMAND_QUEUE;
$ mysql> DELETE FROM backup_info.DBTIER_BACKUP_INFO;
$ mysql> DELETE FROM backup_info.DBTIER_BACKUP_TRANSFER_INFO;
$ mysql> DELETE FROM mysql.ndb_apply_status;
Example to clean up the replication_info database tables:
$ mysql> DELETE FROM replication_info.DBTIER_REPLICATION_CHANNEL_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_REPL_ERROR_SKIP_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_REPL_EVENT_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_REPL_SITE_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_REPL_SVC_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_SITE_INFO;
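The cleanup statements can also be run non-interactively from the Bastion Host. The following is a sketch that pipes the backup_info and mysql.ndb_apply_status cleanup into the ndbmysqld-0 pod; the namespace, container, and -p<password> placeholder follow the examples above. Append the replication_info DELETE statements only when georeplication failure occurred on all sites:
kubectl -n cluster1 exec -i ndbmysqld-0 -c mysqlndbcluster -- \
  mysql -h 127.0.0.1 -uroot -p<password> <<'EOF'
DELETE FROM backup_info.DBTIER_BACKUP_COMMAND_QUEUE;
DELETE FROM backup_info.DBTIER_BACKUP_INFO;
DELETE FROM backup_info.DBTIER_BACKUP_TRANSFER_INFO;
DELETE FROM mysql.ndb_apply_status;
DELETE FROM replication_info.DBTIER_INITIAL_BINLOG_POSTION;
EOF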
- Reenable the cnDBTier replication service in cnDBTier Cluster:
- Log in to the Bastion Host of cnDBTier Cluster and scale up the DB
replication service deployment if deployment
exists:
$ kubectl -n <namespace of cnDBTier cluster> get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n <namespace of cnDBTier cluster> scale deployment --replicas=1
Example:$ kubectl -n cluster1 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster1 scale deployment --replicas=1
Sample output:deployment.apps/mysql-cluster-cluster1-cluster2-replication-svc scaled
- Log in to Bastion Host of cnDBTier cluster and scale up the DB backup
manager service
deployment:
$ kubectl -n <namespace of cnDBTier cluster> get deployments | egrep 'db-backup-manager-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n <namespace of cnDBTier cluster> scale deployment --replicas=1
For example:$ kubectl -n cluster1 get deployments | egrep 'db-backup-manager-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster1 scale deployment --replicas=1
Sample output:deployment.apps/mysql-cluster-db-backup-manager-svc scaled
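For georeplication deployments, a quick way to confirm that the services are back and the replication channel is re-established is shown in the following sketch; the pod names, namespace, and credentials follow the cluster1 examples above:
# The replication and backup manager pods should be Running again.
kubectl -n cluster1 get pods | egrep 'replication-svc|db-backup-manager-svc'

# The replica threads on the replication SQL node should both report "Yes".
kubectl -n cluster1 exec ndbmysqld-0 -c mysqlndbcluster -- \
  mysql -h127.0.0.1 -uroot -p<password> -e "SHOW REPLICA STATUS\G" \
  | grep -E 'Replica_IO_Running|Replica_SQL_Running'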
7.3 Creating On-demand Database Backup
This section provides the procedure to create on-demand database backup using cnDBTier backup service.
Prerequisites
The cnDBTier cluster must be in a healthy state, that is, every database node must be in the Running state.
- Log in to the Bastion Host of the cnDBTier cluster.
- Perform the following to create the on-demand backup:
- Run the following command to get
the
db-backup-manager-svc
pod name from cnDBTier cluster:$ kubectl -n <namespace of cnDBTier cluster> get pods | grep "db-backup-manager-svc" | awk '{print $1}'
For example:$ kubectl -n cluster1 get pods | grep "db-backup-manager-svc" | awk '{print $1}'
Sample output:mysql-cluster-db-backup-manager-svc-b49488f8f-lbpbb
- Run the following command to get
the
db-backup-manager-svc
service name from cnDBTier cluster:$ kubectl -n <namespace of cnDBTier cluster> get svc | grep "db-backup-manager-svc" | awk '{print $1}'
For example:kubectl -n cluster1 get svc | grep "db-backup-manager-svc" | awk '{print $1}'
Sample output:mysql-cluster-db-backup-manager-svc
- Log in to the
db-backup-manager-svc
pod from cnDBTier cluster:$ kubectl -n <namespace of cnDBTier cluster> exec -it <db-backup-manager-svc pod> -- bash
For example:$ kubectl -n cluster1 exec -it mysql-cluster-db-backup-manager-svc-b49488f8f-lbpbb -- bash
- Run the following REST API cURL command to create the on-demand backup:
$ curl -X POST http://<db-backup-manager-svc svc>:8080/db-tier/on-demand/backup/initiate
For example:$ curl -X POST http://mysql-cluster-db-backup-manager-svc:8080/db-tier/on-demand/backup/initiate
Sample output:{"backup_encryption_flag":"True","backup_id":"410231044","status":"BACKUP_INITIATED"}
Note down the latest backup_id in the output (for example, 410231044 in this case), which is required later for Network Database (NDB) cluster restoration.
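The initiation call and the capture of the returned backup_id can be combined as in the following sketch. It assumes the service name retrieved in the earlier step and uses a simple sed expression (a hypothetical helper, not part of cnDBTier) to pull the backup_id out of the JSON response; alternatively, note the backup_id from the response manually:
# Initiate an on-demand backup and keep the JSON response.
RESPONSE=$(curl -s -X POST http://mysql-cluster-db-backup-manager-svc:8080/db-tier/on-demand/backup/initiate)
echo "${RESPONSE}"

# Extract the backup_id field from the response.
BACKUP_ID=$(echo "${RESPONSE}" | sed -n 's/.*"backup_id":"\([0-9]*\)".*/\1/p')
echo "Initiated backup with backup_id=${BACKUP_ID}"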
- Perform the following to verify the status of the
on-demand backup:
- Run the following command to get
the
db-backup-manager-svc
pod name from cnDBTier cluster:$ kubectl -n <namespace of cnDBTier cluster> get pods | grep "db-backup-manager-svc" | awk '{print $1}'
For example:$ kubectl -n cluster1 get pods | grep "db-backup-manager-svc" | awk '{print $1}'
Sample output:mysql-cluster-db-backup-manager-svc-b49488f8f-lbpbb
- Run the following command to get
the
db-backup-manager-svc
service name from cnDBTier cluster:$ kubectl -n <namespace of cnDBTier cluster> get svc | grep "db-backup-manager-svc" | awk '{print $1}'
For example:kubectl -n cluster1 get svc | grep "db-backup-manager-svc" | awk '{print $1}'
Sample output:mysql-cluster-db-backup-manager-svc
- Log in to the
db-backup-manager-svc
pod:$ kubectl -n <namespace of cnDBTier cluster> exec -it <db-backup-manager-svc pod> -- bash
For example:$ kubectl -n cluster1 exec -it mysql-cluster-db-backup-manager-svc-b49488f8f-lbpbb -- bash
- Run the following cURL command to verify the status of the on-demand
backup:
Replace
BACKUP_ID_INITIATED
in the following command with the backup ID retrieved from step 2d.$ curl -X GET http://<db-backup-manager-svc svc>:8080/db-tier/on-demand/backup/<BACKUP_ID_INITIATED>/status
For example:$ curl -X GET http://mysql-cluster-db-backup-manager-svc:8080/db-tier/on-demand/backup/410231044/status
Sample output:{"backup_status":"BACKUP_COMPLETED", "transfer_status":"BACKUP_TRANSFER_REQUESTED"}
Note:
The "transfer_status" parameter in the response payload indicates the backup transfer status from db-backup-manager-svc to db-replication-svc, and not to the remote server.
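The status check can be repeated until the backup completes. The following is a minimal polling sketch, assuming the same service name and the BACKUP_ID value captured when the backup was initiated:
# Poll the backup status every 10 seconds until it reports BACKUP_COMPLETED.
while true; do
  STATUS=$(curl -s -X GET "http://mysql-cluster-db-backup-manager-svc:8080/db-tier/on-demand/backup/${BACKUP_ID}/status")
  echo "${STATUS}"
  echo "${STATUS}" | grep -q '"backup_status":"BACKUP_COMPLETED"' && break
  sleep 10
done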
7.4 Restoring Georeplication (GR) Failure
This section provides the procedures to restore georeplication failures in cnDBTier clusters using cnDBTier fault recovery APIs or CNC Console. Georeplication failures can occur in the following scenarios:
- Georeplication failure between healthy clusters, when binlog entries (database commits) are not replicated between clusters due to a network outage or latency.
- Georeplication failure between clusters, when one or more clusters have a fatal error and need to be reinstalled.
Note:
- All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using a backup that is from a cnDBTier cluster with a lower version.
- You cannot restore empty databases.
The following terms are used in the georeplication recovery procedures:
- cnDBTier Cluster1: the first cluster in a two-cluster, three-cluster, or four-cluster georeplication setup.
- cnDBTier Cluster2: the second cluster in a two-cluster, three-cluster, or four-cluster georeplication setup.
- cnDBTier Cluster3: the third cluster in a three-cluster or four-cluster georeplication setup.
- cnDBTier Cluster4: the fourth cluster in a four-cluster georeplication setup.
- Bastion Host: the host that is used for installing cnDBTier clusters, where kubectl and helm are installed and configured to access the Kubernetes cluster and Helm repository.
Georeplication Channels Between cnDBTier Clusters
The following table shows the sample cnDBTier cluster names, and the corresponding namespace and login credentials. Before starting the fault recovery procedures, update the following table with the actual values for all the cnDBTier clusters to refer while recovering the failed cnDBTier clusters.
Table 7-1 Cluster Details
cnDBTier Cluster | Cluster Name | Namespace | MySQL Root Password |
---|---|---|---|
cnDBTier cluster1 | Cluster1 | Cluster1 | SamplePassword |
cnDBTier cluster2 | Cluster2 | Cluster2 | SamplePassword |
cnDBTier cluster3 | Cluster3 | Cluster3 | SamplePassword |
cnDBTier cluster4 | Cluster4 | Cluster4 | SamplePassword |
7.4.1 Restoring GR Failures Using cnDBTier GRR APIs
This section describes the procedures to restore Georeplication (GR) failures using cnDBTier Georeplication Recovery (GRR) APIs. For more information about the GRR APIs, see Fault Recovery APIs.
7.4.1.1 Two-Cluster Georeplication Failure
This section provides the procedures to recover georeplication failure in two-cluster georeplication deployments.
7.4.1.1.1 Resolving GR Failure Between cnDBTier Clusters in a Two-Cluster Replication
This section describes the procedure to resolve a georeplication failure between cnDBTier clusters in a two-cluster replication using cnDBTier georeplication recovery APIs.
Prerequisites
- The failed cnDBTier cluster has a replication delay with respect to the other cnDBTier cluster that impacts the NF functionality, or has fatal errors that require it to be reinstalled.
- All cnDBTier clusters (cnDBTier cluster1 and cnDBTier cluster2) are in a healthy state, that is, all database nodes (including management nodes, data nodes, and API nodes) are in a Running state, if there is only a replication delay and no fatal errors exist in the cnDBTier cluster that needs to be restored.
- NF or application traffic is diverted from the failed cnDBTier Cluster to the working cnDBTier Cluster.
- cURL is installed in the environment from where the commands are run.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using a backup that is from a cnDBTier cluster with a lower version.
To resolve this georeplication failure, restore the cnDBTier cluster to the latest DB backup using automated backup and restore, and then re-establish the replication channels.
- Check the cnDBTier cluster status in both cnDBTier clusters. Follow Verifying Cluster Status Using cnDBTier to check the status of cnDBTier cluster1 and cnDBTier cluster2.
- Run the following commands to mark the gr_state of the failed cnDBTier cluster as FAILED, both in the failed cnDBTier cluster (if it is accessible) and in one of the healthy cnDBTier clusters:
- Run the following command to get the replication service LoadBalancer IP for the cluster that must be marked as failed:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service
LoadBalancer Port for the cluster that must be marked as
failed:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in the failed cnDBTier cluster, if it is accessible. HTTP response code 200 indicates that the cluster's gr_state is marked as failed successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/failed
- Mark the cnDBTier cluster as failed in one of the other healthy
remote cnDBTier cluster:
- Get the replication service LoadBalancer IP for the
healthy
cluster:
$ export IP=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer port of the
healthy
cluster:
$ export PORT=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in one of the healthy cnDBTier clusters. HTTP response code 200 indicates that the cluster's gr_state is marked as failed successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/<name of the failed cluster>/failed
Note:
For more information about georeplication recovery API responses, error codes, and curl commands for HTTPS enabled replication service, see Fault Recovery APIs.
For example:
- If cnDBTier cluster1 has failed, mark the cnDBTier cluster as failed in the unhealthy cnDBTier cluster cluster1:
- Get the replication service LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark the cnDBTier
cluster cluster1 as failed in unhealthy cnDBTier cluster
cluster1 if it is
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the cnDBTier cluster1 as failed in one of the
other healthy remote cnDBTier cluster2:
- Get the replication service LoadBalancer IP
for the healthy
cluster.
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port for the healthy
cluster:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark cluster1 as failed in the healthy cnDBTier cluster cluster2. HTTP response code 200 indicates that the cluster's gr_state is marked as failed successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- If cnDBTier cluster2 has failed, mark the cnDBTier cluster as failed in the unhealthy cnDBTier cluster cluster2:
- Get the replication service LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark the cnDBTier
cluster cluster2 as failed in unhealthy cnDBTier cluster
cluster2 if it is
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the cnDBTier cluster2 as failed in one of the
other healthy remote cnDBTier cluster1:
- Get the replication service LoadBalancer IP
for the healthy
cluster:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port for the healthy
cluster:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark cluster2 as failed in the healthy cnDBTier cluster cluster1. HTTP response code 200 indicates that the cluster's gr_state is marked as failed successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
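The IP and port lookups and the two curl calls above follow the same pattern for every cluster. The following bash sketch consolidates them into a single illustrative helper; the function name and the FAILED_NS, FAILED_NAME, and HEALTHY_NS variables are placeholders used only for this example and are not part of the cnDBTier API.
# Illustrative helper: fetch the replication service LoadBalancer IP and port for a namespace.
get_repl_endpoint() {
  local ns="$1"
  IP=$(kubectl get svc -n "$ns" | grep repl | awk '{print $4}' | head -n 1)
  PORT=$(kubectl get svc -n "$ns" | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
}

FAILED_NS=cluster1      # namespace of the failed cluster (example value)
FAILED_NAME=cluster1    # cnDBTier cluster name of the failed cluster (example value)
HEALTHY_NS=cluster2     # namespace of one healthy cluster (example value)

# Mark the cluster as failed in the failed cluster itself, if it is reachable.
get_repl_endpoint "$FAILED_NS"
curl -i -X POST "http://$IP:$PORT/db-tier/gr-recovery/site/$FAILED_NAME/failed"

# Mark the same cluster as failed in one of the healthy clusters.
get_repl_endpoint "$HEALTHY_NS"
curl -i -X POST "http://$IP:$PORT/db-tier/gr-recovery/remotesite/$FAILED_NAME/failed"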
- Run the following commands for restoring the cnDBTier cluster depending
on the status of cnDBTier cluster:
Follow step a, if the cnDBTier cluster has fatal errors such as PVC corruption, PVC not accessible, and other fatal errors.
Follow step b and c, if the cnDBTier cluster does not have any fatal errors and database restore is needed because of the georeplication failure.
- If the cnDBTier cluster that needs to be restored has fatal errors, or if the cnDBTier cluster status is DOWN, then reinstall the cnDBTier cluster to restore the database from the remote cnDBTier cluster.
Follow Reinstalling cnDBTier Cluster for installing the cnDBTier Cluster by configuring the remote cluster IP address of the replication service of remote cnDBTier Cluster for restoring the database.
For example:- If cnDBTier cluster1 needs to be restored, then uninstall cnDBTier cluster1 and reinstall cnDBTier cluster1 using the above procedures, which restores the database from the remote cnDBTier cluster2 by configuring the replication service IP address of cnDBTier cluster2 as the remote cluster in cnDBTier cluster1.
- If cnDBTier cluster2 needs to be restored, then uninstall cnDBTier cluster2 and reinstall cnDBTier cluster2 using the above procedures, which restores the database from the remote cnDBTier cluster1 by configuring the replication service IP address of cnDBTier cluster1 as the remote cluster in cnDBTier cluster2.
- Create the required NF-specific user accounts and grants to
match NF users and grants of the good cluster in the reinstalled cnDBTier.
For sample procedure, see Creating NF Users.
Note:
For more details about creating NF-specific user account and grants, refer to the NF-specific fault recovery guide. - If the cnDBTier cluster that needs to be restored is UP, does
not have any fatal errors, and if the georeplication has failed or a large
replication delay exists, then run the following commands to restore the
database from remote cnDBTier Cluster:
- Get the replication service LoadBalancer IP of the
cluster where georeplication needs to be
restored:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer port of the cluster where georeplication needs to be restored:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start the GR procedure on
the failed cluster for restoring the cluster from remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of the failed cluster>/start
Note:
For more information about georeplication recovery API responses, error codes, and curl commands for HTTPS enabled replication service, see Fault Recovery APIs.For example,- If cnDBTier cluster1 needs to be restored,
then run the following commands to start the
georeplication procedure on the failed cluster
(cluster1) for restoring the cluster from remote
cnDBTier Cluster:
- Get the replication service
LoadBalancer IP for
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start
the georeplication procedure on the failed
cluster(cluster1) for restoring the cluster from
remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/start
- If cnDBTier cluster2 needs to be restored,
then run the following commands to start the
georeplication procedure on the failed cluster
(cluster2) for restoring the cluster from remote
cnDBTier Cluster:
- Get the replication service
LoadBalancer IP for
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer port for
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start
the georeplication procedure on the failed cluster
(cluster2) for restoring the cluster from remote
cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/start
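The start call in step b and c above can also be wrapped with a check on the HTTP status code so that a failure to start the recovery is caught immediately. The sketch below is illustrative only (the variable names are placeholders) and assumes an HTTP, not HTTPS, replication service endpoint; for HTTPS-enabled deployments, see Fault Recovery APIs.
FAILED_NS=cluster1      # namespace of the failed cluster (example value)
FAILED_NAME=cluster1    # cnDBTier cluster name of the failed cluster (example value)

IP=$(kubectl get svc -n "$FAILED_NS" | grep repl | awk '{print $4}' | head -n 1)
PORT=$(kubectl get svc -n "$FAILED_NS" | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)

# Start georeplication recovery and capture only the HTTP status code.
STATUS=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
  "http://$IP:$PORT/db-tier/gr-recovery/site/$FAILED_NAME/start")
if [ "$STATUS" = "200" ]; then
  echo "Georeplication recovery started on $FAILED_NAME"
else
  echo "Unexpected HTTP status $STATUS; see Fault Recovery APIs for the error codes"
fi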
- Wait until the georeplication recovery is complete. You can check the
cluster status using Verifying Cluster Status Using cnDBTier procedure. When the cnDBTier cluster is UP, continue monitoring
the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
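If you want to keep an eye on the pods (including the backup-manager-service restarts mentioned in the note above) while the recovery runs, a simple kubectl watch is enough; the namespace below is an example value.
$ kubectl get pods -n cluster1 -w
$ kubectl get pods -n cluster1 | grep backup-manager-service   # check the pod and its restart count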
7.4.1.2 Three-Cluster Georeplication Failure
This section provides the procedures to recover georeplication failure in three-cluster georeplication deployments.
7.4.1.2.1 Resolving GR Failure Between cnDBTier Clusters in a Three-Cluster Replication
This section describes the procedure to resolve a georeplication failure between cnDBTier clusters in a three-cluster replication using cnDBTier georeplication recovery APIs.
- The failed cnDBTier cluster has a replication delay, with respect to the other cnDBTier clusters, that impacts the NF functionality, or has fatal errors that require the cluster to be reinstalled.
- If there is only a replication delay and no fatal errors exist in the cnDBTier cluster that needs to be restored, then the cnDBTier clusters are in a healthy state, that is, all database nodes (including management, data, and API nodes) are in Running state.
- Other two cnDBTier clusters (that is, first working cnDBTier cluster and second working cnDBTier cluster) are in a healthy state, that is, all database nodes (including management node, data node, and api node) are in Running state, and the replication channels between them are running correctly.
- NF or application traffic is diverted from the failed cnDBTier Cluster to any of the working cnDBTier Cluster.
- CURL is installed in the environment from where commands are run.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using a backup that is from a cnDBTier cluster with a lower version.To resolve this georeplication failure in one cluster, restore the failed cnDBTier cluster to the latest DB backup using automated backup and restore, and then reestablish the replication channels between cnDBTier cluster1, cnDBTier cluster2, and cnDBTier cluster3.
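Because all clusters must be on the same cnDBTier version, it can be useful to compare the container image tags in each namespace before starting the recovery. The loop below is an illustrative kubectl check (the namespaces are example values), not a cnDBTier API call.
# List the unique container images (and therefore version tags) used in each cluster.
for ns in cluster1 cluster2 cluster3; do
  echo "== $ns =="
  kubectl get pods -n "$ns" -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}' | sort -u
done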
Procedure:
- Check the status of the cnDBTier cluster status in three cnDBTier clusters. Follow Verifying Cluster Status Using cnDBTier to check the status of the cnDBTier cluster1, cnDBTier cluster2, and cnDBTier cluster3.
- Run the following commands to mark the gr_state of the failed cnDBTier cluster as FAILED in the failed cnDBTier cluster (if it is accessible) and in one of the healthy cnDBTier clusters:- Run the following command to get the replication service
LoadBalancer IP for the cluster that must be marked as
failed:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service
LoadBalancer Port for the cluster that must be marked as
failed:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in the failed cnDBTier
cluster if it is accessible. http response code 200 indicates that the
cluster's gr_state is marked as failed
successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/failed
- Mark the cnDBTier cluster as failed in one of the other
healthy remote cnDBTier cluster:
- Get the replication service LoadBalancer IP of the
healthy
cluster.
$ export IP=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer port of
the healthy
cluster:
$ export PORT=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in one of the
healthy cnDBTier clusters. http response code 200 indicates that
the cluster's gr_state is marked as failed
successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/<name of failed cluster>/failed
Note:
For more information about georeplication recovery API responses, error codes, and curl commands for HTTPS enabled replication service, see Fault Recovery APIs.
For example:- If cnDBTier cluster1 is failed, then perform the following:
- Mark the cnDBTier cluster as failed in cnDBTier cluster cluster1:
- Get the replication service LoadBalancer
IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark the
cnDBTier cluster cluster1 as failed if it is
accessible. http response code 200 indicates that
the cluster's gr_state is marked as failed
successfully:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the unhealthy cnDBTier cluster as failed
in one of the healthy cnDBTier cluster:
- Get the loadbalancer IP of healthy
cnDBTier cluster
cluster2.
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark
cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- If cnDBTier cluster2 is failed, then perform the
following:
- Mark the cnDBTier cluster as failed in cnDBTier cluster cluster2:
- Get the replication service LoadBalancer
IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark the
cnDBTier cluster cluster2 as failed if it is
accessible. http response code 200 indicates that
the cluster's gr_state is marked as failed
successfully:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the unhealthy cnDBTier cluster as failed
in one of the healthy cnDBTier cluster:
- Get the loadbalancer IP of healthy
cnDBTier cluster
cluster1.
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark
cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- If cnDBTier cluster3 is failed, then perform the
following:
- Mark the cnDBTier cluster as failed in the unhealthy cnDBTier cluster if it is accessible:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark the
cnDBTier cluster cluster3 as failed in the unhealthy
cnDBTier cluster if it is accessible. http response
code 200 indicates that the cluster's gr_state is
marked as failed
successfully:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- Mark the unhealthy cnDBTier cluster3 as failed
in one of the healthy cnDBTier cluster:
- Get the loadbalancer IP of healthy
cnDBTier cluster
cluster1.
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark
cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
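Because the failed cluster is marked as failed in its own namespace only if it is accessible, you can probe its replication service before issuing that call. The sketch below is illustrative (variable names are placeholders); it skips the local call when the service does not answer and always marks the cluster as failed in a healthy cluster.
FAILED_NS=cluster1      # namespace of the failed cluster (example value)
FAILED_NAME=cluster1    # cnDBTier cluster name of the failed cluster (example value)
HEALTHY_NS=cluster2     # namespace of one healthy cluster (example value)

IP=$(kubectl get svc -n "$FAILED_NS" | grep repl | awk '{print $4}' | head -n 1)
PORT=$(kubectl get svc -n "$FAILED_NS" | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)

# Mark the failed cluster in its own namespace only if its replication service answers.
if curl -s -o /dev/null --connect-timeout 5 "http://$IP:$PORT"; then
  curl -i -X POST "http://$IP:$PORT/db-tier/gr-recovery/site/$FAILED_NAME/failed"
else
  echo "Replication service of $FAILED_NAME is not reachable; skipping the local call"
fi

# Always mark the failed cluster in one of the healthy clusters.
IP=$(kubectl get svc -n "$HEALTHY_NS" | grep repl | awk '{print $4}' | head -n 1)
PORT=$(kubectl get svc -n "$HEALTHY_NS" | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
curl -i -X POST "http://$IP:$PORT/db-tier/gr-recovery/remotesite/$FAILED_NAME/failed"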
- Run the following commands for restoring the cnDBTier cluster
depending on the status of cnDBTier cluster.
Follow step a, if the cnDBTier cluster has fatal errors such as PVC corruption, PVC not accessible, and other fatal errors.
Follow step b and c, if the cnDBTier cluster does not have any fatal errors and database restore is needed because of the georeplication failure.
- If the cnDBTier cluster that needs to be restored has fatal errors, or if the cnDBTier cluster status is DOWN, then reinstall the cnDBTier cluster to restore the database from the remote cnDBTier cluster.
Follow Reinstalling cnDBTier Cluster for installing the cnDBTier Cluster by configuring the remote cluster IP address of the replication service of remote cnDBTier Cluster for restoring the database.
For example:- If cnDBTier cluster1 needs to be restored, then uninstall and reinstall cnDBTier cluster1 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the replication service IP addresses of cnDBTier cluster2 and cnDBTier cluster3 as remote clusters in cnDBTier cluster1.
- If cnDBTier cluster2 needs to be restored, then uninstall and reinstall cnDBTier cluster2 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the replication service IP addresses of cnDBTier cluster1 and cnDBTier cluster3 as remote clusters in cnDBTier cluster2.
- If cnDBTier cluster3 needs to be restored, then uninstall and reinstall cnDBTier cluster3 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the replication service IP addresses of cnDBTier cluster1 and cnDBTier cluster2 as remote clusters in cnDBTier cluster3.
- Create the required NF-specific user accounts and grants to
match NF users and grants of the good cluster in the reinstalled
cnDBTier. For sample procedure, see Creating NF Users.
Note:
For more details about creating NF-specific user account and grants, refer to the NF-specific fault recovery guide. - If cnDBTier cluster that needs to be restored is UP, and
does not have any fatal errors, then run the following commands to
restore the database from remote cnDBTier Clusters:
- Get the replication service LoadBalancer IP of the
cluster where georeplication needs to be
restored:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer port of the cluster where georeplication needs to be restored:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start the GR procedure
on the failed cluster for restoring the cluster from remote
cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/start
or
Run the following command to start the GR procedure on the failed cluster for restoring the cluster by selecting the backup cluster name where backup is taken:$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/grbackupsite/<cluster name where backup is initiated>/start
Note:
For more information about georeplication recovery API responses, error codes, and curl commands for HTTPS enabled replication service, see Fault Recovery APIs.For example,- If cnDBTier cluster1 needs to be
restored, then run the following commands to start
the georeplication procedure on the failed cluster
(cluster1) for restoring the cluster from remote
cnDBTier Cluster:
- Get the replication service
LoadBalancer IP for
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start
the georeplication procedure on the failed cluster
(cluster1) for restoring the cluster from remote
cnDBTier
cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/start
- If cnDBTier cluster2 needs to be
restored, then run the following commands to start
the georeplication procedure on the failed cluster
(cluster2) for restoring the cluster from remote
cnDBTier Cluster:
- Get the replication service
LoadBalancer IP for
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer port for
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start
the georeplication procedure on the failed cluster
(cluster2) for restoring the cluster from remote
cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/start
- If cnDBTier cluster3 needs to be
restored, then run the following commands to start
the georeplication procedure on the failed cluster
(cluster3) for restoring the cluster from remote
cnDBTier Cluster:
- Get the replication service
LoadBalancer IP for
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer port for
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to
start the georeplication procedure on the failed
cluster (cluster3) for restoring the cluster from
remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/start
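When more than one remote cluster is available, the alternative grbackupsite endpoint shown above lets you choose the cluster whose backup is used for the restore. A minimal illustrative sketch (the variable names and values are placeholders):
FAILED_NS=cluster1      # namespace of the failed cluster (example value)
FAILED_NAME=cluster1    # name of the failed cluster (example value)
BACKUP_NAME=cluster3    # cluster where the backup is initiated (example value)

IP=$(kubectl get svc -n "$FAILED_NS" | grep repl | awk '{print $4}' | head -n 1)
PORT=$(kubectl get svc -n "$FAILED_NS" | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)

# Start recovery, explicitly selecting the cluster whose backup is used.
curl -i -X POST "http://$IP:$PORT/db-tier/gr-recovery/site/$FAILED_NAME/grbackupsite/$BACKUP_NAME/start"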
- Wait until the georeplication recovery is complete.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.- Perform the Verifying Cluster Status Using cnDBTier procedure to check the cluster status.
- When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by performing the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure.
7.4.1.2.2 Restoring Two cnDBTier Clusters in a Three-Cluster Replication
This section describes the procedure to restore two cnDBTier clusters having fatal errors in a three-cluster replication using cnDBTier georeplication recovery APIs.
- The first failed cnDBTier cluster and the second failed cnDBTier cluster have fatal errors, or the georeplication between them has failed, and both clusters need to be restored.
- The working cnDBTier cluster is in a healthy state, that is, all database nodes (including management, data, and API nodes) are in Running state.
- Only one cnDBTier cluster is restored at a time. For example, the first failed cnDBTier cluster is restored first, and then the second failed cnDBTier cluster is restored.
- NF or application traffic is diverted from the first failed cnDBTier cluster and second failed cnDBTier Cluster to the working cnDBTier Cluster.
- CURL is installed in the environment from where commands are run.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using the backup that is from a cnDBTier cluster with a lower version.To resolve this georeplication failure in two cnDBTier clusters (that is, first failed cnDBTier cluster and second failed cnDBTier cluster), restore failed cnDBTier Clusters to the latest DB backup using automated backup and restore. Re-establish the replication channels between cnDBTier cluster1, cnDBTier cluster2 and cnDBTier cluster3.
- Check the status of the cnDBTier cluster status in three cnDBTier clusters. Follow Verifying Cluster Status Using cnDBTier to check the status of the cnDBTier cluster1, cnDBTier cluster2, and cnDBTier cluster3.
- Run the following commands to mark the gr_state of the failed cnDBTier clusters (that is, the first failed cnDBTier cluster and the second failed cnDBTier cluster) as FAILED in all unhealthy cnDBTier clusters that are accessible and in one of the healthy cnDBTier clusters:
- Run the following command to get the replication service
LoadBalancer IP from one of the healthy
cluster:
$ export IP=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service
LoadBalancer Port from one of the healthy
cluster:
$ export PORT=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the unhealthy cnDBTier cluster as failed in one of the
healthy cnDBTier clusters. http response code 200 indicates that the
cluster's gr_state is marked as failed
successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/<name of failed cluster>/failed
- If failed cnDBTier clusters are accessible, mark the
unhealthy cnDBTier clusters as failed:
- Get the replication service LoadBalancer IP for the
cluster which must be marked as
failed.
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer port for
the cluster which must be marked as
failed:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in cnDBTier
clusters if it is accessible. http response code 200 indicates
that the cluster's gr_state is marked as failed
successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/failed
Note:
For more information about georeplication recovery API responses, error codes, and curl commands for HTTPS enabled replication service, see Fault Recovery APIs.
For example:- If the failed cnDBTier clusters are cluster2 and
cluster3, then perform the following:
- Mark the failed cnDBTier clusters cluster2 and
cluster3 as failed in the healthy cnDBTier cluster
cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster2 as failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- If the failed cnDBTier clusters cluster2 and cluster3
are accessible, then perform the following:
- Mark the cnDBTier cluster as failed in cnDBTier
cluster2:
- Get the replication service LoadBalancer
IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster2 is failed, mark the
cnDBTier cluster2 as failed in cnDBTier clusters if
it is
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the cnDBTier cluster as failed in cnDBTier
cluster3:
- Get the replication service
LoadBalancer IP for
cluster3.
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster3 is the failed
cluster, then mark the cnDBTier cluster3 as failed
in cnDBTier clusters that are
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- If failed cnDBTier clusters are cluster1 and cluster3,
then perform the following:
- Mark the failed cnDBTier clusters cluster1 and
cluster3 as failed in the healthy cnDBTier cluster
cluster2:
- Get the replication service
LoadBalancer IP for
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- If cnDBTier cluster1 and cluster3 have replication delay, perform the following:
- Mark the cnDBTier cluster as failed in cnDBTier
cluster1:
- Get the replication service LoadBalancer
IP for
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the cnDBTier cluster as failed in cnDBTier
cluster3:
- Get the replication service LoadBalancer
IP for
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port for
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster3 is failed, mark
the cnDBTier cluster3 as failed in unhealthy
cnDBTier cluster cluster3 if it is
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- If failed cnDBTier clusters are cluster1 and cluster2,
then perform the following:
- Mark the failed cnDBTier clusters cluster1 and
cluster2 as failed in the healthy cnDBTier cluster
cluster3:
- Get the replication service
LoadBalancer IP for
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- If cnDBTier cluster1 and cluster2 have replication delay, perform the following:
- Mark the cnDBTier cluster as failed in cnDBTier
cluster1:
- Get the replication service LoadBalancer
IP for
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the cnDBTier cluster as failed in cnDBTier
cluster2:
- Get the replication service LoadBalancer
IP for
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port for
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster2 is failed, mark the
cnDBTier cluster2 as failed in unhealthy cnDBTier
cluster cluster2 if it is
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
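In the two-failed-cluster case, both failed clusters are marked as failed through the single healthy cluster, so the same IP and port can be reused for the two remotesite calls. An illustrative sketch (the namespace and cluster names are example values):
HEALTHY_NS=cluster1     # namespace of the healthy cluster (example value)

IP=$(kubectl get svc -n "$HEALTHY_NS" | grep repl | awk '{print $4}' | head -n 1)
PORT=$(kubectl get svc -n "$HEALTHY_NS" | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)

# Mark both failed clusters as failed in the healthy cluster.
for failed in cluster2 cluster3; do
  curl -i -X POST "http://$IP:$PORT/db-tier/gr-recovery/remotesite/$failed/failed"
done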
- Uninstall first failed cnDBTier cluster if first failed cnDBTier cluster has fatal errors. Follow Uninstalling cnDBTier Cluster procedure for uninstalling the first failed cnDBTier cluster.
- Uninstall second failed cnDBTier cluster if second failed cnDBTier cluster has fatal errors. Follow Uninstalling cnDBTier Cluster procedure for uninstalling the second failed cnDBTier cluster.
- Run the following commands to reinstall the first failed cnDBTier cluster and wait until the database restore from the remote cluster is complete.
Follow step a, if the cnDBTier cluster has fatal errors such as PVC corruption, PVC not accessible, and other fatal errors.
Follow step b and c, if the cnDBTier cluster does not have any fatal errors and database restore is needed because of the georeplication failure.
- If the cnDBTier cluster that needs restoration contains
fatal errors, or if the cnDBTier cluster status is DOWN, then reinstall
the cnDBTier cluster to restore the database from the remote cnDBTier
cluster.
Follow Reinstalling cnDBTier Cluster for installing the cnDBTier Cluster by configuring the remote cluster IP address of the replication service of remote cnDBTier Cluster for restoring the database.
For example:- If the first failed cnDBTier cluster is cnDBTier cluster1, then uninstall cnDBTier cluster1 and reinstall cnDBTier cluster1 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the replication service IP addresses of the working cnDBTier clusters as remote clusters in cnDBTier cluster1.
- If the first failed cnDBTier cluster is cnDBTier cluster2, then uninstall cnDBTier cluster2 and reinstall cnDBTier cluster2 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the replication service IP addresses of the working cnDBTier clusters as remote clusters in cnDBTier cluster2.
- If the first failed cnDBTier cluster is cnDBTier cluster3, then uninstall cnDBTier cluster3 and reinstall cnDBTier cluster3 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the replication service IP addresses of the working cnDBTier clusters as remote clusters in cnDBTier cluster3.
- Create the required NF-specific user accounts and grants to
match NF users and grants of the good cluster in the reinstalled
cnDBTier. For sample procedure, see Creating NF Users.
Note:
For more details about creating NF-specific user account and grants, refer to the NF-specific fault recovery guide. - If cnDBTier cluster which needs to be restored is UP, and
does not have any fatal errors, then perform the following:
- Configure the remote site IPs and then upgrade the cnDBTier cluster by following the Upgrading cnDBTier procedures.
- Run the following commands to restore the database
from remote cnDBTier Clusters:
- Get the replication service LoadBalancer IP
of the cluster where georeplication needs to be
restored:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
PORT of the cluster where georeplication needs to be
restored:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start the GR
procedure on the failed cluster for restoring the
cluster from remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/start
or
Run the following command to start the GR procedure on the failed cluster for restoring the cluster by selecting the backup cluster name where backup is taken:$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/grbackupsite/<cluster name where backup is initiated>/start
For example:- If cnDBTier cluster1 needs to be
restored, run the following commands to start the GR
procedure on the failed cluster (cluster1) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to
start the GR procedure on the failed cluster
(cluster1) for restoring the cluster from remote
cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/start
- If cnDBTier cluster2 needs to be
restored, run the following commands to start the GR
procedure on the failed cluster (cluster2) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start
the GR procedure on the failed cluster (cluster2)
for restoring the cluster from remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/start
- If cnDBTier cluster3 needs to be
restored, run the following commands to start the GR
procedure on the failed cluster (cluster3) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to
start the GR procedure on the failed cluster
(cluster3) for restoring the cluster from remote
cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/start
- Wait until database is restored in the first failed cnDBTier
cluster and georeplication recovery procedure is completed.
Check the cluster status using Verifying Cluster Status Using cnDBTier procedure after the cnDBTier cluster is UP and continue monitoring the georeplication recovery status.
Follow Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure to monitor the georeplication recovery status.
- Run the following commands to reinstall the second failed cnDBTier cluster and wait until the database restore from the remote cluster is complete.
Follow step a, if the cnDBTier cluster has fatal errors such as PVC corruption, PVC not accessible, and other fatal errors.
Follow step b and c, if the cnDBTier cluster does not have any fatal errors and database restore is needed because of the georeplication failure.
- If the second failed cnDBTier cluster that needs to be restored has fatal errors, or if the second failed cnDBTier cluster status is DOWN, then reinstall the second failed cnDBTier cluster to restore the database from the remote cnDBTier clusters.
Follow Reinstalling cnDBTier Cluster for installing the cnDBTier Cluster by configuring the remote cluster IP address of the replication service of remote cnDBTier Cluster for restoring the database.
For example:- If the second failed cnDBTier cluster is cnDBTier cluster1, then uninstall cnDBTier cluster1 and reinstall cnDBTier cluster1 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the replication service IP addresses of the working cnDBTier clusters as remote clusters in cnDBTier cluster1.
- If the second failed cnDBTier cluster is cnDBTier cluster2, then uninstall cnDBTier cluster2 and reinstall cnDBTier cluster2 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the replication service IP addresses of the working cnDBTier clusters as remote clusters in cnDBTier cluster2.
- If the second failed cnDBTier cluster is cnDBTier cluster3, then uninstall cnDBTier cluster3 and reinstall cnDBTier cluster3 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the replication service IP addresses of the working cnDBTier clusters as remote clusters in cnDBTier cluster3.
- Create the required NF-specific user accounts and grants to
match NF users and grants of the good cluster in the reinstalled
cnDBTier. For sample procedure, see Creating NF Users.
Note:
For more details about creating NF-specific user account and grants, refer to the NF-specific fault recovery guide. - If the second failed cnDBTier cluster that needs to be
restored is UP, and does not have any fatal errors, then perform the
following:
- Configure the remote site IPs and then upgrade the cnDBTier cluster by following the Upgrading cnDBTier procedures.
- Run the following commands to restore the database
from remote cnDBTier Clusters:
- Get the replication service LoadBalancer IP
of the cluster where georeplication needs to be
restored:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
PORT of the cluster where georeplication needs to be
restored:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start the GR
procedure on the failed cluster for restoring the
cluster from remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/start
or
Run the following command to start the GR procedure on the failed cluster for restoring the cluster by selecting the backup cluster name where backup is taken:$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/grbackupsite/<cluster name where backup is initiated>/start
For example:- If cnDBTier cluster1 needs to be
restored, run the following commands to start the GR
procedure on the failed cluster (cluster1) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start
the GR procedure on the failed cluster (cluster1)
for restoring the cluster from remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/start
- If cnDBTier cluster2 needs to be
restored, run the following commands to start the GR
procedure on the failed cluster (cluster2) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start
the GR procedure for restoring the cluster on the
failed cluster (cluster2) from remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/start
- If cnDBTier cluster3 needs to be
restored, run the following commands to start the GR
procedure on the failed cluster (cluster3) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to
start the GR procedure on the failed cluster
(cluster3) for restoring the cluster from remote
cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/start
- Wait until the georeplication recovery is completed.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.- Perform the Verifying Cluster Status Using cnDBTier procedure to check the cluster status.
- When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by performing the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure.
7.4.1.3 Four-Cluster Georeplication Failure
This section provides the procedures to recover georeplication failure in four-cluster georeplication deployments.
7.4.1.3.1 Resolving GR Failure Between cnDBTier Clusters in a Four-Cluster Replication
This section describes the procedure to resolve a georeplication failure between cnDBTier clusters in a four-cluster replication using cnDBTier georeplication recovery APIs.
- The failed cnDBTier cluster has a replication delay, with respect to the other cnDBTier clusters, that impacts the NF functionality, or has fatal errors that require the cluster to be reinstalled.
- If there is only a replication delay and no fatal errors exist in the cnDBTier cluster that must be restored, then the cnDBTier clusters are in a healthy state, that is, all database nodes (including management, data, and API nodes) are in Running state.
- Other three cnDBTier clusters (that is, first working cnDBTier cluster, second working cnDBTier cluster and third working cnDBTier cluster) are in a healthy state, that is, all database nodes (including management node, data node, and api node) are in Running state, and the replication channels between them are running correctly.
- NF or application traffic is diverted from the failed cnDBTier cluster to any of the working cnDBTier clusters (first, second, or third working cnDBTier cluster).
- CURL is installed in the environment from where commands are run.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using the backup that is from a cnDBTier cluster with a lower version.To resolve this georeplication failure in a single failed cnDBTier cluster, restore the failed cnDBTier Cluster to the latest DB backup using automated backup and restore. Re-establish the replication channels between cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
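Before starting the procedure, you can quickly confirm the pod status in all four namespaces. The loop below is an illustrative kubectl check (the namespaces are example values); it lists only the pods whose status is not Running or Completed.
# List any pods that are not Running/Completed in each cluster's namespace.
for ns in cluster1 cluster2 cluster3 cluster4; do
  echo "== $ns =="
  kubectl get pods -n "$ns" --no-headers | grep -Ev 'Running|Completed' || echo "all pods Running"
done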
Procedure:
- Check the status of the cnDBTier cluster status in four cnDBTier clusters. Follow Verifying Cluster Status Using cnDBTier to check the status of the cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
- Run the following commands to mark the gr_state of the failed cnDBTier cluster as FAILED in any working cnDBTier cluster and also in the failed cnDBTier cluster, if it is accessible:
- Run the following command to get the replication service
LoadBalancer IP for the cluster that must be marked as
failed:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service
LoadBalancer Port for the cluster that must be marked as
failed:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in the failed cnDBTier
cluster if it is accessible. http response code 200 indicates that the
cluster's gr_state is marked as failed
successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/failed
- Mark the unhealthy cnDBTier cluster as failed in one of the
healthy clusters:
- Get the replication service LoadBalancer IP for the
healthy
cluster:
$ export IP=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer port of the
healthy
cluster:
$ export PORT=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in one of the
healthy cnDBTier clusters. http response code 200 indicates that
the cluster's gr_state is marked as failed
successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/<name of failed cluster>/failed
Note:
For more information about georeplication recovery API responses, error codes, and curl commands for HTTPS enabled replication service, see Fault Recovery APIs.
For example:- If cnDBTier cluster1 is the failed cluster, then perform
the following:
- Mark the cnDBTier cluster1 as failed in cnDBTier cluster cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark the
cnDBTier cluster cluster1 as failed. http response
code 200 indicates that the cluster's gr_state is
marked as failed
successfully:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the unhealthy cnDBTier cluster1 as failed
in one of the healthy cluster cnDBTier cluster2:
- Get the loadbalancer IP of healthy
cnDBTier cluster
cluster2.
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark
cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- If cnDBTier cluster2 is the failed cluster, then perform
the following:
- Mark the cnDBTier cluster as failed in cnDBTier cluster cluster2:
- Get the replication service LoadBalancer
IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster2 is unhealthy, run
the following command to mark the cnDBTier cluster
cluster2 as failed if it is accessible. http
response code 200 indicates that the cluster's
gr_state is marked as failed
successfully:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the unhealthy cnDBTier cluster as failed in
one of the healthy cnDBTier clusters:
- Get the loadbalancer IP of healthy
cnDBTier cluster
cluster1.
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark
cluster2 as failed in the healthy cnDBTier cluster
cluster1:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- If cnDBTier cluster3 is the failed cluster, then
perform the following:
- Mark the cnDBTier cluster cluster3 as failed in the unhealthy cnDBTier cluster if it is accessible:
- Get the replication service LoadBalancer
IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster3 is unhealthy, run
the following command to mark the cnDBTier cluster
cluster3 as failed if it is accessible. http
response code 200 indicates that the cluster's
gr_state is marked as failed
successfully:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- Mark the unhealthy cnDBTier cluster3 as failed
in one of the healthy cnDBTier clusters:
- Get the loadbalancer IP of healthy
cnDBTier cluster
cluster1.
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark
cluster3 as failed in healthy cnDBTier cluster
cluster1:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- If cnDBTier cluster4 is unhealthy, then perform the
following:
- Mark cluster4 as failed in the unhealthy cnDBTier cluster cluster4 if it is accessible:
- Get the replication service LoadBalancer IP of cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster4 is unhealthy but still accessible, run the following command to mark cnDBTier cluster4 as failed. An HTTP response code of 200 indicates that the cluster's gr_state is marked as failed successfully:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/failed
- Mark the unhealthy cnDBTier cluster4 as failed
in one of the healthy cnDBTier clusters:
- Get the LoadBalancer IP of the healthy cnDBTier cluster cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark
cluster4 as failed in healthy cnDBTier cluster
cluster1:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster4/failed
- Run the following commands to restore the cnDBTier cluster, depending on the status of the cnDBTier cluster.
Follow step a if the cnDBTier cluster has fatal errors such as PVC corruption, inaccessible PVCs, or other fatal errors.
Follow steps b and c if the cnDBTier cluster does not have any fatal errors and a database restore is needed because of the georeplication failure.
- If the cnDBTier cluster that needs to be restored has fatal errors, or if the cnDBTier cluster status is DOWN, reinstall the cnDBTier cluster to restore the database from the remote cnDBTier clusters.
Follow Reinstalling cnDBTier Cluster to install the cnDBTier cluster, configuring the replication service IP addresses of the remote cnDBTier clusters so that the database is restored from them.
For example:
- If cnDBTier cluster1 needs to be restored, uninstall and reinstall cnDBTier cluster1 using the above procedures. This restores the database from the remote cnDBTier clusters by configuring the replication service IP addresses of cnDBTier cluster2 and cnDBTier cluster3 in cnDBTier cluster1.
- If cnDBTier cluster2 needs to be restored, uninstall and reinstall cnDBTier cluster2 using the above procedures. This restores the database from the remote cnDBTier clusters by configuring the replication service IP addresses of cnDBTier cluster1 and cnDBTier cluster3 in cnDBTier cluster2.
- If cnDBTier cluster3 needs to be restored, uninstall and reinstall cnDBTier cluster3 using the above procedures. This restores the database from the remote cnDBTier clusters by configuring the replication service IP addresses of cnDBTier cluster1 and cnDBTier cluster2 in cnDBTier cluster3.
- If cnDBTier cluster4 needs to be restored, uninstall and reinstall cnDBTier cluster4 using the above procedures. This restores the database from the remote cnDBTier clusters by configuring the replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster4.
- Create the required NF-specific user accounts and grants in the reinstalled cnDBTier cluster to match the NF users and grants of the good cluster. For a sample procedure, see Creating NF Users.
Note:
For more details about creating NF-specific user accounts and grants, refer to the NF-specific fault recovery guide.
- If the cnDBTier cluster that needs to be restored is UP and does not have any fatal errors, then perform the following:
- Configure the remote site IPs and then upgrade the cnDBTier cluster by following the Upgrading cnDBTier procedures.
- Run the following commands to restore the database
from remote cnDBTier Clusters:
- Get the replication service LoadBalancer IP
of the cluster where georeplication needs to be
restored:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
PORT of the cluster where georeplication needs to be
restored:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start the GR
procedure on the failed cluster for restoring the
cluster from remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/start
Alternatively, run the following command to start the GR procedure on the failed cluster and restore it from a specific backup by selecting the cluster name where the backup is taken (a consolidated sketch covering both forms of this call is provided at the end of this procedure):
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/grbackupsite/<cluster name where backup is initiated>/start
For example:
- If cnDBTier cluster1 needs to be restored, run the following commands to start the GR procedure on the failed cluster (cluster1) for restoring the cluster from the remote cnDBTier cluster:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start
the GR procedure on the failed cluster (cluster1)
for restoring the cluster from remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/start
Sample output:
Defaulted container "mysqlndbcluster" out of: mysqlndbcluster, init-sidecar
HTTP/1.1 200
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Length: 0
Date: Tue, 07 Nov 2023 12:33:00 GMT
- If cnDBTier cluster2 needs to be
restored, run the following commands to start the
GR procedure on the failed cluster (cluster2) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to
start the GR procedure on the failed cluster
(cluster2) for restoring the cluster from remote
cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/start
Sample output:
Defaulted container "mysqlndbcluster" out of: mysqlndbcluster, init-sidecar
HTTP/1.1 200
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Length: 0
Date: Tue, 07 Nov 2023 12:33:00 GMT
- If cnDBTier cluster3 needs to be
restored, run the following commands to start the
GR procedure on the failed cluster (cluster3) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start the GR procedure on the failed cluster (cluster3) for restoring the cluster from the remote cnDBTier cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/start
Sample output:
Defaulted container "mysqlndbcluster" out of: mysqlndbcluster, init-sidecar
HTTP/1.1 200
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Length: 0
Date: Tue, 07 Nov 2023 12:33:00 GMT
- If cnDBTier cluster4 needs to be
restored, run the following commands to start the
GR procedure on the failed cluster (cluster4) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer Port of cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to
start the GR procedure on the failed cluster
(cluster4) for restoring the cluster from remote
cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/start
Sample output:
Defaulted container "mysqlndbcluster" out of: mysqlndbcluster, init-sidecar
HTTP/1.1 200
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Length: 0
Date: Tue, 07 Nov 2023 12:33:00 GMT
- Wait until the georeplication recovery is completed. You can check the cluster status using the Verifying Cluster Status Using cnDBTier procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
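The start call shown above has two forms: one that lets cnDBTier pick the backup from the first available healthy cluster, and one that names the cluster whose backup is used. The following bash sketch is only a convenience wrapper around the same kubectl and curl commands shown in this procedure; it is not part of cnDBTier, the function name start_gr_recovery is illustrative, and it assumes that the namespace name matches the cluster name, as in the examples above.
# Sketch: start georeplication recovery on a failed cluster, optionally
# selecting the cluster whose backup is used for the restore.
start_gr_recovery() {
  local failed_ns="$1"       # namespace (and cluster name) of the failed cluster, for example cluster1
  local backup_cluster="$2"  # optional: cluster name where the backup is initiated

  # Replication service LoadBalancer IP and port of the failed cluster,
  # extracted the same way as in the steps above.
  local ip port
  ip=$(kubectl get svc -n "$failed_ns" | grep repl | awk '{print $4}' | head -n 1)
  port=$(kubectl get svc -n "$failed_ns" | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)

  if [ -n "$backup_cluster" ]; then
    curl -i -X POST "http://$ip:$port/db-tier/gr-recovery/site/$failed_ns/grbackupsite/$backup_cluster/start"
  else
    curl -i -X POST "http://$ip:$port/db-tier/gr-recovery/site/$failed_ns/start"
  fi
}
For example, start_gr_recovery cluster1 restores cluster1 from the first available healthy cluster, while start_gr_recovery cluster1 cluster2 restores cluster1 from the backup taken on cluster2.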
7.4.1.3.2 Restoring Two cnDBTier Clusters in a Four-Cluster Replication
This section describes the procedure to reinstall two cnDBTier clusters that have fatal errors in a four-cluster replication setup using cnDBTier georeplication recovery APIs.
- The first failed cnDBTier cluster and the second failed cnDBTier cluster have a replication delay that impacts the NF functionality with respect to the other cnDBTier clusters, or have fatal errors that require them to be reinstalled.
- If there is only a replication delay and no fatal errors exist in the cnDBTier clusters that need to be restored, then the cnDBTier clusters are in a healthy state, that is, all database nodes (including management, data, and API nodes) are in the Running state.
- The other two cnDBTier clusters (that is, the first working cnDBTier cluster and the second working cnDBTier cluster) are in a healthy state, that is, all database nodes (including management, data, and API nodes) are in the Running state, and the replication channels between them are running correctly.
- Only one cnDBTier cluster is restored at a time. That is, the first failed cnDBTier cluster is restored first, followed by the second failed cnDBTier cluster.
- NF or application traffic is diverted from the failed cnDBTier clusters to any of the working cnDBTier clusters.
- curl is installed in the environment from which the commands are run.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using a backup from a cnDBTier cluster with a lower version.
To resolve this georeplication failure in the first and second failed cnDBTier clusters, restore the failed cnDBTier clusters to the latest database backup using automated backup and restore, and reestablish the replication channels between cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
Procedure:
- Check the status of all four cnDBTier clusters. Follow Verifying Cluster Status Using cnDBTier to check the status of cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
- Run the following commands to mark the gr_state of the first failed cnDBTier cluster and the second failed cnDBTier cluster as FAILED in any of the healthy cnDBTier clusters and in all unhealthy cnDBTier clusters that are accessible (a consolidated sketch of this pattern is provided at the end of this step):
- Run the following command to get the replication service LoadBalancer IP from one of the healthy clusters:
$ export IP=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service LoadBalancer Port from one of the healthy clusters:
$ export PORT=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the unhealthy cnDBTier cluster as failed in one of the healthy cnDBTier clusters. An HTTP response code of 200 indicates that the cluster's gr_state is marked as failed successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/<name of failed cluster>/failed
- Mark the unhealthy cnDBTier clusters as failed in the unhealthy cnDBTier clusters themselves, if they are accessible:
- Get the replication service LoadBalancer IP for the
cluster which must be marked as
failed:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer port for
the cluster which must be marked as
failed:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in the cnDBTier clusters that are accessible. An HTTP response code of 200 indicates that the cluster's gr_state is marked as failed successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/failed
Note:
For more information about georeplication recovery API responses and error codes, see Fault Recovery APIs.
For example:
- If cnDBTier cluster1 and cluster2 have fatal errors or replication delay:
- Mark the failed cnDBTier clusters cluster1 and
cluster2 as failed in healthy cnDBTier cluster cluster3:
- Get the replication service LoadBalancer
IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- If cnDBTier cluster1 and cluster2 have
replication delay:
- Mark cluster1 as failed in
cnDBTier cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark cluster2 as failed in
cnDBTier cluster2:
- Get the replication service
LoadBalancer IP for
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster2 is the
failed cluster, mark the cnDBTier cluster2 as
failed in cnDBTier clusters that are
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the failed cnDBTier clusters cluster1 and
cluster3 as failed in healthy cnDBTier cluster cluster2:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- If cnDBTier cluster1 and cluster3 have
replication delay:
- Mark cluster1 as failed in cnDBTier cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark cluster3 as failed in cnDBTier cluster3:
- Get the replication service
LoadBalancer IP for
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster3 is the
failed cluster, mark the cnDBTier cluster3 as
failed in cnDBTier clusters that are
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- Mark the failed cnDBTier clusters cluster1 and
cluster4 as failed in healthy cnDBTier cluster cluster2:
- Get the replication service LoadBalancer
IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster4/failed
- If cnDBTier cluster1 and cluster4 have
replication delay:
- Mark the cnDBTier cluster as
failed in cnDBTier cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the cnDBTier cluster as
failed in cluster4:
- Get the replication service
LoadBalancer IP for
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster4 is the failed
cluster, mark the cnDBTier cluster4 as failed in
cnDBTier clusters that are
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/failed
- Mark the failed cnDBTier clusters cluster2 and
cluster3 as failed in healthy cnDBTier cluster cluster1:
- Get the replication service LoadBalancer
IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- If cnDBTier cluster2 and cluster3 have
replication delay:
- Mark the cnDBTier cluster as
failed in cnDBTier cluster2:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cluster2 is the failed cluster,
mark the cnDBTier cluster2 as failed in the
cnDBTier clusters that are
accessible
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster3:
- Get the replication service
LoadBalancer IP for
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster3 is the
failed cluster, mark the cnDBTier cluster3 as
failed in cnDBTier clusters that are
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- Mark the failed cnDBTier clusters cluster2 and
cluster4 as failed in healthy cnDBTier cluster cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster4/failed
- If cnDBTier cluster2 and cluster4 have
replication delay:
- Mark the cluster as failed in
cnDBTier cluster2:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster2 is the failed
cluster, mark the cnDBTier cluster2 as failed in
cluster2 if it is
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the cluster as failed in
cnDBTier cluster4:
- Get the replication service
LoadBalancer IP for
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster4 is the
failed cluster, mark the cnDBTier cluster4 as
failed in cluster4 if it is
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/failed
- Mark the failed cnDBTier clusters cluster3 and
cluster4 as failed in healthy cnDBTier cluster cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster4/failed
- If cnDBTier cluster3 and cluster4 have
replication delay:
- Mark the cluster as failed in
cnDBTier cluster3:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster3 is the
failed cluster, mark the cnDBTier cluster3 as
failed in cnDBTier clusters that are
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- Mark the cluster as failed in
cnDBTier cluster4:
- Get the replication service
LoadBalancer IP for
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster4 is the failed
cluster, mark the cnDBTier cluster4 as failed in
cnDBTier clusters that are
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/failed
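The per-cluster examples above repeat the same pattern: fetch the replication service LoadBalancer IP and port of a healthy cluster, then POST the remotesite/<cluster>/failed endpoint once per failed cluster. The following bash sketch, referenced at the start of this step, wraps that pattern; the function name mark_failed_from_healthy is illustrative, it is not part of cnDBTier, and it assumes that the namespace name matches the cluster name, as in the examples above.
# Sketch: mark one or more failed clusters as FAILED from a single healthy cluster.
# Usage: mark_failed_from_healthy <healthy namespace> <failed cluster> [<failed cluster> ...]
mark_failed_from_healthy() {
  local healthy_ns="$1"; shift

  # Replication service LoadBalancer IP and port of the healthy cluster.
  local ip port
  ip=$(kubectl get svc -n "$healthy_ns" | grep repl | awk '{print $4}' | head -n 1)
  port=$(kubectl get svc -n "$healthy_ns" | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)

  # POST the remotesite/<cluster>/failed endpoint once per failed cluster.
  local cluster
  for cluster in "$@"; do
    curl -i -X POST "http://$ip:$port/db-tier/gr-recovery/remotesite/$cluster/failed"
  done
}
For example, mark_failed_from_healthy cluster3 cluster1 cluster2 marks cluster1 and cluster2 as failed in the healthy cnDBTier cluster cluster3.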
- Uninstall the first failed cnDBTier cluster if it has fatal errors.
Follow the Uninstalling cnDBTier Cluster procedure to uninstall the first failed cnDBTier cluster.
- Uninstall the second failed cnDBTier cluster if it has fatal errors.
Follow the Uninstalling cnDBTier Cluster procedure to uninstall the second failed cnDBTier cluster.
- Restore the first failed cnDBTier cluster and wait until the database restore from the remote cnDBTier cluster is completed. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in a Four-Cluster Replication to restore the first failed cnDBTier cluster.
- Wait until the database is restored in the first failed cnDBTier cluster and the georeplication recovery procedure is completed.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
You can check the status of the first failed cnDBTier cluster using the Verifying Cluster Status Using cnDBTier procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure.
- Reinstall the second failed cnDBTier cluster and wait until the database restore from the remote cnDBTier cluster is completed. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in a Four-Cluster Replication to restore the second failed cnDBTier cluster.
- Wait until the database is restored in the second failed cnDBTier cluster and the georeplication recovery procedure is completed.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery (a quick pod check is sketched after this procedure).
Check the status of the second failed cnDBTier cluster using the Verifying Cluster Status Using cnDBTier procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure.
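As noted above, the backup-manager-service pod may restart during the georeplication recovery but must be up once the recovery completes. The following commands are a minimal sketch of that check, assuming the pod name contains backup-manager-service and the namespace matches the restored cluster (replace cluster1 with the namespace of the restored cluster):
$ kubectl get pods -n cluster1 | grep backup-manager-service
$ kubectl get pods -n cluster1 -w
The first command confirms that the backup-manager-service pod is in the Running state; the second optionally watches all pods in the restored cluster until they stabilize.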
7.4.1.3.3 Restoring Three cnDBTier Clusters in a Four-Cluster Replication
This section describes the procedure to reinstall three cnDBTier clusters that have fatal errors in a four-cluster replication setup using cnDBTier georeplication recovery APIs.
- The first failed, second failed, and third failed cnDBTier clusters have a replication delay that impacts the NF functionality with respect to the other cnDBTier clusters, or have fatal errors that require them to be reinstalled.
- If there is only a replication delay and no fatal errors exist in the cnDBTier clusters that need to be restored, then the cnDBTier clusters are in a healthy state, that is, all database nodes (including management, data, and API nodes) are in the Running state.
- The working cnDBTier cluster (that is, the first working cnDBTier cluster) is in a healthy state, that is, all database nodes (including management, data, and API nodes) are in the Running state, and the replication channels are running correctly.
- Only one cnDBTier cluster is restored at a time. That is, the first failed cnDBTier cluster is restored first, followed by the second failed cnDBTier cluster, and finally the third failed cnDBTier cluster.
- NF or application traffic is diverted from the failed cnDBTier clusters (first, second, and third failed cnDBTier clusters) to the working cnDBTier cluster.
- curl is installed in the environment from which the commands are run.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using a backup from a cnDBTier cluster with a lower version.
To resolve this georeplication failure in the first, second, and third failed cnDBTier clusters, restore the failed cnDBTier clusters to the latest database backup using automated backup and restore. Reestablish the replication channels between cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
Procedure
- Check the status of all four cnDBTier clusters. Follow Verifying Cluster Status Using cnDBTier to check the status of cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
- Run the following commands to mark the gr_state of the first failed, second failed, and third failed cnDBTier clusters as FAILED in the working cnDBTier cluster and in the unhealthy cnDBTier clusters, if they are accessible:
- If the failed cnDBTier clusters need to be reinstalled, update the gr_state of the failed cnDBTier clusters in one of the healthy cnDBTier clusters if the unhealthy clusters are not accessible and have fatal errors:
- Get the replication service LoadBalancer IP of the healthy cluster:
$ export IP=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer Port of
the healthy
cluster:
$ export PORT=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in the cnDBTier clusters that are healthy. An HTTP response code of 200 indicates that the cluster's gr_state is marked as failed successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/<name of failed cluster>/failed
- If the failed cnDBTier clusters are not reinstalled, update the gr_state in the failed cnDBTier clusters if they are accessible and do not have any fatal errors:
- Get the replication service LoadBalancer IP for the cluster that must be marked as failed:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer Port for
the cluster which should be marked as
failed:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in whichever cnDBTier clusters are accessible. An HTTP response code of 200 indicates that the cluster's gr_state is marked as failed successfully:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/failed
For example:
- If cnDBTier cluster2, cluster3, and cluster4 have fatal errors or replication delay:
- Mark the failed cnDBTier clusters cluster2, cluster3, and cluster4 as failed in the healthy cnDBTier cluster cluster1:
- Get the replication service LoadBalancer
IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster4/failed
- If cnDBTier cluster2, cluster3, and
cluster4 have replication delay:
- Mark the cnDBTier cluster as failed in cnDBTier cluster2:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster3:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster4:
- Get the replication service
LoadBalancer IP of
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/failed
- If cnDBTier cluster1, cluster3 and cluster4 have fatal
errors or replication delay:
- Mark the failed cnDBTier clusters cluster1, cluster3, and cluster4 as failed in the healthy cnDBTier cluster cluster2:
- Get the replication service LoadBalancer
IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster4/failed
- If cnDBTier cluster1, cluster3, and
cluster4 have replication delay:
- Mark the cnDBTier cluster as failed in cnDBTier cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster3:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster4:
- Get the replication service
LoadBalancer IP of
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/failed
- If cnDBTier cluster1, cluster2 and cluster4 have fatal
errors or replication delay:
- Mark the failed cnDBTier clusters cluster1, cluster2, and cluster4 as failed in the healthy cnDBTier cluster cluster3:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster4/failed
- If cnDBTier cluster1, cluster2, and
cluster4 have replication delay:
- Mark the cnDBTier cluster as failed in cnDBTier cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster2:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster4:
- Get the replication service
LoadBalancer IP of
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/failed
- If cnDBTier cluster1, cluster2 and cluster3 have fatal
errors or replication delay:
- Mark the failed cnDBTier clusters cluster1, cluster2, and cluster3 as failed in the healthy cnDBTier cluster cluster4:
- Get the replication service
LoadBalancer IP of
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- If cnDBTier cluster1, cluster2, and
cluster3 have replication delay:
- Mark the cnDBTier cluster as failed in cnDBTier cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster2:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster3:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
Note:
For more information about georeplication recovery API responses and error codes, see Fault Recovery APIs.
- Uninstall the first failed cnDBTier cluster if it has fatal errors.
Follow the Uninstalling cnDBTier Cluster procedure to uninstall the first failed cnDBTier cluster.
- Uninstall the second failed cnDBTier cluster if it has fatal errors.
Follow the Uninstalling cnDBTier Cluster procedure to uninstall the second failed cnDBTier cluster.
- Uninstall the third failed cnDBTier cluster if it has fatal errors.
Follow the Uninstalling cnDBTier Cluster procedure to uninstall the third failed cnDBTier cluster.
- Restore the first failed cnDBTier cluster and wait until the database restore from the remote cnDBTier cluster is completed. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in a Four-Cluster Replication to restore the first failed cnDBTier cluster.
- Wait until the database is restored in the first failed cnDBTier cluster and the georeplication recovery procedure is completed.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
Check the status of the first failed cnDBTier cluster using the Verifying Cluster Status Using cnDBTier procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure.
- Reinstall the second failed cnDBTier cluster and wait until the database restore from the remote cnDBTier cluster is completed. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in a Four-Cluster Replication to restore the second failed cnDBTier cluster.
- Wait until the database is restored in the second failed cnDBTier cluster and the georeplication recovery procedure is completed.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
You can check the status of the second failed cnDBTier cluster using the Verifying Cluster Status Using cnDBTier procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure.
- Reinstall the third failed cnDBTier cluster and wait until the database restore from the remote cnDBTier cluster is completed. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in a Four-Cluster Replication to restore the third failed cnDBTier cluster.
- Wait until the database is restored in the third failed cnDBTier cluster and the georeplication recovery procedure is completed.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
Check the status of the third failed cnDBTier cluster using the Verifying Cluster Status Using cnDBTier procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure. A minimal scripted check of the API success responses used in this procedure is sketched below.
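Each mark-failed and start call in this procedure is considered successful when it returns HTTP response code 200. If you prefer to script that check rather than read the curl -i headers, the following sketch captures only the status code; it assumes the IP and PORT variables are already exported as in the steps above:
$ curl -s -o /dev/null -w "%{http_code}\n" -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/failed
A printed value of 200 indicates that the cluster's gr_state is marked as failed successfully; any other value should be investigated before continuing with the recovery.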
7.4.1.4 All Cluster Georeplication Failure
This section describes the procedure to resolve georeplication failures on all clusters.
Procedure:
- Uninstall all the failed cnDBTier clusters one after the other.
Follow Uninstalling cnDBTier Cluster procedure for uninstalling the failed cnDBTier clusters.
- Reinstall the standalone cnDBTier cluster. For installation procedure, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
- Restore the DB in the standalone cnDBTier cluster from the backup taken from the other cluster. For procedure, see Restoring Database from Backup with ndb_restore.
- Add mate clusters one cluster after the other. For cluster addition procedures, see the "Adding Georedundant cnDBTier Site" section in Oracle Communications Cloud Native Core, cnDBTier User Guide.
7.4.2 Restoring Georeplication Failures Using CNC Console
This chapter describes the procedures to restore Georeplication (GR) failures using CNC Console.
7.4.2.1 Two-Cluster Georeplication Failure
This section provides the recovery scenarios in two-cluster georeplication deployments.
7.4.2.1.1 Resolving GR Failure Between cnDBTier Clusters in Two-Cluster Replication using CNC Console
This section describes the procedure to resolve a georeplication failure between cnDBTier clusters in a two-cluster replication using CNC Console.
- The failed cnDBTier cluster has a replication delay impacting the NF functionality, with respect to other cnDBTier cluster or has some fatal errors for which the cluster needs to be reinstalled.
- If there is only a replication delay and there are no fatal errors in the cnDBTier cluster that needs to be restored, then all the cnDBTier clusters are in a healthy state, that is, all database nodes (including management, data, and API nodes) are in the Running state.
- NF or application traffic is diverted from the failed cnDBTier cluster to the working cnDBTier cluster.
- If cnDBTier cluster fails while enabling encryption or changing the encryption secret in any cluster, then before starting the georeplication recovery, ensure that either encryption is disabled on all the clusters or encryption is enabled with the same encryption key across all accessible cnDBTier clusters.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using a backup from a cnDBTier cluster with a lower version.
To resolve this georeplication failure, restore the cnDBTier cluster to the latest database backup using automated backup and restore, and then reestablish the replication channels.
- Check the status of both cnDBTier clusters. Follow Verifying cnDBTier Cluster Status Using CNC Console to check the status of cnDBTier cluster1 and cnDBTier cluster2.
- Perform the following steps to update the cluster status as FAILED:
- Log in to CNC Console GUI of the active cluster.
- Expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Update Cluster As Failed.
- From the Cluster Names drop-down, select
the cluster that you want to mark as failed and click Update
Cluster.
The drop-down displays the clusters that are part of the replication setup (cluster1 and cluster2). When you select a cluster and click Update Cluster, the selected cluster name is updated in the Failed Cluster Names field.
For example, if you selected cluster1 to be marked as failed, then on clicking Update Cluster, cluster1 is updated in the Failed Cluster Names field.
The following image shows a sample Update Cluster As Failed page where cluster1 is marked as a failed cluster:
Figure 7-1 Update Cluster As Failed
- Perform this step to restore the cnDBTier cluster depending on the
status of the cnDBTier cluster:
- Follow step a, if the cnDBTier cluster has fatal errors such as PVC corruption, PVC not accessible, and other fatal errors.
- Follow steps b and c, if the cnDBTier cluster does not have any fatal errors and a database restore is needed because of the georeplication failure.
- If the cnDBTier cluster that needs to be restored has fatal errors, or if the cnDBTier cluster status is DOWN, then reinstall the cnDBTier cluster to restore the database from the remote cnDBTier cluster.
Follow Reinstalling cnDBTier Cluster to install the cnDBTier cluster by configuring the remote cluster IP address of the replication service of the remote cnDBTier cluster for restoring the database.
For example:
- If cnDBTier cluster1 needs to be restored, uninstall and reinstall cnDBTier cluster1 using the above procedures. This restores the database from the remote cnDBTier cluster2 by configuring the remote cluster replication service IP address of cnDBTier cluster2 in cnDBTier cluster1.
- If cnDBTier cluster2 needs to be restored, uninstall and reinstall cnDBTier cluster2 using the above procedures. This restores the database from the remote cnDBTier cluster1 by configuring the remote cluster replication service IP address of cnDBTier cluster1 in cnDBTier cluster2.
- Create the required NF-specific user accounts and grants to
match NF users and grants of the good cluster in the reinstalled cnDBTier.
For sample procedure, see Creating NF Users.
Note:
Verify and create the NF user accounts and grants in all the cnDBTier clusters before initiating the georeplication recovery to ensure that all the NF users exist on all the cnDBTier clusters. For more details about creating NF-specific user accounts and grants, refer to the NF-specific fault recovery guide.
- If the cnDBTier cluster does not have any fatal errors, then perform the following steps to start the georeplication recovery in the cnDBTier cluster:
- On CNC Console, expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Start Georeplication Recovery. The
following image shows a sample Start Georeplication Recovery
page:
Figure 7-2 Start Georeplication Recovery
- From the Failed Cluster Name drop-down, select the failed cluster that you want to restore.
- [Optional]: From the Backup Cluster Name
(Optional) drop-down, select one of the healthy
clusters from which the system can use the backup to restore the
failed cluster. If this option is not selected, the system uses the
backup from the first available healthy cluster.
For example, if cluster1 failed and needs to be restored, then the system displays the other healthy cluster (cluster2) to pick the backup. If no option is selected, the system uses the available healthy cluster by default.
- Click Start Georeplication Recovery.
- Wait until the georeplication recovery is complete. Use the Verifying cnDBTier Cluster Status Using CNC Console procedure to check the status of the cluster. When the cnDBTier
cluster is UP, continue monitoring the georeplication recovery status by following
the Monitoring Georeplication Recovery Status Using CNC Console procedure.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
7.4.2.2 Three-Cluster Georeplication Failure
This section describes the recovery scenarios in three-cluster georeplication deployments.
7.4.2.2.1 Resolving GR Failure Between cnDBTier Clusters in Three-Cluster Replication using CNC Console
This section describes the procedure to resolve a georeplication failure between cnDBTier clusters in a three-cluster replication.
- The failed cnDBTier cluster has a replication delay that impacts the NF functionality with respect to the other cnDBTier clusters, or has fatal errors for which the cluster needs to be reinstalled.
- If there is only a replication delay and there are no fatal errors in the cnDBTier cluster that needs to be restored, then all the cnDBTier clusters are in a healthy state, that is, all database nodes (including management node, data node, and API node) are in the Running state.
- Apart from the failed cluster, the other two cnDBTier clusters (that is, the first working cnDBTier cluster and second working cnDBTier cluster) are in a healthy state. That is, all database nodes (including management node, data node, and API node) are in the Running state, and the replication channels between them run correctly. A verification sketch is provided after the note below.
- NF or application traffic is diverted from the failed cnDBTier cluster to any of the working cnDBTier clusters.
- If cnDBTier cluster fails while enabling encryption or changing the encryption secret in any cluster, then before starting the georeplication recovery, ensure that either encryption is disabled on all the clusters or encryption is enabled with the same encryption key across all accessible cnDBTier clusters.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using the backup that is from a cnDBTier cluster with a lower version.
To resolve this georeplication failure in one cluster, restore the failed cnDBTier cluster to the latest DB backup using automated backup and restore, and then reestablish the replication channels between cnDBTier cluster1, cnDBTier cluster2, and cnDBTier cluster3.
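Before starting the procedure, you can confirm from the Bastion Host of each cluster that the database pods are in the Running state. A minimal sketch, assuming kubectl access to each cluster and a placeholder namespace:
$ kubectl -n <cndbtier-namespace> get pods
# All management (ndbmgmd), data (ndbmtd), SQL (ndbmysqld), and appSQL (ndbappmysqld) pods
# should report STATUS as Running before the georeplication recovery is started.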
Procedure:
- Check the status of all three cnDBTier clusters. Follow Verifying cnDBTier Cluster Status Using CNC Console to check the status of cnDBTier cluster1, cnDBTier cluster2, and cnDBTier cluster3.
- Perform the following steps to update the cluster status as FAILED:
- Log in to CNC Console GUI of the active cluster.
- Expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Update Cluster As Failed.
- From the Cluster Names drop-down,
select the cluster that you want to mark as failed and click
Update Cluster.
The drop-down displays the clusters that are part of the replication setup (cluster1, cluster2, and cluster3). When you select a cluster and click Update Cluster, the selected cluster name is updated in the Failed Cluster Names field.
For example, if you selected cluster1 to be marked as failed, then on clicking Update Cluster, cluster1 is updated in the Failed Cluster Names field.
The following image shows a sample Update Cluster As Failed page where cluster1 is marked as a failed cluster:
Figure 7-3 Update Cluster As Failed
- Perform this step to restore the cnDBTier cluster depending on the
status of the cnDBTier cluster:
- Follow step a if the cnDBTier cluster has fatal errors, such as PVC corruption, PVC not accessible, or other fatal errors.
- Follow steps b and c if the cnDBTier cluster does not have any fatal errors and a database restore is needed because of the georeplication failure.
- If the cnDBTier cluster that needs to be restored has fatal errors, or if the cnDBTier cluster status is DOWN, then reinstall the cnDBTier cluster to restore the database from the remote cnDBTier cluster.
Follow Reinstalling cnDBTier Cluster to install the cnDBTier cluster by configuring the remote cluster IP address of the replication service of the remote cnDBTier cluster for restoring the database.
For example:
- If cnDBTier cluster1 needs to be restored, then uninstall and reinstall cnDBTier cluster1 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the remote cluster replication service IP addresses of cnDBTier cluster2 and cnDBTier cluster3 in cnDBTier cluster1.
- If cnDBTier cluster2 needs to be restored, then uninstall and reinstall cnDBTier cluster2 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the remote cluster replication service IP addresses of cnDBTier cluster1 and cnDBTier cluster3 in cnDBTier cluster2.
- If cnDBTier cluster3 needs to be restored, then uninstall and reinstall cnDBTier cluster3 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the remote cluster replication service IP addresses of cnDBTier cluster1 and cnDBTier cluster2 in cnDBTier cluster3.
- Create the required NF-specific user accounts and grants to match the NF users and grants of the good cluster in the reinstalled cnDBTier cluster. For a sample procedure, see Creating NF Users.
Note:
Verify and create the NF user accounts and grants in all the cnDBTier clusters before initiating the georeplication recovery to ensure that all the NF users exist on all the cnDBTier clusters. For more details about creating NF-specific user accounts and grants, refer to the NF-specific fault recovery guide.
- If the cnDBTier cluster does not have any fatal errors,
then perform the following steps to start the georeplication recovery in
the cnDBTier cluster:
- On CNC Console, expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Start Georeplication Recovery. The
following image shows a sample Start Georeplication Recovery
page:
Figure 7-4 Start Georeplication Recovery
- From the Failed Cluster Name drop-down,
select the failed cluster that you want to restore.
For example, if you want to restore cluster1, then select cluster1.
- [Optional]: From the Backup Cluster Name
(Optional) drop-down, select one of the healthy
clusters from which the system can use the backup to restore the
failed cluster. If this option is not selected, the system uses
the backup from the first available healthy cluster.
For example, if cluster1 failed and needs to be restored, then the system displays the other two healthy clusters (cluster2 and cluster3) to pick the backup. Select one cluster from the available healthy clusters to use the backup. Otherwise, the system uses the backup from the first available healthy cluster by default.
- Click Start Georeplication Recovery.
- Wait until the georeplication recovery is complete.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
- Perform the Verifying cnDBTier Cluster Status Using CNC Console procedure to check the cluster status.
- When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by performing the Monitoring Georeplication Recovery Status Using CNC Console procedure.
7.4.2.2.2 Restoring Two cnDBTier Clusters in Three-Cluster Replication using CNC Console
This section describes the procedure to restore two cnDBTier clusters that have fatal errors in a three-cluster replication setup.
- The first failed cnDBTier cluster and the second failed cnDBTier cluster have fatal errors, or the georeplication between them has failed, and the clusters need to be restored.
- The working cnDBTier cluster is in a healthy state. That is, all database nodes (including management node, data node, and API node) are in the Running state.
- Only one cnDBTier cluster is restored at any time. For example, the first failed cnDBTier cluster is restored initially and then the second cnDBTier cluster is restored.
- NF or application traffic is diverted from the first failed cnDBTier cluster and the second failed cnDBTier cluster to the working cnDBTier cluster.
- If cnDBTier cluster fails while enabling encryption or changing the encryption secret in any cluster, then before starting the georeplication recovery, ensure that either encryption is disabled on all the clusters or encryption is enabled with the same encryption key across all accessible cnDBTier clusters.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using the backup that is from a cnDBTier cluster with a lower version.
To resolve this georeplication failure in two cnDBTier clusters (that is, the first failed cnDBTier cluster and the second failed cnDBTier cluster), restore the failed cnDBTier clusters to the latest DB backup using automated backup and restore. Then, reestablish the replication channels between cnDBTier cluster1, cnDBTier cluster2, and cnDBTier cluster3.
- Check the status of all three cnDBTier clusters. Follow Verifying cnDBTier Cluster Status Using CNC Console to check the status of cnDBTier cluster1, cnDBTier cluster2, and cnDBTier cluster3.
- Perform the following steps to update the cluster status as FAILED:
- Log in to CNC Console GUI of the active cluster.
- Expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Update Cluster As Failed.
- From the Cluster Names drop-down,
select the cluster that you want to mark as failed and click
Update Cluster.
The Cluster Names drop-down displays all the clusters that are part of the replication setup (cluster1, cluster2, and cluster3). When you select a cluster and click Update Cluster, the selected cluster name gets updated in the Failed Cluster Names field.
For example, if you want to mark cluster1 and cluster2 as failed, then perform the following steps:- Select cluster1 from the Cluster Names drop-down and click Update Cluster. The system displays cluster1 in the Failed Cluster Names field.
- Select cluster2 from the Cluster Names drop-down and click Update Cluster. The system displays both cluster1 and cluster2 in the Failed Cluster Names field.
Figure 7-5 Update Cluster1 and Cluster2 As Failed
- Uninstall the first failed cnDBTier cluster if the first failed cnDBTier cluster has fatal errors. Follow Uninstalling cnDBTier Cluster procedure for uninstalling the first failed cnDBTier cluster.
- Uninstall the second failed cnDBTier cluster if second failed cnDBTier cluster has fatal errors. Follow Uninstalling cnDBTier Cluster procedure for uninstalling the second failed cnDBTier cluster.
- Perform this step to reinstall the first failed cnDBTier cluster
depending on the status of the cnDBTier cluster and wait until the database
restore from the remote cluster is complete.
- Follow step a if the cnDBTier cluster has fatal errors, such as PVC corruption, PVC not accessible, or other fatal errors.
- Follow steps b and c if the cnDBTier cluster does not have any fatal errors and a database restore is needed because of the georeplication failure.
- If the cnDBTier cluster that needs restoration contains
fatal errors, or if the cnDBTier cluster status is DOWN, then reinstall
the cnDBTier cluster to restore the database from the remote cnDBTier
cluster.
Follow Reinstalling cnDBTier Cluster to install the cnDBTier cluster by configuring the remote cluster IP address of the replication service of the remote cnDBTier cluster for restoring the database.
For example:
- If the first failed cnDBTier cluster is cnDBTier cluster1, then uninstall cnDBTier cluster1 and reinstall cnDBTier cluster1 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster1.
- If the first failed cnDBTier cluster is cnDBTier cluster2, then uninstall cnDBTier cluster2 and reinstall cnDBTier cluster2 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster2.
- If the first failed cnDBTier cluster is cnDBTier cluster3, then uninstall cnDBTier cluster3 and reinstall cnDBTier cluster3 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster3.
- Create the required NF-specific user accounts and grants to match the NF users and grants of the good cluster in the reinstalled cnDBTier cluster. For a sample procedure, see Creating NF Users.
Note:
Verify and create the NF user accounts and grants in all the cnDBTier clusters before initiating the georeplication recovery to ensure that all the NF users exist on all the cnDBTier clusters. For more details about creating NF-specific user accounts and grants, refer to the NF-specific fault recovery guide.
- If the cnDBTier cluster that needs to be restored is UP and does not have any fatal errors, then perform the following:
- Configure the remote cluster IPs (Step 2a) and then upgrade the cnDBTier cluster by following the Upgrading cnDBTier procedures.
- Perform the following steps to start the
georeplication recovery in the cnDBTier cluster:
- On CNC Console, expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Start Georeplication
Recovery. The following image shows a
sample Start Georeplication Recovery page:
Figure 7-6 Start Georeplication Recovery
- From the Failed Cluster
Name drop-down, select the failed
cluster that you want to restore.
For example, if cluster1 is the first failed cluster that needs to be restored, then select cluster1.
- [Optional]: From the Backup Cluster Name (Optional) drop-down, select one of the healthy clusters from which the system can use the backup to restore the failed cluster. If this option is not selected, the system uses the backup from the first available healthy cluster.
- Click Start Georeplication Recovery.
- Wait until the database is restored in the first failed cnDBTier cluster and the georeplication recovery procedure is complete.
Check the cluster status using Verifying cnDBTier Cluster Status Using CNC Console procedure after the cnDBTier cluster is UP and continue monitoring the georeplication recovery status.
Follow Monitoring Georeplication Recovery Status Using CNC Console procedure to monitor the georeplication recovery status.
- Perform this step to reinstall the second failed cnDBTier cluster
depending on the status of the cnDBTier cluster and wait until the database
restore from the remote cluster is complete.
- Follow step a if the cnDBTier cluster has fatal errors, such as PVC corruption, PVC not accessible, or other fatal errors.
- Follow steps b and c if the cnDBTier cluster does not have any fatal errors and a database restore is needed because of the georeplication failure.
- If the second failed cnDBTier cluster that needs to be restored has fatal errors, or if the second failed cnDBTier cluster status is DOWN, then reinstall the second failed cnDBTier cluster to restore the database from the remote cnDBTier clusters.
Follow Reinstalling cnDBTier Cluster to install the cnDBTier cluster by configuring the remote cluster IP address of the replication service of the remote cnDBTier cluster for restoring the database.
For example:
- If the second failed cnDBTier cluster is cnDBTier cluster1, then uninstall cnDBTier cluster1 and reinstall cnDBTier cluster1 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster1.
- If the second failed cnDBTier cluster is cnDBTier cluster2, then uninstall cnDBTier cluster2 and reinstall cnDBTier cluster2 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster2.
- If the second failed cnDBTier cluster is cnDBTier cluster3, then uninstall cnDBTier cluster3 and reinstall cnDBTier cluster3 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster3.
- Create the required NF-specific user accounts and grants to match the NF users and grants of the good cluster in the reinstalled cnDBTier cluster. For a sample procedure, see Creating NF Users.
Note:
Verify and create the NF user accounts and grants in all the cnDBTier clusters before initiating the georeplication recovery to ensure that all the NF users exist on all the cnDBTier clusters. For more details about creating NF-specific user accounts and grants, refer to the NF-specific fault recovery guide.
- If the second failed cnDBTier cluster that needs to be
restored is UP, and does not have any fatal errors, then perform the
following:
- Configure the remote cluster IPs and then upgrade the cnDBTier cluster by following the Upgrading cnDBTier procedures.
- Perform the following steps to start the
georeplication recovery in the cnDBTier cluster:
- On CNC Console, expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Start Georeplication
Recovery. The following image shows a
sample Start Georeplication Recovery page:
Figure 7-7 Start Georeplication Recovery
- From the Failed Cluster
Name drop-down, select the failed
cluster that you want to restore.
For example, if cluster2 is the second failed cluster that needs to be restored, then select cluster2.
- [Optional]: From the Backup Cluster Name (Optional) drop-down, select one of the healthy clusters from which the system can use the backup to restore the failed cluster. If this option is not selected, the system uses the backup from the first available healthy cluster.
- Click Start Georeplication Recovery.
- Wait until the georeplication recovery is complete. Verify and
monitor the cluster status by performing the following steps:
- Perform the Verifying cnDBTier Cluster Status Using CNC Console procedure to check the cluster status.
- When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by performing the Monitoring Georeplication Recovery Status Using CNC Console procedure.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
7.4.2.3 Four-Cluster Georeplication Failure
This section describes the recovery scenarios in four-cluster georeplication deployments.
7.4.2.3.1 Resolving GR Failure Between cnDBTier Clusters in Four-Cluster Replication using CNC Console
This section describes the procedure to resolve a georeplication failure between cnDBTier clusters in a four-cluster replication.
- The failed cnDBTier cluster has a replication delay that impacts the NF functionality with respect to the other cnDBTier clusters, or has fatal errors for which the cluster needs to be reinstalled.
- If there is only a replication delay and there are no fatal errors in the cnDBTier cluster that needs to be restored, then all the cnDBTier clusters are in a healthy state, that is, all database nodes (including management node, data node, and API node) are in the Running state.
- Apart from the failed cluster, the other three cnDBTier clusters (that is, the first working cnDBTier cluster, second working cnDBTier cluster, and third working cnDBTier cluster) are in a healthy state. That is, all database nodes (including management node, data node, and API node) are in the Running state, and the replication channels between them are running correctly.
- NF or application traffic is diverted from the failed cnDBTier cluster to any of the working cnDBTier clusters.
- If cnDBTier cluster fails while enabling encryption or changing the encryption secret in any cluster, then before starting the georeplication recovery, ensure that either encryption is disabled on all the clusters or encryption is enabled with the same encryption key across all accessible cnDBTier clusters.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using the backup that is from a cnDBTier cluster with a lower version.
To resolve this georeplication failure in a single failed cnDBTier cluster, restore the failed cnDBTier cluster to the latest DB backup using automated backup and restore. Re-establish the replication channels between cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
Procedure:
- Check the status of all four cnDBTier clusters. Follow Verifying cnDBTier Cluster Status Using CNC Console to check the status of cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
- Perform the following steps to update the cluster status as
FAILED:
- Log in to CNC Console GUI of the active cluster.
- Expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Update Cluster As Failed.
- From the Cluster Names drop-down,
select the cluster that you want to mark as failed and click
Update Cluster.
The drop-down displays the clusters that are part of the replication setup (cluster1, cluster2, cluster3, and cluster4). The selected cluster name is updated in the Failed Cluster Names field. For example, if you selected cluster1 to be marked as failed, then on clicking Update Cluster, cluster1 is updated in the Failed Cluster Names field.
The following image shows a sample Update Cluster As Failed page where cluster1 is marked as a failed cluster:
Figure 7-8 Update Cluster As Failed
- Perform this step to restore the cnDBTier cluster depending on the
status of the cnDBTier cluster:
- Follow step a if the cnDBTier cluster has fatal errors, such as PVC corruption, PVC not accessible, or other fatal errors.
- Follow steps b and c if the cnDBTier cluster does not have any fatal errors and a database restore is needed because of the georeplication failure.
- If the cnDBTier cluster that needs to be restored has fatal errors, or if the cnDBTier cluster status is DOWN, then reinstall the cnDBTier cluster to restore the database from the remote cnDBTier cluster.
Follow Reinstalling cnDBTier Cluster to install the cnDBTier cluster by configuring the remote cluster IP address of the replication service of the remote cnDBTier cluster for restoring the database.
For example:
- If cnDBTier cluster1 needs to be restored, then uninstall and reinstall cnDBTier cluster1 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster1.
- If cnDBTier cluster2 needs to be restored, then uninstall and reinstall cnDBTier cluster2 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster2.
- If cnDBTier cluster3 needs to be restored, then uninstall and reinstall cnDBTier cluster3 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster3.
- If cnDBTier cluster4 needs to be restored, then uninstall and reinstall cnDBTier cluster4 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster4.
- Create the required NF-specific user accounts and grants to match the NF users and grants of the good cluster in the reinstalled cnDBTier cluster. For a sample procedure, see Creating NF Users.
Note:
Verify and create the NF user accounts and grants in all the cnDBTier clusters before initiating the georeplication recovery to ensure that all the NF users exist on all the cnDBTier clusters. For more details about creating NF-specific user accounts and grants, refer to the NF-specific fault recovery guide.
- If the cnDBTier cluster that needs to be restored is UP and
does not have any fatal errors, then perform the following:
- Configure the remote cluster IPs and then upgrade the cnDBTier cluster by following the Upgrading cnDBTier procedures.
- Perform the following steps to start the
georeplication recovery in the cnDBTier cluster:
- On CNC Console, expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Start Georeplication
Recovery. The following image shows a
sample Start Georeplication Recovery page:
Figure 7-9 Start Georeplication Recovery
- From the Failed Cluster
Name drop-down, select the failed
cluster that you want to restore.
For example, if cluster1 is the failed cluster that needs to be restored, then select cluster1.
- [Optional]: From the Backup Cluster Name (Optional) drop-down, select one of the healthy clusters from which the system can use the backup to restore the failed cluster. If this option is not selected, the system uses the backup from the first available healthy cluster.
- Click Start Georeplication Recovery.
- Wait until the georeplication recovery is completed. You can check
the cluster status using Verifying cnDBTier Cluster Status Using CNC Console procedure. When the cnDBTier cluster is UP, continue
monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using CNC Console procedure.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
7.4.2.3.2 Restoring Two cnDBTier Clusters in Four-Cluster Replication using CNC Console
This section describes the procedure to reinstall two cnDBTier clusters that have fatal errors in a four-cluster replication.
- The first failed and second failed cnDBTier clusters have a replication delay impacting the NF functionality with respect to the other cnDBTier clusters, or have fatal errors for which the clusters need to be reinstalled.
- If there is only a replication delay and there are no fatal errors in the cnDBTier cluster that needs to be restored, then all the cnDBTier clusters are in a healthy state, that is, all database nodes (including management node, data node, and API node) are in the Running state.
- Apart from the failed clusters, the other two cnDBTier clusters (that is, the first working cnDBTier cluster and second working cnDBTier cluster) are in a healthy state. That is, all database nodes (including management node, data node, and API node) are in the Running state, and the replication channels between them are running correctly.
- Only one cnDBTier cluster is restored at any time. For example, the first failed cnDBTier cluster is restored initially and then the second cnDBTier cluster is restored.
- NF or application traffic is diverted from the failed cnDBTier cluster to any of the working cnDBTier cluster.
- If cnDBTier cluster fails while enabling encryption or changing the encryption secret in any cluster, then before starting the georeplication recovery, ensure that either encryption is disabled on all the clusters or encryption is enabled with the same encryption key across all accessible cnDBTier clusters.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using the backup that is from a cnDBTier cluster with a lower version.
To resolve this georeplication failure in the first failed cnDBTier cluster and the second failed cnDBTier cluster, restore the failed cnDBTier clusters to the latest DB backup using automated backup and restore. Reestablish the replication channels between cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
Procedure:
- Check the status of all four cnDBTier clusters. Follow Verifying cnDBTier Cluster Status Using CNC Console to check the status of cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
- Perform the following steps to update the cluster status as
FAILED:
- Log in to CNC Console GUI of the active cluster.
- Expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Update Cluster As Failed.
- From the Cluster Names drop-down,
select the cluster that you want to mark as failed and click
Update Cluster.
The Cluster Names drop-down displays all the clusters that are part of the replication setup (cluster1, cluster2, cluster3, and cluster4). When you select a cluster and click Update Cluster, the selected cluster name gets updated in the Failed Cluster Names field.
For example, if you want to mark cluster1 and cluster2 as failed, then perform the following steps:- Select cluster1 from the Cluster Names drop-down and click Update Cluster. The system displays cluster1 in the Failed Cluster Names field.
- Select cluster2 from the Cluster Names drop-down and click Update Cluster. The system displays both cluster1 and cluster2 in the Failed Cluster Names field.
Figure 7-10 Update Cluster1 and Cluster2 As Failed
- Uninstall the first failed cnDBTier cluster if the first failed
cnDBTier cluster has fatal errors.
Follow Uninstalling cnDBTier Cluster procedure for uninstalling the first failed cnDBTier cluster.
- Uninstall the second failed cnDBTier cluster if the second failed
cnDBTier cluster has fatal errors.
Follow Uninstalling cnDBTier Cluster procedure for uninstalling the second failed cnDBTier cluster.
- Restore the first failed cnDBTier cluster and wait until the database restore from the remote cnDBTier cluster is complete. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in Four-Cluster Replication using CNC Console to restore the first failed cnDBTier cluster.
- Wait until database is restored in the first failed cnDBTier
cluster and georeplication recovery procedure is complete.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
Check the cluster status of the first failed cnDBTier cluster using the Verifying cnDBTier Cluster Status Using CNC Console procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using CNC Console procedure.
- Reinstall the second failed cnDBTier cluster and wait until database restore from the remote cnDBTier cluster is complete. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in Four-Cluster Replication using CNC Console to restore the second failed cnDBTier cluster.
- Wait until the database is restored in the second failed cnDBTier cluster and the georeplication recovery procedure is complete.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
Check the cluster status of the second failed cnDBTier cluster using the Verifying cnDBTier Cluster Status Using CNC Console procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using CNC Console procedure.
7.4.2.3.3 Restoring Three cnDBTier Clusters in Four-Cluster Replication using CNC Console
This section describes the procedure to reinstall three cnDBTier clusters that have fatal errors in a four-cluster replication.
- The first, second, and third failed cnDBTier clusters have replication delay impacting the NF functionality, with respect to other cnDBTier clusters or have fatal errors for which the clusters need to be reinstalled.
- If there is only a replication delay and there are no fatal errors in the cnDBTier clusters that need to be restored, then all cnDBTier clusters are in a healthy state. That is, all database nodes (including management node, data node, and API node) are in the Running state.
- Apart from the failed clusters, the working cnDBTier cluster (that is, the first working cnDBTier cluster) is in a healthy state. That is, all database nodes (including management node, data node, and API node) are in the Running state, and the replication channels between them are running correctly.
- Only one cnDBTier cluster is restored at any time. For example, the first failed cnDBTier cluster is restored initially, then the second cnDBTier cluster is restored, and finally the third failed cnDBTier cluster is restored.
- NF or application traffic is diverted from the failed cnDBTier clusters (first failed cnDBTier cluster, second failed cnDBTier cluster, and third failed cnDBTier cluster) to the working cnDBTier cluster.
- If cnDBTier cluster fails while enabling encryption or changing the encryption secret in any cluster, then before starting the georeplication recovery, ensure that either encryption is disabled on all the clusters or encryption is enabled with the same encryption key across all accessible cnDBTier clusters.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using the backup that is from a cnDBTier cluster with a lower version.
To resolve this georeplication failure in the first, second, and third failed cnDBTier clusters, restore the failed cnDBTier clusters to the latest DB backup using automated backup and restore. Reestablish the replication channels between cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
Procedure
- Check the status of all four cnDBTier clusters. Follow Verifying cnDBTier Cluster Status Using CNC Console to check the status of cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
- Perform the following steps to update the cluster status as
FAILED:
- Log in to CNC Console GUI of the active cluster.
- Expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Update Cluster As Failed.
- From the Cluster Names drop-down,
select the cluster that you want to mark as failed and click
Update Cluster.
The Cluster Names drop-down displays all the clusters that are part of the replication setup (cluster1, cluster2, cluster3, and cluster4). When you select a cluster and click Update Cluster, the selected cluster name gets updated in the Failed Cluster Names field.
For example, if you want to mark cluster1, cluster2, and cluster3 as failed, then perform the following steps:
- Select cluster1 from the Cluster Names drop-down and click Update Cluster. The system displays cluster1 in the Failed Cluster Names field.
- Select cluster2 from the Cluster Names drop-down and click Update Cluster. The system displays cluster1 and cluster2 in the Failed Cluster Names field.
- Select cluster3 from the Cluster Names drop-down and click Update Cluster. The system displays cluster1, cluster2, and cluster3 in the Failed Cluster Names field.
Figure 7-11 Update Cluster1, Cluster2, and Cluster3 As Failed
- Uninstall the first failed cnDBTier cluster if the first failed
cnDBTier cluster has fatal errors.
Follow Uninstalling cnDBTier Cluster procedure for uninstalling the first failed cnDBTier cluster.
- Uninstall the second failed cnDBTier cluster if the second failed
cnDBTier cluster has fatal errors.
Follow Uninstalling cnDBTier Cluster procedure for uninstalling the second failed cnDBTier cluster.
- Uninstall the third failed cnDBTier cluster if the third failed cnDBTier cluster has fatal errors.
Follow Uninstalling cnDBTier Cluster procedure for uninstalling the third failed cnDBTier cluster.
- Reinstall the first failed cnDBTier cluster and wait until the database restore from the remote cnDBTier cluster is complete. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in Four-Cluster Replication using CNC Console to restore the first failed cnDBTier cluster.
- Wait until the database is restored in the first failed cnDBTier
cluster and georeplication recovery procedure is complete.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
Check the cluster status of the first failed cnDBTier cluster using the Verifying cnDBTier Cluster Status Using CNC Console procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using CNC Console procedure.
- Reinstall the second failed cnDBTier cluster and wait until the database restore from the remote cnDBTier cluster is complete. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in Four-Cluster Replication using CNC Console to restore the second failed cnDBTier cluster.
- Wait until the database is restored in the second failed cnDBTier
cluster and georeplication recovery procedure is complete.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
You can check the cluster status of the second failed cnDBTier cluster using the Verifying cnDBTier Cluster Status Using CNC Console procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using CNC Console procedure.
- Reinstall the third failed cnDBTier cluster and wait until the database restore from the remote cluster is complete. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in Four-Cluster Replication using CNC Console to restore the third failed cnDBTier cluster.
- Wait until the database is restored in the third failed cnDBTier
cluster and georeplication recovery procedure is complete.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
Check the status of the third failed cnDBTier cluster using the Verifying cnDBTier Cluster Status Using CNC Console procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using CNC Console procedure.
7.4.3 Recovering Georeplication Sites Using dbtrecover
The dbtrecover script automates the georeplication recovery procedures and increases the
probability of successfully completing the recovery process by simplifying the
procedures, reducing human errors, and running a limited number of system tests and
connectivity tests. This section provides information about recovering georeplication
sites using the dbtrecover
bash script.
Note:
The script can be run on all cnDBTier versions starting from 22.1.x.
dbtrecover is an interactive script. It runs the georeplication recovery procedures on one site at a time. It collects the information about the system from the site the script is run from and prompts you to provide details about the site that needs to be recovered. Before prompting, the script displays the system information. You must review the system information before providing the recovery site details and proceeding further.
dbtrecover
synchronizes the database between sites and
recovers georeplication sites by automatically running the georeplication recovery
procedures. For more information about georeplication recovery procedures, see Restoring Georeplication (GR) Failure.
In addition, the script tests the cnDBTier and georeplication systems that are required for a successful georeplication recovery. These tests can be expanded as per your feedback in the future releases.
7.4.3.1 Prerequisites
- The
dbtrecover
script must be properly installed. For more information, see Installing and Using the dbtrecover Script. - The script must be run only from the good site. For information about selecting the good site, see Selecting the Good Site.
- The deployment log level (LOG_LEVEL) of db-backup-manager must be set to INFO and not DEBUG. A quick check is sketched after the note below.
Note:
You can recover only one site at a time using the dbtrecover script. If multiple sites need to be recovered, you must rerun the script to recover the sites one after the other.
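To verify the log level prerequisite listed above, you can inspect the LOG_LEVEL environment variable of the db-backup-manager deployment. This is a minimal sketch that assumes the deployment name follows the pattern shown in the sample output later in this section and that the log level is exposed as an environment variable in the deployment specification (it may instead be set through the Helm values in your installation):
$ kubectl -n <cndbtier-namespace> get deployment <release-prefix>-db-backup-manager-svc -o yaml | grep -A1 'name: LOG_LEVEL'
# The value printed on the following line should be INFO, not DEBUG.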
cnDBTier Version Considerations
The dbtrecover script supports georeplication recovery between sites running
different cnDBTier versions. However, the following conditions must be considered before
performing a georeplication recovery using the script:
- The script supports georeplication recovery between sites only in the following
cases:
- The good site and bad site (the site to be recovered) run the same cnDBTier version.
- The good site runs a lower cnDBTier version. However, the cnDBTier releases on both the good site and the bad site support the same database replication REST API.
- The script doesn't support georeplication recovery between sites in the following
cases:
- The good site runs a cnDBTier version that is higher than the version in the bad site.
- The good site runs a lower cnDBTier version and uses a different database replication REST API from the one used by the bad site.
- cnDBTier versions below 22.4.2
- cnDBTier version 23.1.0
- cnDBTier versions greater than or equal to 22.4.2 and less than 23.1.0
- cnDBTier versions greater than or equal to 23.1.1
For more information about database replication REST APIs, see Oracle Communications Cloud Native Core, cnDBTier User Guide.
- While running the
dbtrecover
script, select the Reinstall (that is, the bad site has a fatal error) option. - Install the new version on the bad site. When the new release is installed, cnDBTier completes the recovery automatically.
Note:
Ensure that you do not reinstall the same version (lower version) on the bad site. This causes the georeplication recovery to fail.
7.4.3.2 Selecting the Good Site
The dbtrecover script must be run only from the good site. The
script uses the good site to take the database backup for performing a georeplication recovery
on the failed site. The following steps detail the method in which the script selects the good
site:
- The script scans through the site IDs from 1 through 4 one by one and prompts you to state if that site requires a recovery. That is, the script first chooses site 1 and prompts you to state if site 1 requires a recovery.
- If site 1 doesn't require a recovery, then site 1 is selected as the good
site. That is,
dbtrecover
must be run on the namespace and Bastion of site 1. - If site 1 requires a recovery, then the system moves to site 2 and prompts you to state if site 2 requires a recovery.
- This process continues until the good site is identified. A simplified sketch of this selection logic is provided after the note below.
Note:
If dbtrecover is run on a wrong site, the script terminates with an error. It also displays the identified good site and directs you to run the script on the good site.
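The selection behavior described above can be summarized by the following simplified sketch. This is an illustration only, not the actual dbtrecover source code; the prompt text mirrors the sample output shown later in this section.
# Illustrative sketch only: the script walks through the site IDs in order and treats the
# first site that does not need a recovery as the good site.
for site_id in 1 2 3 4; do
    read -r -p "Does site ${site_id} need to be recovered (yes/no/exit)? " answer
    if [ "${answer}" = "no" ]; then
        echo "Site ${site_id} is the good site; run dbtrecover from its namespace and Bastion Host."
        break
    fi
done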
7.4.3.3 Installing and Using the dbtrecover Script
This section provides details about installing and using the
dbtrecover
script to perform an automated georeplication recovery.
Installing the dbtrecover Script
cd Artifacts/Scripts/tools
source ./source_me
The system prompts you to enter the namespace and uses the same to set the
DBTIER_NAMESPACE. It also sets the DBTIER_LIB
environment variable with the
path to the directory containing the libraries needed by dbtrecover.
Using the dbtrecover Script
dbtrecover [-h | --help]
dbtrecover [OPTIONS]
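For example, the following invocations combine only the options documented in Configuration Options; they are illustrative and not mandatory:
$ dbtrecover                                      # interactive recovery with the default settings
$ dbtrecover --tests-only                         # run only the sanity tests without recovering a site
$ dbtrecover --start-from-monitoring --no-colors  # resume from the monitoring phase with plain output
$ dbtrecover --connect-timeout=30                 # allow 30 seconds for curl, mysql, mysqladmin, and ssh connections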
7.4.3.4 Phases of dbtrecover
The dbtrecover script performs the georeplication recovery in phases. This section provides information about the different phases involved in dbtrecover.
- Phase 0: Collecting Site Information
The script collects information about the current site on which it is running and all the other mate sites that the current "good" site knows about.
- Phase 1: Running Sanity Checks
The script performs sanity checks to ensure that the georeplication recovery runs successfully.
- Phase 2: Setting Up Georeplication Recovery
The script sets gr_state to FAILED and deletes mysql.ndb_apply_status records.
- Phase 3: Starting Georeplication Recovery
The script starts the georeplication recovery for the first site that needs to be recovered. For example, if two sites (site 1 and site 2) are to be recovered, then the script starts the recovery for site 1. In this phase, the script sets gr_state to STARTDRRECOVER or prompts you to reinstall.
- Phase 4: Starting Georeplication Recovery Monitoring
The script queries the replication_info.DBTIER_SITE_INFO table on the site that is being recovered every second and updates the status of the recovery on gr_state.
Note:
When the dbtrecover script is run with the --start-from-monitoring option, the script starts the georeplication recovery from this phase.
- Phase 5: Run Post-Processing Steps
The script restarts the
mysqld
pods if the version is less than 22.3.3.
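During Phase 4, you can also inspect the recovery state manually. The following is a minimal sketch that assumes the column names shown in the sample dbtrecover output later in this section; the namespace, pod, container, and administrative user are placeholders:
$ kubectl -n <cndbtier-namespace> exec -ti <ndbmysqld-pod> -c <mysql-container> -- mysql -u<admin-user> -p -e "SELECT site_name, site_id, dr_state, dr_backup_site_name, backup_id FROM replication_info.DBTIER_SITE_INFO;"
# dr_state moves through the recovery states (for example, FAILED) and shows COMPLETED when the site is recovered.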
7.4.3.5 Configuration Options
Table 7-2 Configuration Options
Option | Usage | Example
---|---|---
-h | --help | This option is used to print the help message and exit. |
-v | --version | This option is used to print the script's version. |
--debug | This option is used to output DEBUG log messages to standard error (stderr). |
--no-colors | This option is used to print the output in the default terminal font color instead of using the dbtrecover colors. |
--start-from-monitoring | This option is used to skip phase 1, phase 2, and phase 3, and start directly from phase 4 for the first site that is to be recovered. For information about the different phases involved in running dbtrecover, see Phases of dbtrecover. Note: The |
--skip-repl-restart | This option is used to skip restarting the local and remote db-replication-svc main containers. By default, the script restarts these containers. |
--use-ipv4 | This option is used to direct the dbtrecover script to use IPv4. |
--use-ipv6 | This option is used to direct the dbtrecover script to use IPv6. |
--skip-namespace-test | This option is used to skip testing that the namespace in DBTIER_NAMESPACE exists in the current cluster. |
--skip-tests | This option is used to skip the sanity tests. |
--tests-only | This option is used to run only the sanity tests. |
--connect-timeout=<connect_timeout_in_seconds> | This option is used to specify the time (in seconds) to wait before ending a connection attempt. This option is used by the dbtrecover script when running the curl, mysql, mysqladmin, and ssh commands. The default timeout value is 15 seconds. |
Note:
Use the dbtrecover --help command for more examples on running the script.
7.4.3.6 Running dbtrecover Script in a Two-Site Setup
This section provides the steps to run the dbtrecover
script in a
two-site setup.
- Run the following commands to source the
dbtrecover
script file:$ cd /home/youruser/workspace/Artifacts/Scripts/tools $ source ./source_me
Note:
source_me must be sourced only when your current directory contains the source_me file.
- Enter the namespace when the system prompts you. The system uses this
namespace to set DBTIER_NAMESPACE. It also sets the
DBTIER_LIB
environment variable with the path to the directory containing the libraries needed by dbtrecover.
Sample output:Enter cndbtier namespace: dbtier-namespace-2 DBTIER_NAMESPACE = "dbtier-namespace-2" DBTIER_LIB=/home/youruser/workspace/Artifacts/Scripts/tools/lib Adding /home/youruser/workspace/Artifacts/Scripts/tools/bin to PATH PATH=/usr/lib/jvm/java-11-openjdk-11.0.16.0.8-1.0.1.el7_9.x86_64/bin:/usr/dev_infra/platform/bin:/usr/dev_infra/generic/bin:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/usr/local/ade/bin:/opt/gradle/gradle-6.8.3/bin:/opt/helm/helm-v3.5.2-linux-amd64/bin:/home/youruser/workspace/Artifacts/Scripts/tools/bin
- Run the following command to run the dbtrecover script without any
additional configurations:
Note:
You can pass additional parameters to the script to run the script as per your requirement. For information about the available configurations, see Configuration Options.
$ dbtrecover
The script runs in phases to identify the good site, take a backup from the good site, and recover the failed site.
Sample output to show a successful completion of the script:dbtrecover 23.3.0.0.2 Copyright (c) 2023 Oracle and/or its affiliates. All rights reserved. 2023-10-13T09:08:40Z INFO - **************************************************************************************************** 2023-10-13T09:08:40Z INFO - BEGIN PHASE 0: Collecting Site information 2023-10-13T09:08:40Z INFO - **************************************************************************************************** 2023-10-13T09:08:40Z INFO - dbtstart_ts = 1697188120 2023-10-13T09:08:41Z INFO - Using IPv4: LOOPBACK_IP="127.0.0.1" 2023-10-13T09:08:41Z INFO - DBTIER_NAMESPACE = dbtier_namespace-2 2023-10-13T09:08:41Z INFO - DBTIER_NAMESPACE = dbtier_namespace-2 2023-10-13T09:08:41Z INFO - Testing namespace, dbtier_namespace-2, exists... 2023-10-13T09:08:41Z INFO - Should be able to see namespace, dbtier_namespace-2, with "kubectl get ns -o name dbtier_namespace-2" - PASSED 2023-10-13T09:08:41Z INFO - Namespace, dbtier_namespace-2, exists 2023-10-13T09:08:41Z INFO - Getting sts and sts pod info... 2023-10-13T09:08:41Z INFO - Getting MGM sts and sts pod info... 2023-10-13T09:08:42Z INFO - MGM_STS="my-prefix-ndbmgmd" 2023-10-13T09:08:42Z INFO - MGM_REPLICAS="2" 2023-10-13T09:08:42Z INFO - MGM_PODS: my-prefix-ndbmgmd-0 my-prefix-ndbmgmd-1 2023-10-13T09:08:42Z INFO - Getting NDB sts and sts pod info... 2023-10-13T09:08:42Z INFO - NDB_STS="my-prefix-ndbmtd" 2023-10-13T09:08:42Z INFO - NDB_REPLICAS="2" 2023-10-13T09:08:42Z INFO - NDB_PODS: my-prefix-ndbmtd-0 my-prefix-ndbmtd-1 2023-10-13T09:08:42Z INFO - Getting API sts and sts pod info... 2023-10-13T09:08:42Z INFO - API_STS="my-prefix-ndbmysqld" 2023-10-13T09:08:42Z INFO - API_REPLICAS="2" 2023-10-13T09:08:42Z INFO - API_PODS: my-prefix-ndbmysqld-0 my-prefix-ndbmysqld-1 2023-10-13T09:08:42Z INFO - Getting APP sts and sts pod info... 2023-10-13T09:08:42Z INFO - APP_STS="my-prefix-ndbappmysqld" 2023-10-13T09:08:42Z INFO - APP_REPLICAS="2" 2023-10-13T09:08:42Z INFO - APP_PODS: my-prefix-ndbappmysqld-0 my-prefix-ndbappmysqld-1 2023-10-13T09:08:42Z INFO - Getting deployment pod info... 2023-10-13T09:08:42Z INFO - grepping for backup-man (BAK_CHART_NAME)... 2023-10-13T09:08:42Z INFO - BAK_PODS: my-prefix-db-backup-manager-svc-6b96cb4567-xxbs9 2023-10-13T09:08:42Z INFO - BAK_DEPLOY: my-prefix-db-backup-manager-svc 2023-10-13T09:08:42Z INFO - grepping for db-mon (MON_CHART_NAME)... 2023-10-13T09:08:42Z INFO - MON_PODS: my-prefix-db-monitor-svc-7c54bdc95c-xpbk4 2023-10-13T09:08:42Z INFO - MON_DEPLOY: my-prefix-db-monitor-svc 2023-10-13T09:08:42Z INFO - grepping for repl (REP_CHART_NAME)... 2023-10-13T09:08:43Z INFO - REP_PODS: my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t 2023-10-13T09:08:43Z INFO - REP_DEPLOY: my-prefix-lfg-site-2-lfg-site-1-replication-svc 2023-10-13T09:08:43Z INFO - is https enabled: false 2023-10-13T09:08:43Z INFO - Collecting current site information... 2023-10-13T09:08:43Z INFO - Collecting current site name and id... 2023-10-13T09:08:43Z INFO - CURRENT_SITE_NAME = lfg-site-2 2023-10-13T09:08:43Z INFO - CURRENT_SITE_ID = 2 2023-10-13T09:08:43Z INFO - MATE_SITE_DB_REPLICATION_PORT=80 2023-10-13T09:08:43Z INFO - FILE_TRANSFER_PORT_NUMBER=2022 2023-10-13T09:08:44Z INFO - REPLCHANNEL_GROUP_COUNT=1 2023-10-13T09:08:44Z INFO - Current site information collected 2023-10-13T09:08:44Z INFO - Collecting clusterinformation... 
NAME READY STATUS RESTARTS AGE my-prefix-db-backup-manager-svc-6b96cb4567-xxbs9 1/1 Running 6 (27m ago) 10h my-prefix-db-monitor-svc-7c54bdc95c-xpbk4 1/1 Running 0 10h my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t 1/1 Running 9 (8m38s ago) 10h my-prefix-ndbappmysqld-0 2/2 Running 3 (27m ago) 10h my-prefix-ndbappmysqld-1 2/2 Running 3 (27m ago) 10h my-prefix-ndbmgmd-0 1/1 Running 0 10h my-prefix-ndbmgmd-1 1/1 Running 0 10h my-prefix-ndbmtd-0 2/2 Running 0 10h my-prefix-ndbmtd-1 2/2 Running 0 10h my-prefix-ndbmysqld-0 2/2 Running 3 (27m ago) 10h my-prefix-ndbmysqld-1 2/2 Running 3 (27m ago) 10h Connected to Management Server at: localhost:1186 Cluster Configuration --------------------- [ndbd(NDB)] 2 node(s) id=1 @10.233.93.250 (mysql-8.0.31 ndb-8.0.31, Nodegroup: 0, *) id=2 @10.233.121.144 (mysql-8.0.31 ndb-8.0.31, Nodegroup: 0) [ndb_mgmd(MGM)] 2 node(s) id=49 @10.233.121.87 (mysql-8.0.31 ndb-8.0.31) id=50 @10.233.118.121 (mysql-8.0.31 ndb-8.0.31) [mysqld(API)] 8 node(s) id=56 @10.233.118.205 (mysql-8.0.31 ndb-8.0.31) id=57 @10.233.110.113 (mysql-8.0.31 ndb-8.0.31) id=70 @10.233.115.118 (mysql-8.0.31 ndb-8.0.31) id=71 @10.233.102.154 (mysql-8.0.31 ndb-8.0.31) id=222 (not connected, accepting connect from any host) id=223 (not connected, accepting connect from any host) id=224 (not connected, accepting connect from any host) id=225 (not connected, accepting connect from any host) 2023-10-13T09:08:44Z INFO - Number of sites: 2 2023-10-13T09:08:44Z INFO - SITE_NAME: Site 1: lfg-site-1 Site 2: lfg-site-2 2023-10-13T09:08:47Z INFO - SITE_DB_REPL_SVC[1] = SITE_1_DB_REPL_SVC 2023-10-13T09:08:47Z INFO - SITE_1_DB_REPL_SVC: 10.121.27.176 2023-10-13T09:08:47Z INFO - SITE_DB_REPL_SVC[2] = SITE_2_DB_REPL_SVC 2023-10-13T09:08:47Z INFO - SITE_2_DB_REPL_SVC: 10.121.27.177 2023-10-13T09:08:48Z INFO - SERVER_ID = (1000 1001 2000 2001) 2023-10-13T09:08:48Z INFO - SERVER_IPS[1000] = 10.121.27.163 2023-10-13T09:08:48Z INFO - SERVER_IPS[1001] = 10.121.27.164 2023-10-13T09:08:48Z INFO - SERVER_IPS[2000] = 10.121.27.170 2023-10-13T09:08:48Z INFO - SERVER_IPS[2001] = 10.121.27.171 2023-10-13T09:08:49Z INFO - CURRENT_SITE_DBTIER_VERSION=22.4.2 2023-10-13T09:08:49Z INFO - CURRENT_SITE_DBTIER_VERSION_FROM_HELM=22.4.2 2023-10-13T09:08:49Z INFO - Cluster information collected 2023-10-13T09:08:49Z INFO - name = lfg-site-1, server_id = 1000, host = 10.121.27.163 2023-10-13T09:08:49Z INFO - name = lfg-site-1, server_id = 1001, host = 10.121.27.164 2023-10-13T09:08:49Z INFO - name = lfg-site-2, server_id = 2000, host = 10.121.27.170 2023-10-13T09:08:49Z INFO - name = lfg-site-2, server_id = 2001, host = 10.121.27.171 2023-10-13T09:08:49Z INFO - lfg-site-1 replication services IPs: 10.121.27.176 2023-10-13T09:08:49Z INFO - lfg-site-2 replication services IPs: 10.121.27.177 2023-10-13T09:08:49Z INFO - dbtcinfo_ts = 1697188129 2023-10-13T09:08:49Z INFO - Collecting cluster info took: 00 hr. 00 min. 09 sec. Does lfg-site-1 need to be recovered (yes/no/exit)? yes Does lfg-site-1 also need to be reinstalled (i.e. does it have fatal errors?) (yes/no/exit)? no Does lfg-site-2 need to be recovered (yes/no/exit)? no 2023-10-13T09:08:57Z INFO - Recovering from lfg-site-2 (Site 2) 2023-10-13T09:08:57Z INFO - Recovering lfg-site-1 (Site 1) DO YOU WANT TO PROCEED (yes/no/exit)? yes ARE YOU SURE (yes/no/exit)? 
yes 2023-10-13T09:09:00Z INFO - Starting db-replication-svc restarts at 2023-10-13T09:09:00Z 1 mysql 512 java -jar /opt/db_replication_svc/occne-dbreplicationsvc.jar 2023-10-13T09:09:00Z INFO - Sending SIGTERM to pod: my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t, container: lfg-site-2-lfg-site-1-replication-svc (1697188140) 2023-10-13T09:09:00Z INFO - TERM signal sent to my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t 2023-10-13T09:09:00Z INFO - Waiting for my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t container to restart... 1 mysql 0 java -jar /opt/db_replication_svc/occne-dbreplicationsvc.jar-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t lfg-site-2-lfg-site-1-replication-svc 2023-10-13T09:09:48Z INFO - Waiting for condition: is_proc_1_running_in_pod my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t lfg-site-2-lfg-site-1-replication-svc 2023-10-13T09:09:48Z INFO - Condition occurred 2023-10-13T09:09:48Z INFO - Connected again after 48 seconds (1697188188 - 1697188140) 2023-10-13T09:09:48Z INFO - time process has been running: 0 secs 2023-10-13T09:09:48Z INFO - Restarted main container on my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t 2023-10-13T09:09:48Z INFO - Waiting for sshd to restart in my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t container... 13 mysql 0 /usr/sbin/sshd -D -e -f /opt/ssh/sshd_config_pod my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t lfg-site-2-lfg-site-1-replication-svc 2023-10-13T09:09:48Z INFO - Waiting for condition: is_sshd_running_in_pod my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t lfg-site-2-lfg-site-1-replication-svc 2023-10-13T09:09:48Z INFO - Condition occurred 2023-10-13T09:09:49Z INFO - Connected again after 49 seconds (1697188189 - 1697188140) 2023-10-13T09:09:49Z INFO - time process has been running: 1 secs 2023-10-13T09:09:49Z INFO - Restarted sshd in main container on my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t 2023-10-13T09:09:49Z INFO - Killing main container in all remote REPL pods... 2023-10-13T09:09:49Z INFO - Restarting db-replication-svc containers in lfg-site-1 1 mysql 535 java -jar /opt/db_replication_svc/occne-dbreplicationsvc.jar 2023-10-13T09:09:51Z INFO - Sending SIGTERM (1697188140) 2023-10-13T09:09:55Z INFO - TERM signal sent to db-replication-svc on 10.121.27.176 2023-10-13T09:09:55Z INFO - Main container in all mate REPL pods killed 2023-10-13T09:09:55Z INFO - Waiting for mate db-replication-svc containers to restart... 2023-10-13T09:09:55Z INFO - Waiting for db-replication-svc containers in lfg-site-1 2023-10-13T09:09:55Z INFO - Waiting for 10.121.27.176 container to restart... 1 mysql 64 java -jar /opt/db_replication_svc/occne-dbreplicationsvc.jar 2023-10-13T09:11:46Z INFO - Waiting for condition: is_proc_1_running_in 10.121.27.176 2023-10-13T09:11:46Z INFO - Condition occurred 2023-10-13T09:11:50Z INFO - Connected again after 121 seconds (1697188310 - 1697188189) 2023-10-13T09:11:50Z INFO - time process has been running: 68 2023-10-13T09:11:50Z INFO - Restarted main container on 10.121.27.176 2023-10-13T09:11:50Z INFO - Main container in all mate(s) REPL pods restarted 2023-10-13T09:11:50Z INFO - Waiting for db-replication-svc containers in lfg-site-1 to be READY 2023-10-13T09:11:50Z INFO - Waiting for 10.121.27.176 to be READY... 2023-10-13T09:11:51Z INFO - 10.121.27.176 is READY 2023-10-13T09:11:51Z INFO - Waiting for sshd to come up in 10.121.27.176... 
13 mysql 71 /usr/sbin/sshd -D -e -f /opt/ssh/sshd_config 10.121.27.176 2023-10-13T09:11:53Z INFO - Waiting for condition: is_sshd_running_in 10.121.27.176 2023-10-13T09:11:53Z INFO - Condition occurred 2023-10-13T09:11:53Z INFO - sshd in 10.121.27.176 is up 2023-10-13T09:11:53Z INFO - Waiting for db-replication-svc containers in lfg-site-2 to be READY 2023-10-13T09:11:53Z INFO - Waiting for 10.121.27.177 to be READY... 2023-10-13T09:11:53Z INFO - 10.121.27.177 is READY 2023-10-13T09:11:53Z INFO - Waiting for sshd to come up in 10.121.27.177... 13 mysql 129 /usr/sbin/sshd -D -e -f /opt/ssh/sshd_config 10.121.27.177 2023-10-13T09:11:57Z INFO - Waiting for condition: is_sshd_running_in 10.121.27.177 2023-10-13T09:11:57Z INFO - Condition occurred 2023-10-13T09:11:57Z INFO - sshd in 10.121.27.177 is up 2023-10-13T09:11:57Z INFO - Ended db-replication-svc restarts at 2023-10-13T09:11:57Z 2023-10-13T09:11:57Z INFO - restarts took: 00 hr. 02 min. 57 sec. 2023-10-13T09:11:57Z INFO - **************************************************************************************************** 2023-10-13T09:11:57Z INFO - END PHASE 0: Collecting Site information 2023-10-13T09:11:57Z INFO - **************************************************************************************************** 2023-10-13T09:11:57Z INFO - dbtbtest_ts = 1697188317 2023-10-13T09:11:57Z INFO - **************************************************************************************************** 2023-10-13T09:11:57Z INFO - BEGIN PHASE 1: Run sanity checks 2023-10-13T09:11:57Z INFO - **************************************************************************************************** 2023-10-13T09:11:57Z INFO - Running sanity checks... 2023-10-13T09:11:57Z INFO - Testing remote site version... 2023-10-13T09:11:57Z INFO - Should be able to connect with HTTP from lfg-site-2 to lfg-site-1 (10.121.27.176:80) to get version (22.4.2) - PASSED 2023-10-13T09:11:57Z INFO - Version on lfg-site-1 (10.121.27.176:80) should be equal to version on lfg-site-2 (22.4.2, 22.4.2) - PASSED 2023-10-13T09:11:57Z INFO - Tests for remote site version finished 2023-10-13T09:11:57Z INFO - Testing database connectivity to remote NDB Cluster... 2023-10-13T09:11:58Z INFO - Should be able to connect from lfg-site-2 to lfg-site-1, 1000 (10.121.27.163) - PASSED 2023-10-13T09:11:59Z INFO - Should be able to connect from lfg-site-2 to lfg-site-1, 1001 (10.121.27.164) - PASSED 2023-10-13T09:11:59Z INFO - Tests for database connectivity to remote NDB Cluster finished 2023-10-13T09:11:59Z INFO - Testing HTTP connectivity to remote db-replication-svcs... 2023-10-13T09:11:59Z INFO - Should be able to connect with HTTP from lfg-site-2 to lfg-site-1 (10.121.27.176:80) to get status (UP) - PASSED 2023-10-13T09:11:59Z INFO - Tests for HTTP connectivity to remote db-replication-svcs finished 2023-10-13T09:11:59Z INFO - Testing SSH connectivity to remote db-replication-svcs... 2023-10-13T09:12:01Z INFO - Should be able to connect with SSH from lfg-site-2 to lfg-site-1 (10.121.27.176:2022) to get name (my-prefix-lfg-site-1-lfg-site-2-replication-svc-789d45d88bv6vmd) - PASSED 2023-10-13T09:12:01Z INFO - Tests for SSH connectivity to remote db-replication-svcs finished 2023-10-13T09:12:01Z INFO - Testing SSH connectivity to local db-replication-svcs from NDB pod... 
2023-10-13T09:12:01Z INFO - Num of repl IPs in data pod's env variables should match those in DBTIER_REPL_SITE_INFO - PASSED 2023-10-13T09:12:04Z INFO - Should be able to connect with SSH from my-prefix-ndbmtd-0 to 10.121.27.177:2022 to get name (my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t) - PASSED 2023-10-13T09:12:04Z INFO - Tests for SSH connectivity to local db-replication-svcs from NDB pod finished 2023-10-13T09:12:04Z INFO - Testing local db-backup-svc... 2023-10-13T09:12:14Z INFO - Backup manager (my-prefix-db-backup-manager-svc-6b96cb4567-xxbs9) should not be logging - PASSED 2023-10-13T09:12:24Z INFO - Backup executor (my-prefix-ndbmtd-0) should not be logging - PASSED 2023-10-13T09:12:24Z INFO - Backup executor (my-prefix-ndbmtd-1) should not be logging - PASSED 2023-10-13T09:12:24Z INFO - Tests for local db-backup-svc finished... 2023-10-13T09:12:24Z INFO - Sanity checks completed 2023-10-13T09:12:24Z INFO - **************************************************************************************************** 2023-10-13T09:12:24Z INFO - END PHASE 1: Run sanity checks 2023-10-13T09:12:24Z INFO - **************************************************************************************************** 2023-10-13T09:12:24Z INFO - dbtetest_ts = 1697188344 2023-10-13T09:12:24Z INFO - Sanity tests took: 00 hr. 00 min. 27 sec. 2023-10-13T09:12:24Z INFO - **************************************************************************************************** 2023-10-13T09:12:24Z INFO - BEGIN PHASE 2: Run GRR setup 2023-10-13T09:12:24Z INFO - **************************************************************************************************** 2023-10-13T09:12:24Z INFO - Setting STATE to FAILED in all sites... 2023-10-13T09:12:25Z INFO - lfg-site-1 (1000): STATE set to FAILED site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 FAILED NULL NULL lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:12:26Z INFO - lfg-site-2 (2000): STATE set to FAILED site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 FAILED NULL NULL lfg-site-2 2 COMPLETED lfg-site-1 1013230837 2023-10-13T09:12:26Z INFO - STATE set to FAILED 2023-10-13T09:12:26Z INFO - Waiting 15 seconds for replicas to stop... 2023-10-13T09:12:41Z INFO - Waiting for condition: false 2023-10-13T09:12:41Z INFO - Waited for 15 seconds... 2023-10-13T09:12:41Z INFO - Delete from ndb_apply_status table in all sites... 
2023-10-13T09:12:41Z INFO - lfg-site-1 (1000): deleted records 2023-10-13T09:12:42Z INFO - 2023-10-13T09:12:42Z INFO - lfg-site-2 (2000): deleted records 2023-10-13T09:12:43Z INFO - 2023-10-13T09:12:43Z INFO - Deleted from ndb_apply_status table in all sites 2023-10-13T09:12:43Z INFO - **************************************************************************************************** 2023-10-13T09:12:43Z INFO - END PHASE 2: Run GRR setup 2023-10-13T09:12:43Z INFO - **************************************************************************************************** 2023-10-13T09:12:43Z INFO - **************************************************************************************************** 2023-10-13T09:12:43Z INFO - BEGIN PHASE 3: Run GRR start 2023-10-13T09:12:43Z INFO - **************************************************************************************************** 2023-10-13T09:12:43Z INFO - Verifying replicas associated with lfg-site-1 have stopped 2023-10-13T09:12:44Z INFO - 1000: lfg-site-1, ndbmysqld-0 (56), 10.121.27.163: Replica has stopped 2023-10-13T09:12:45Z INFO - 1001: lfg-site-1, ndbmysqld-1 (57), 10.121.27.164: Replica has stopped 2023-10-13T09:12:46Z INFO - 2000: lfg-site-2, ndbmysqld-0 (56), 10.121.27.170: Replica has stopped 2023-10-13T09:12:47Z INFO - 2001: lfg-site-2, ndbmysqld-1 (57), 10.121.27.171: Replica has stopped 2023-10-13T09:12:47Z INFO - Replicas associated with lfg-site-1 have stopped (or been skipped if appropriate) 2023-10-13T09:12:47Z INFO - Recovering lfg-site-1... 2023-10-13T09:12:48Z INFO - GRR started for lfg-site-1 on 1000 2023-10-13T09:12:48Z INFO - Must use --start-from-monitoring if dbtrecover is interrupted 2023-10-13T09:12:48Z INFO - ANSWERS: dbtrecover --start-from-monitoring < <(dbtanswer "yes" "no" "no" "yes" "yes" ) 2023-10-13T09:12:48Z INFO - IMPORTANT: Please, make sure the directory where dbtrecover and dbtanswer are is in PATH 2023-10-13T09:12:48Z INFO - **************************************************************************************************** 2023-10-13T09:12:48Z INFO - END PHASE 3: Run GRR start 2023-10-13T09:12:48Z INFO - **************************************************************************************************** 2023-10-13T09:12:48Z INFO - **************************************************************************************************** 2023-10-13T09:12:48Z INFO - BEGIN PHASE 4: Start monitoring GRR 2023-10-13T09:12:48Z INFO - **************************************************************************************************** 2023-10-13T09:12:48Z INFO - Monitoring status... 2023-10-13T09:12:48Z INFO - STATE = STARTDRRESTORE 2023-10-13T09:12:49Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 STARTDRRESTORE NULL NULL lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:12:54Z INFO - STATE = REINSTALLED 2023-10-13T09:12:54Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 REINSTALLED NULL NULL lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:13:00Z INFO - STATE = INITIATEBACKUP 2023-10-13T09:13:00Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 INITIATEBACKUP NULL NULL lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:13:05Z INFO - STATE = CHECKBACKUP 2023-10-13T09:13:06Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 CHECKBACKUP lfg-site-2 1013230913 lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:13:07Z INFO - Droping databases with '-' (dash) in their name (10.121.27.163)... 
2023-10-13T09:13:08Z INFO - Disabling error loging while deleting databases... 2023-10-13T09:13:08Z INFO - Re-enabling error loging... 2023-10-13T09:13:08Z INFO - Dropped databases 2023-10-13T09:13:59Z INFO - STATE = COPY_BACKUP 2023-10-13T09:14:00Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 COPY_BACKUP lfg-site-2 1013230913 lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:14:05Z INFO - STATE = CHECK_BACKUP_COPY 2023-10-13T09:14:05Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 CHECK_BACKUP_COPY lfg-site-2 1013230913 lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:15:01Z INFO - STATE = BACKUPCOPIED 2023-10-13T09:15:01Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 BACKUPCOPIED lfg-site-2 1013230913 lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:15:15Z INFO - STATE = BACKUPEXTRACTED 2023-10-13T09:15:15Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 BACKUPEXTRACTED lfg-site-2 1013230913 lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:15:16Z INFO - Disabling error loging on lfg-site-1... 2023-10-13T09:15:16Z INFO - Disabling printing FAILED or "" (empty) STATEs... 2023-10-13T09:15:27Z INFO - Waiting for pods to restart... 2023-10-13T09:16:10Z INFO - STATE = RECONNECTSQLNODES 2023-10-13T09:16:10Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 RECONNECTSQLNODES lfg-site-2 1013230913 lfg-site-2 2 COMPLETED lfg-site-1 1013230837 2023-10-13T09:16:12Z INFO - Waiting for pods to restart... 2023-10-13T09:16:21Z INFO - Re-enabling printing FAILED or "" (empty) STATEs again if they occur 2023-10-13T09:16:21Z INFO - Re-enabling error loging... 2023-10-13T09:16:21Z INFO - STATE = BACKUPRESTORE 2023-10-13T09:16:22Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 BACKUPRESTORE lfg-site-2 1013230913 lfg-site-2 2 COMPLETED lfg-site-1 1013230837 2023-10-13T09:16:43Z INFO - STATE = RESTORED 2023-10-13T09:16:43Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 RESTORED lfg-site-2 1013230913 lfg-site-2 2 COMPLETED lfg-site-1 1013230837 2023-10-13T09:16:59Z INFO - STATE = BINLOGINITIALIZED 2023-10-13T09:17:00Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 BINLOGINITIALIZED lfg-site-2 1013230913 lfg-site-2 2 COMPLETED lfg-site-1 1013230837 2023-10-13T09:17:13Z INFO - STATE = RECONFIGURE 2023-10-13T09:17:13Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 RECONFIGURE lfg-site-2 1013230913 lfg-site-2 2 COMPLETED lfg-site-1 1013230837 2023-10-13T09:17:59Z INFO - STATE = COMPLETED 2023-10-13T09:18:00Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 COMPLETED lfg-site-2 1013230913 lfg-site-2 2 COMPLETED lfg-site-1 1013230837 2023-10-13T09:18:00Z INFO - **************************************************************************************************** 2023-10-13T09:18:00Z INFO - END PHASE 4: Start monitoring GRR 2023-10-13T09:18:00Z INFO - **************************************************************************************************** 2023-10-13T09:18:00Z INFO - **************************************************************************************************** 2023-10-13T09:18:00Z INFO - BEGIN PHASE 5: Run post-processing 2023-10-13T09:18:00Z INFO - **************************************************************************************************** 2023-10-13T09:18:00Z INFO - mate replication_svc IP: 10.121.27.176 
2023-10-13T09:18:01Z INFO - lfg-site-1, 10.121.27.176:80, 22.4.2 2023-10-13T09:18:01Z INFO - GRR has automatically restarted mysqld containers (cnDBTier 22.4.2); skipping restart... 2023-10-13T09:18:01Z INFO - **************************************************************************************************** 2023-10-13T09:18:01Z INFO - END PHASE 5: Run post-processing 2023-10-13T09:18:01Z INFO - **************************************************************************************************** 2023-10-13T09:18:01Z INFO - GRR COMPLETED for lfg-site-1 2023-10-13T09:18:01Z INFO - dbtend_ts = 1697188681 2023-10-13T09:18:01Z INFO - GRR took: 00 hr. 05 min. 37 sec. 2023-10-13T09:18:01Z INFO - dbtrecover took: 00 hr. 09 min. 21 sec.
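If the dbtrecover run is interrupted, the sample log above records an ANSWERS line that can be used to resume it from the monitoring phase. The following is a minimal sketch based on that log line only; it assumes the dbtrecover and dbtanswer scripts are in PATH, and the answer sequence shown is specific to this example run:
# Resume the interrupted GRR run from the monitoring phase, replaying the
# answers recorded in the "ANSWERS:" line of the sample log above.
$ dbtrecover --start-from-monitoring < <(dbtanswer "yes" "no" "no" "yes" "yes")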
Note:
After recovering the site, create NF user accounts and grant privileges as per your requirement by following the Creating NF Users procedure.
7.4.4 Verifying cnDBTier Cluster Status
This section provides the procedures to verify cnDBTier cluster status using cnDBTier or CNC Console.
7.4.4.1 Verifying Cluster Status Using cnDBTier
This section provides the steps to verify the cluster status using cnDBTier.
Note:
db-backup-manager-svc is designed to automatically restart in case of errors. Therefore, when the backup-manager-svc encounters a temporary error during the georeplication recovery process, it may undergo several restarts. When cnDBTier reaches a stable state, the db-backup-manager-svc pod operates normally without any further restarts.- Run the following command to check the status of the pods in
cnDBTier
clusters:
$ kubectl -n <namespace of cnDBTier Cluster> get pods
For example:
Run the following command to check the status of cnDBTier cluster1:$ kubectl -n cluster1 get pods
Sample output:NAME READY STATUS RESTARTS AGE mysql-cluster-cluster1-cluster2-replication-svc-86797d648fr6ttm 1/1 Running 0 6m38s mysql-cluster-cluster1-cluster3-replication-svc-7fd6c75c6562gj8 1/1 Running 0 6m39s mysql-cluster-cluster1-cluster4-replication-svc-869b65666bw7q6c 1/1 Running 0 6m38s mysql-cluster-db-backup-manager-svc-64bf559895-qfkj7 1/1 Running 0 6m38s mysql-cluster-db-monitor-svc-5bdfd4fb96-jwqzn 1/1 Running 0 6m38s ndbappmysqld-0 2/2 Running 0 7m ndbappmysqld-1 2/2 Running 0 7m ndbmgmd-0 2/2 Running 0 7m ndbmgmd-1 2/2 Running 0 7m ndbmtd-0 3/3 Running 0 7m ndbmtd-1 3/3 Running 0 6m ndbmtd-2 3/3 Running 0 6m ndbmtd-3 3/3 Running 0 6m ndbmysqld-0 3/3 Running 0 6m ndbmysqld-1 3/3 Running 0 6m ndbmysqld-2 3/3 Running 0 6m ndbmysqld-3 3/3 Running 0 6m ndbmysqld-4 3/3 Running 0 6m ndbmysqld-5 3/3 Running 0 6m
Run the following command to check the status of cnDBTier cluster2:$ kubectl -n cluster2 get pods
Sample output:NAME READY STATUS RESTARTS AGE mysql-cluster-cluster2-cluster1-replication-svc-fb48bfd9-4klzx 1/1 Running 0 10m mysql-cluster-cluster2-cluster3-replication-svc-797599746dshtgr 1/1 Running 0 10m mysql-cluster-cluster2-cluster4-replication-svc-6db8d76495bkf5v 1/1 Running 0 10m mysql-cluster-db-backup-manager-svc-85d4b7f7c5-j7ptp 1/1 Running 0 16m mysql-cluster-db-monitor-svc-5f68dd795f-cv8jc 1/1 Running 0 16m ndbappmysqld-0 2/2 Running 0 16m ndbappmysqld-1 2/2 Running 0 16m ndbmgmd-0 2/2 Running 0 16m ndbmgmd-1 2/2 Running 0 16m ndbmtd-0 3/3 Running 0 16m ndbmtd-1 3/3 Running 0 15m ndbmtd-2 3/3 Running 0 14m ndbmtd-3 3/3 Running 0 14m ndbmysqld-0 3/3 Running 0 16m ndbmysqld-1 3/3 Running 0 13m ndbmysqld-2 3/3 Running 0 12m ndbmysqld-3 3/3 Running 0 12m ndbmysqld-4 3/3 Running 0 11m ndbmysqld-5 3/3 Running 0 11m
Run the following command to check the status of cnDBTier cluster3:$ kubectl -n cluster3 get pods
Sample output:NAME READY STATUS RESTARTS AGE mysql-cluster-cluster3-cluster1-replication-svc-55696b5dbbdp9ts 1/1 Running 0 15m mysql-cluster-cluster3-cluster2-replication-svc-778bd44fb7qgjt5 1/1 Running 0 15m mysql-cluster-cluster3-cluster4-replication-svc-58c66f896b27cxh 1/1 Running 0 15m mysql-cluster-db-backup-manager-svc-57f7d77d8c-pgllm 1/1 Running 0 15m mysql-cluster-db-monitor-svc-784c4c7f9d-p5zvm 1/1 Running 0 15m ndbappmysqld-0 2/2 Running 0 16m ndbappmysqld-1 2/2 Running 0 16m ndbmgmd-0 2/2 Running 0 16m ndbmgmd-1 2/2 Running 0 16m ndbmtd-0 3/3 Running 0 16m ndbmtd-1 3/3 Running 0 15m ndbmtd-2 3/3 Running 0 14m ndbmtd-3 3/3 Running 0 14m ndbmysqld-0 3/3 Running 0 18m ndbmysqld-1 3/3 Running 0 18m ndbmysqld-2 3/3 Running 0 18m ndbmysqld-3 3/3 Running 0 18m ndbmysqld-4 3/3 Running 0 18m ndbmysqld-5 3/3 Running 0 18m
Run the following command to check the status of cnDBTier cluster4:$ kubectl -n cluster4 get pods
Sample output:NAME READY STATUS RESTARTS AGE mysql-cluster-cluster4-cluster1-replication-svc-7dd6987b4fpwvb5 1/1 Running 0 24m mysql-cluster-cluster4-cluster2-replication-svc-5ddbdcdd75j5rj2 1/1 Running 0 24m mysql-cluster-cluster4-cluster3-replication-svc-64b7bbcfcblmz4j 1/1 Running 0 24m mysql-cluster-db-backup-manager-svc-fc49f8c9f-xng57 1/1 Running 0 24m mysql-cluster-db-monitor-svc-5cfd757866-s6lk6 1/1 Running 0 24m ndbappmysqld-0 2/2 Running 0 16m ndbappmysqld-1 2/2 Running 0 16m ndbmgmd-0 2/2 Running 0 12m ndbmgmd-1 2/2 Running 0 12m ndbmtd-0 3/3 Running 0 16m ndbmtd-1 3/3 Running 0 17m ndbmtd-2 3/3 Running 0 18m ndbmtd-3 3/3 Running 0 19m ndbmysqld-0 3/3 Running 0 21m ndbmysqld-1 3/3 Running 0 21m ndbmysqld-2 3/3 Running 0 22m ndbmysqld-3 3/3 Running 0 23m ndbmysqld-4 3/3 Running 0 23m ndbmysqld-5 3/3 Running 0 24m
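The per-cluster checks above can also be combined into a single loop. The following is a minimal sketch, assuming the namespaces cluster1 through cluster4 used in these examples; it prints any pod whose STATUS column is not Running (the READY counts should still be compared against the sample outputs):
# Report pods that are not in the Running state in each cnDBTier namespace;
# no output under a namespace heading means all of its pods report Running.
$ for ns in cluster1 cluster2 cluster3 cluster4; do echo "== ${ns} =="; kubectl -n ${ns} get pods --no-headers | grep -v Running; done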
- Run the following command to check the status of the MySQL NDB
cluster in cnDBTier
cluster:
$ kubectl -n <namespace of cnDBTier Cluster> exec -it ndbmgmd-0 -- ndb_mgm -e show
For example:
Run the following command to check the status of MySQL NDB Cluster in cnDBTier cluster1:$ kubectl -n cluster1 exec -it ndbmgmd-0 -- ndb_mgm -e show
Sample output:Connected to Management Server at: localhost:1186 Cluster Configuration --------------------- [ndbd(NDB)] 4 node(s) id=1 @10.233.124.92 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0) id=2 @10.233.113.109 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0) id=3 @10.233.116.79 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1, *) id=4 @10.233.71.90 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) [ndb_mgmd(MGM)] 2 node(s) id=49 @10.233.78.61 (mysql-8.4.2 ndb-8.4.2) id=50 @10.233.109.99 (mysql-8.4.2 ndb-8.4.2) [mysqld(API)] 12 node(s) id=56 @10.233.120.210 (mysql-8.4.2 ndb-8.4.2) id=57 @10.233.124.93 (mysql-8.4.2 ndb-8.4.2) id=58 @10.233.109.230 (mysql-8.4.2 ndb-8.4.2) id=59 @10.233.71.17 (mysql-8.4.2 ndb-8.4.2) id=60 @10.233.114.197 (mysql-8.4.2 ndb-8.4.2) id=61 @10.233.116.251 (mysql-8.4.2 ndb-8.4.2) id=70 @10.233.71.89 (mysql-8.4.2 ndb-8.4.2) id=71 @10.233.113.110 (mysql-8.4.2 ndb-8.4.2) id=222 (not connected, accepting connect from any host) id=223 (not connected, accepting connect from any host) id=224 (not connected, accepting connect from any host) id=225 (not connected, accepting connect from any host)
Run the following command to check the status of MySQL NDB Cluster in cnDBTier cluster2:$ kubectl -n cluster2 exec -it ndbmgmd-0 -- ndb_mgm -e show
Sample output:Connected to Management Server at: localhost:1186 Cluster Configuration --------------------- [ndbd(NDB)] 4 node(s) id=1 @10.233.116.82 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0, *) id=2 @10.233.120.61 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0) id=3 @10.233.109.100 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) id=4 @10.233.89.68 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) [ndb_mgmd(MGM)] 2 node(s) id=49 @10.233.84.54 (mysql-8.4.2 ndb-8.4.2) id=50 @10.233.114.63 (mysql-8.4.2 ndb-8.4.2) [mysqld(API)] 12 node(s) id=56 @10.233.89.246 (mysql-8.4.2 ndb-8.4.2) id=57 @10.233.116.250 (mysql-8.4.2 ndb-8.4.2) id=58 @10.233.121.201 (mysql-8.4.2 ndb-8.4.2) id=59 @10.233.78.250 (mysql-8.4.2 ndb-8.4.2) id=60 @10.233.84.213 (mysql-8.4.2 ndb-8.4.2) id=61 @10.233.124.95 (mysql-8.4.2 ndb-8.4.2) id=70 @10.233.121.56 (mysql-8.4.2 ndb-8.4.2) id=71 @10.233.84.55 (mysql-8.4.2 ndb-8.4.2) id=222 (not connected, accepting connect from any host) id=223 (not connected, accepting connect from any host) id=224 (not connected, accepting connect from any host) id=225 (not connected, accepting connect from any host)
Run the following command to check the status of MySQL NDB Cluster in cnDBTier cluster3:$ kubectl -n cluster3 exec -it ndbmgmd-0 -- ndb_mgm -e show
Sample output:Connected to Management Server at: localhost:1186 Cluster Configuration --------------------- [ndbd(NDB)] 4 node(s) id=1 @10.233.108.208 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0, *) id=2 @10.233.78.249 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0) id=3 @10.233.39.100 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) id=4 @10.233.36.68 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) [ndb_mgmd(MGM)] 2 node(s) id=49 @10.233.113.87 (mysql-8.4.2 ndb-8.4.2) id=50 @10.233.114.32 (mysql-8.4.2 ndb-8.4.2) [mysqld(API)] 12 node(s) id=56 @10.233.71.16 (mysql-8.4.2 ndb-8.4.2) id=57 @10.233.114.196 (mysql-8.4.2 ndb-8.4.2) id=58 @10.233.84.212 (mysql-8.4.2 ndb-8.4.2) id=59 @10.233.108.210 (mysql-8.4.2 ndb-8.4.2) id=60 @10.233.121.202 (mysql-8.4.2 ndb-8.4.2) id=61 @10.233.109.231 (mysql-8.4.2 ndb-8.4.2) id=70 @10.233.121.37 (mysql-8.4.2 ndb-8.4.2) id=71 @10.233.84.38 (mysql-8.4.2 ndb-8.4.2) id=222 (not connected, accepting connect from any host) id=223 (not connected, accepting connect from any host) id=224 (not connected, accepting connect from any host) id=225 (not connected, accepting connect from any host)
Run the following command to check the status of MySQL NDB Cluster in cnDBTier cluster4:$ kubectl -n cluster4 exec -it ndbmgmd-0 -- ndb_mgm -e show
Sample output:Connected to Management Server at: localhost:1186 Cluster Configuration --------------------- [ndbd(NDB)] 4 node(s) id=1 @10.233.78.248 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0, *) id=2 @10.233.108.209 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0) id=3 @10.233.109.43 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) id=4 @10.233.89.44 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) [ndb_mgmd(MGM)] 2 node(s) id=49 @10.233.89.247 (mysql-8.4.2 ndb-8.4.2) id=50 @10.233.99.32 (mysql-8.4.2 ndb-8.4.2) [mysqld(API)] 12 node(s) id=56 @10.233.109.228 (mysql-8.4.2 ndb-8.4.2) id=57 @10.233.121.200 (mysql-8.4.2 ndb-8.4.2) id=58 @10.233.84.211 (mysql-8.4.2 ndb-8.4.2) id=59 @10.233.124.94 (mysql-8.4.2 ndb-8.4.2) id=60 @10.233.113.89 (mysql-8.4.2 ndb-8.4.2) id=61 @10.233.114.198 (mysql-8.4.2 ndb-8.4.2) id=70 @10.233.121.47 (mysql-8.4.2 ndb-8.4.2) id=71 @10.233.84.48 (mysql-8.4.2 ndb-8.4.2) id=222 (not connected, accepting connect from any host) id=223 (not connected, accepting connect from any host) id=224 (not connected, accepting connect from any host) id=225 (not connected, accepting connect from any host)
Note:
Node IDs 222 to 225 in the sample outputs are shown as "not connected" as these are added as empty slot IDs that are used for georeplication recovery. You can ignore these node IDs. - If all the nodes are connected to the MySQL NDB cluster and all the pods of the cnDBTier cluster are in the Running state, then the cnDBTier cluster is considered to be UP and free of fatal errors.
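For a quick command-line check of the node connection status, the ndb_mgm output can be filtered for disconnected nodes. The following is a minimal sketch, assuming the cluster1 namespace and the empty georeplication slot IDs 222 to 225 shown in the sample outputs; any line it prints indicates a node that is down:
# Show disconnected nodes, ignoring the empty API slots (IDs 222 to 225)
# that are reserved for georeplication recovery.
$ kubectl -n cluster1 exec -it ndbmgmd-0 -- ndb_mgm -e show | grep "not connected" | egrep -v "id=(222|223|224|225)"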
7.4.4.2 Verifying cnDBTier Cluster Status Using CNC Console
This section provides the steps to verify cnDBTier cluster status using CNC console.
- Log in to the CNC Console GUI of the cluster.
For example:
- If you want to verify the status of cluster1, then log in to the CNC Console GUI of cluster1.
- If you want to verify the status of cluster2, then log in to the CNC Console GUI of cluster2.
- Expand cnDBTier under the NF menu.
- Select Local Cluster Status.
The system displays the name and the status of the cluster. For example, if you are logged in to cluster1, the system displays the status of cluster1.
The following image shows a sample Local Cluster Status page displaying the status of cluster1:
Figure 7-12 Verify Cluster Status
7.4.5 Monitoring the Georeplication Recovery Status
This section provides the procedures and examples to monitor georeplication recovery status using cnDBTier APIs or CNC console.
7.4.5.1 Monitoring Georeplication Recovery Status Using cnDBTier APIs
This section provides the procedure and examples to monitor georeplication recovery status using cnDBTier APIs.
Note:
- db-backup-manager-svc is designed to automatically restart in case of errors. Therefore, when the backup-manager-svc encounters a temporary error during the georeplication recovery process, it may undergo several restarts. When cnDBTier reaches a stable state, the db-backup-manager-svc pod operates normally without any further restarts.
- The georeplication recovery process transitions through various states depending on different scenarios and configurations (REINSTALLED → STARTDRRESTORE → INITIATEBACKUP → CHECKBACKUP → COPY_BACKUP → CHECK_BACKUP_COPY → BACKUPCOPIED → BACKUPEXTRACTED → FAILED → RECONNECTSQLNODES → BACKUPRESTORE → RESTORED → BINLOGINITIALIZED → RECONFIGURE → COMPLETED).
- If dr_state displays GRR_FAILED, it indicates that the georeplication recovery failed. In such a case, check the replication service logs for more details and restart the georeplication recovery. - The system raises different alerts in case of backup transfer failure such as BACKUP_TRANSFER_LOCAL_FAILED and BACKUP_TRANSFER_FAILED. For more information about backup transfer status alerts, see the "cnDBTier Alerts" section in Oracle Communications Cloud Native Core, cnDBTier User Guide.
- Run the following command to get the replication service
LoadBalancer IP of the site on which the restore is being
performed.
$ export IP=$(kubectl get svc -n <namespace> | grep repl | awk '{print $4}' | head -n 1 )
where,
<namespace>
is the namespace of the failed site. - Run the following command to get the replication service
LoadBalancer Port of the site on which restore is being
performed:
$ export PORT=$(kubectl get svc -n <namespace> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
where
<namespace>
is the namespace of the failed site. - Run the following command to get the georeplication restore status.
If the value of gr_state is
COMPLETED
, then the database is restored and the replication channels are reestablished.$ curl -X GET http://$IP:$PORT/db-tier/gr-recovery/site/{sitename}/status
Note:
For more information about georeplication recovery API responses, error codes, and curl commands for HTTPS-enabled replication service, see Fault Recovery APIs.
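Instead of fetching the status manually, the recovery state can be polled until it reaches a terminal value. The following is a minimal sketch, assuming the IP and PORT variables exported in the previous steps and a site name of cluster1 (replace it with the site being recovered); it stops when the returned status reports COMPLETED, or when it reports the GRR_FAILED failure state described in the note above:
# Poll the georeplication recovery status every 30 seconds; on a failure,
# check the replication service logs and restart the recovery.
$ while true; do out=$(curl -s -X GET http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/status); echo "$out"; echo "$out" | grep -q '"gr_state":"COMPLETED"' && break; echo "$out" | grep -q 'GRR_FAILED' && break; sleep 30; done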
Examples for monitoring georeplication recovery status
- Run the following command to get the replication service
LoadBalancer IP for
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service LoadBalancer Port for cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to get the georeplication restore status of
cluster1. If the value of
gr_state
is COMPLETED, then the database is restored and the replication channels are reestablished:$ curl -X GET http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/status
Sample output if georeplication completed successfully:{"localSiteName":"cluster1","grstatus":"COMPLETED","gr_state":"COMPLETED","remotesitesGrInfo":[{"remoteSiteName":"cluster2","replchannel_group_id":"1","gr_state":"COMPLETED"}]}
- Run the following command to get the replication service
LoadBalancer IP for
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service LoadBalancer Port for cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to get the georeplication restore status of
cluster2. If the value of
gr_state
is COMPLETED, then the database is restored and the replication channels are reestablished:$ curl -X GET http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/status
Sample output if georeplication completed successfully:{"localSiteName":"cluster2","grstatus":"COMPLETED","gr_state":"COMPLETED","remotesitesGrInfo":[{"remoteSiteName":"cluster1","replchannel_group_id":"1","gr_state":"COMPLETED"}]}
- Run the following command to get the replication service
LoadBalancer IP for
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service LoadBalancer Port for cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to get the georeplication restore status of
cluster3. If the value of
gr_state
is COMPLETED, then the database is restored and the replication channels are reestablished:$ curl -X GET http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/status
Sample output if georeplication completed successfully:{"localSiteName":"cluster3","grstatus":"COMPLETED","gr_state":"COMPLETED","remotesitesGrInfo":[{"remoteSiteName":"cluster1","replchannel_group_id":"1","gr_state":"COMPLETED"}]}
- Run the following command to get the replication service
LoadBalancer IP for
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service LoadBalancer Port for cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to get the georeplication restore status of
cluster4. If the value of
gr_state
is COMPLETED, then the database is restored and the replication channels are reestablished:$ curl -X GET http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/status
Sample output if georeplication completed successfully:{"localSiteName":"cluster4","grstatus":"COMPLETED","gr_state":"COMPLETED","remotesitesGrInfo":[{"remoteSiteName":"cluster1","replchannel_group_id":"1","gr_state":"COMPLETED"}]}
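The commands above can also be run in a single pass across all sites. The following is a minimal sketch, assuming the namespaces and site names cluster1 through cluster4 used in these examples; it reuses the same IP and port extraction shown in the previous steps:
# Print the georeplication recovery status reported by each site's
# replication service LoadBalancer.
$ for site in cluster1 cluster2 cluster3 cluster4; do IP=$(kubectl get svc -n ${site} | grep repl | awk '{print $4}' | head -n 1); PORT=$(kubectl get svc -n ${site} | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1); echo "== ${site} =="; curl -s -X GET http://${IP}:${PORT}/db-tier/gr-recovery/site/${site}/status; echo; done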
7.4.5.2 Monitoring Georeplication Recovery Status Using CNC Console
This section provides the procedure and examples to monitor georeplication recovery status using CNC Console.
- [Optional]: Log in to the CNC Console GUI of the cluster. Skip this step if you are already logged in.
- Expand cnDBTier under the NF menu and select Georeplication Recovery.
- Click Georeplication Recovery Status.
The system displays the current status of the georeplication recovery as shown in the following image:
Figure 7-13 Georeplication Recovery Status
Note:
If the georeplication recovery status remains in the FAILED status right after the CHECK_BACKUP_COPY status for a long time, then check the replication service logs for more information about the georeplication recovery status and to verify whether the backup transfer failed.
7.4.5.3 Georeplication Recovery Status
Table 7-3 Georeplication Recovery Status
Georeplication Recovery Status | Description |
---|---|
ACTIVE | Indicates that the cluster is in a healthy state, and the replication is UP and running in sync with its mate cluster. |
REINSTALLED | Indicates that the cluster is reinstalled to resolve a fatal error. |
STARTDRRESTORE | Indicates that the georeplication recovery has started. |
VALIDATERESOURCES | Indicates that the georeplication recovery resources (PVC size and CPU of the leader DB replication service) in the cluster where GRR is initiated are validated. |
INITIATEBACKUP | Indicates that the cluster is identifying the healthy cluster to initiate backup. |
CHECKBACKUP | Indicates that the backup is initiated and the cluster is monitoring the backup initiation process. If the backup initiation fails, the cluster re-initiates the backup. |
COPY_BACKUP | Indicates that the backup initiation is complete and the system has requested the backup transfer from the healthy cluster to the cluster to be recovered. |
CHECK_BACKUP_COPY | Indicates that the backup copy is in progress and the cluster is monitoring the backup transfer progress. If the backup transfer fails, the cluster re-initiates the backup transfer. |
BACKUPCOPIED | Indicates that the backup transfer is complete and the georeplication recovery cluster can start the backup extraction process. |
BACKUPEXTRACTED | Indicates that the backup extraction is complete at the georeplication recovery cluster and the system can start the backup restore. |
FAILED | Indicates that the cluster is in a failed state and needs to be recovered. This state can also indicate that the georeplication recovery has started and the database is restored using the healthy cluster backup. |
UNKNOWN | Indicates that all the databases are removed from the georeplication recovery cluster so that the cluster can be restored using the backup from the healthy cluster. |
RECONNECTSQLNODES | Indicates that the SQL nodes must be down during the backup restore so that no records get into the binlog of the georeplication recovery cluster. |
BACKUPRESTORE | Indicates that the backup that is copied from the healthy cluster is being used to restore the georeplication recovery cluster. |
RESTORED | Indicates that the cluster is restored using the backup and the system can start reestablishing the replication channels. |
BINLOGINITIALIZED | Indicates that the binlogs are being reinitialized to start the restore of replication channels. |
RECONFIGURE | Indicates that the binlogs are reinitialized and the system is reestablishing the replication channels with respect to its mate clusters. |
7.4.6 Uninstalling cnDBTier Cluster
- Run the following command to uninstall a cnDBTier
cluster:
$ helm uninstall mysql-cluster --namespace <namespace>
where,
<namespace>
is the namespace of the cnDBTier cluster. For example,
Run the following command to uninstall the cnDBTier cluster1:$ helm uninstall mysql-cluster --namespace cluster1
Sample output:release "mysql-cluster" uninstalled
Run the following command to uninstall the cnDBTier cluster2:$ helm uninstall mysql-cluster --namespace cluster2
Sample output:release "mysql-cluster" uninstalled
Run the following command to uninstall the cnDBTier cluster3:$ helm uninstall mysql-cluster --namespace cluster3
Sample output:release "mysql-cluster" uninstalled
Run the following command to uninstall the cnDBTier cluster4:$ helm uninstall mysql-cluster --namespace cluster4
Sample output:release "mysql-cluster" uninstalled
- Run the following command to delete PVC of cnDBTier
cluster:
$ kubectl -n <namespace of cnDBTier cluster> get pvc $ kubectl -n <namespace of cnDBTier cluster> get pvc | egrep 'ndbmgmd|ndbmtd|ndbmysqld|ndbappmysqld|replication-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n <namespace of cnDBTier cluster> delete pvc $ kubectl -n <namespace of cnDBTier cluster> get pvc
For example,
Run the following command to delete the PVC of cnDBTier Cluster1:$ kubectl -n cluster1 get pvc $ kubectl -n cluster1 get pvc | egrep 'ndbmgmd|ndbmtd|ndbmysqld|ndbappmysqld|replication-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster1 delete pvc $ kubectl -n cluster1 get pvc
Run the following command to delete the PVC of cnDBTier Cluster2:$ kubectl -n cluster2 get pvc $ kubectl -n cluster2 get pvc | egrep 'ndbmgmd|ndbmtd|ndbmysqld|ndbappmysqld|replication-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster2 delete pvc $ kubectl -n cluster2 get pvc
Run the following command to delete the PVC of cnDBTier Cluster3:$ kubectl -n cluster3 get pvc $ kubectl -n cluster3 get pvc | egrep 'ndbmgmd|ndbmtd|ndbmysqld|ndbappmysqld|replication-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster3 delete pvc $ kubectl -n cluster3 get pvc
Run the following command to delete the PVC of cnDBTier Cluster4:$ kubectl -n cluster4 get pvc $ kubectl -n cluster4 get pvc | egrep 'ndbmgmd|ndbmtd|ndbmysqld|ndbappmysqld|replication-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster4 delete pvc $ kubectl -n cluster4 get pvc
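Before reinstalling the cluster, it can be useful to confirm that both the Helm release and the PVCs are gone. The following is a minimal sketch, assuming the cluster1 namespace; repeat it for the other namespaces as needed:
# Confirm that the Helm release is uninstalled and no cnDBTier PVCs remain.
$ helm list --namespace cluster1
$ kubectl -n cluster1 get pvc | egrep 'ndbmgmd|ndbmtd|ndbmysqld|ndbappmysqld|replication-svc' || echo "no cnDBTier PVCs remaining"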
7.4.7 Reinstalling cnDBTier Cluster
This section provides the procedure to reinstall a cnDBTier cluster.
- Follow the Uninstalling cnDBTier Cluster procedure to uninstall the cnDBTier cluster.
- Install the cnDBTier cluster by configuring the remote site IP address of the replication service of the remote cnDBTier cluster for restoring the database. For more information on how to install the cnDBTier cluster, see Installing cnDBTier:
- Update the
remotesiteip
configuration in the custom_values.yaml
file with the remote site IP address or FQDN of the replication services from the remote cnDBTier cluster.- If the cnDBTier cluster1 is undergoing a
georeplication recovery:
- Retrieve the replication service IP address
from cnDBTier cluster2 if the first mate site (cnDBTier
cluster2) is already configured or not marked as
FAILED:
$ kubectl -n cluster2 get svc | grep cluster2-cluster1-replication-svc | awk '{ print $4 }' 10.75.02.01
- Retrieve the replication service IP address
from cnDBTier cluster3 if the second mate site (cnDBTier
cluster3) is already configured or not marked as
FAILED:
$ kubectl -n cluster3 get svc | grep cluster3-cluster1-replication-svc | awk '{ print $4 }' 10.75.03.01
- Retrieve the replication service IP address
from cnDBTier cluster4 if the third mate site (cnDBTier
cluster4) is already configured or not marked as
FAILED:
$ kubectl -n cluster4 get svc | grep cluster4-cluster1-replication-svc | awk '{ print $4 }' 10.75.04.01
- Perform the following steps to update the
custom_values.yaml
file for cnDBTier cluster1. If any remote cnDBTier cluster is marked as FAILED or not installed, configure the corresponding remotesiteip
as an empty string "":- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster2 mate site
details if the first mate site is
configured:
replication: matesitename: "cluster2" remotesiteip: "10.75.02.01" remotesiteport: "80"
- Configure the cnDBTier cluster3 mate site
details if the second mate site is
configured:
replication: matesitename: "cluster3" remotesiteip: "10.75.03.01" remotesiteport: "80"
- Configure the cnDBTier cluster4 mate site
details if the third mate site is
configured:
replication: matesitename: "cluster4" remotesiteip: "10.75.04.01" remotesiteport: "80"
- If the cnDBTier cluster2 is undergoing a
georeplication recovery:
- Retrieve the replication service IP address
from cnDBTier cluster1 if the first mate site (cnDBTier
cluster1) is already configured or not marked as
FAILED:
$ kubectl -n cluster1 get svc | grep cluster1-cluster2-replication-svc | awk '{ print $4 }' 10.75.01.02
- Retrieve the replication service IP address
from cnDBTier cluster3 if the second mate site (cnDBTier
cluster3) is already configured or not marked as
FAILED:
$ kubectl -n cluster3 get svc | grep cluster3-cluster2-replication-svc | awk '{ print $4 }' 10.75.03.02
- Retrieve the replication service IP address
from cnDBTier cluster4 if the third mate site (cnDBTier
cluster4) is already configured or not marked as
FAILED:
$ kubectl -n cluster4 get svc | grep cluster4-cluster2-replication-svc | awk '{ print $4 }' 10.75.04.02
- Perform the following steps to update the
custom_values.yaml
file for cnDBTier cluster2. If any remote cnDBTier cluster is marked as FAILED or not installed, configure the corresponding remotesiteip
as an empty string "":- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster1 mate site
details if the first mate site is
configured:
replication: matesitename: "cluster1" remotesiteip: "10.75.01.02" remotesiteport: "80"
- Configure the cnDBTier cluster3 mate site
details if the second mate site is
configured:
replication: matesitename: "cluster3" remotesiteip: "10.75.03.02" remotesiteport: "80"
- Configure the cnDBTier cluster4 mate site
details if the third mate site is
configured:
replication: matesitename: "cluster4" remotesiteip: "10.75.04.02" remotesiteport: "80"
- If the cnDBTier cluster3 is undergoing a
georeplication recovery:
- Retrieve the replication service IP address
from cnDBTier cluster1 if the first mate site (cnDBTier
cluster1) is already configured or not marked as
FAILED:
$ kubectl -n cluster1 get svc | grep cluster1-cluster3-replication-svc | awk '{ print $4 }' 10.75.01.03
- Retrieve the replication service IP address
from cnDBTier cluster2 if the second mate site (cnDBTier
cluster2) is already configured or not marked as
FAILED:
$ kubectl -n cluster2 get svc | grep cluster2-cluster3-replication-svc | awk '{ print $4 }' 10.75.02.03
- Retrieve the replication service IP address
from cnDBTier cluster4 if the third mate site (cnDBTier
cluster4) is already configured or not marked as
FAILED:
$ kubectl -n cluster4 get svc | grep cluster4-cluster3-replication-svc | awk '{ print $4 }' 10.75.04.03
- Perform the following steps to update the
custom_values.yaml
file for cnDBTier cluster3. If any remote cnDBTier cluster is marked as FAILED or not installed, configure the corresponding remotesiteip
as an empty string "":- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster1 mate site
details if the first mate site is
configured:
replication: matesitename: "cluster1" remotesiteip: "10.75.01.03" remotesiteport: "80"
- Configure the cnDBTier cluster2 mate site
details if the second mate site is
configured:
replication: matesitename: "cluster2" remotesiteip: "10.75.02.03" remotesiteport: "80"
- Configure the cnDBTier cluster4 mate site
details if the third mate site is
configured:
replication: matesitename: "cluster4" remotesiteip: "10.75.04.03" remotesiteport: "80"
- If the cnDBTier cluster4 is undergoing a
georeplication recovery:
- Retrieve the replication service IP address
from cnDBTier cluster1 if the first mate site (cnDBTier
cluster1) is already configured or not marked as
FAILED:
$ kubectl -n cluster1 get svc | grep cluster1-cluster4-replication-svc | awk '{ print $4 }' 10.75.01.04
- Retrieve the replication service IP address
from cnDBTier cluster2 if the second mate site (cnDBTier
cluster2) is already configured or not marked as
FAILED:
$ kubectl -n cluster2 get svc | grep cluster2-cluster4-replication-svc | awk '{ print $4 }' 10.75.02.04
- Retrieve the replication service IP address
from cnDBTier cluster3 if the third mate site (cnDBTier
cluster3) is already configured or not marked as
FAILED:
$ kubectl -n cluster3 get svc | grep cluster3-cluster4-replication-svc | awk '{ print $4 }' 10.75.03.04
- Perform the following steps to update the
custom_values.yaml
file for cnDBTier cluster4. If any remote cnDBTier cluster is marked as FAILED or not installed, configure the corresponding remotesiteip
as an empty string "":- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster1 mate site
details if the first mate site is
configured:
replication: matesitename: "cluster1" remotesiteip: "10.75.01.04" remotesiteport: "80"
- Configure the cnDBTier cluster2 mate site
details if the second mate site is
configured:
replication: matesitename: "cluster2" remotesiteip: "10.75.02.04" remotesiteport: "80"
- Configure the cnDBTier cluster3 mate site
details if the third mate site is
configured:
replication: matesitename: "cluster3" remotesiteip: "10.75.03.04" remotesiteport: "80"
- After updating the cnDBTier
custom_values.yaml
file for the cnDBTier cluster, see Installing cnDBTier for installing the failed cnDBTier cluster. - Log in to Bastion Host of cnDBTier cluster and scale down
the db replication service deployments in the reinstalled cnDBTier
cluster:
Note:
For cnDBTier with multichannel replication group, the replication service deployment name includes "repl" instead of "replication". Check the deployment name and perform "egrep" on "repl" or the name that is configured.$ kubectl -n <namespace of cnDBTier cluster> get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n <namespace of cnDBTier cluster> scale deployment --replicas=0
For example:- Run the following command to scale down the db replication
service deployments if the reinstalled cnDBTier cluster is
cnDBTier
cluster1:
$ kubectl -n cluster1 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster1 scale deployment --replicas=0
- Run the following command to scale down the db replication
service deployments if the reinstalled cnDBTier cluster is
cnDBTier
cluster2:
$ kubectl -n cluster2 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster2 scale deployment --replicas=0
- Run the following command to scale down the db replication
service deployments if the reinstalled cnDBTier cluster is
cnDBTier
cluster3:
$ kubectl -n cluster3 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster3 scale deployment --replicas=0
- Run the following command to scale down the db replication
service deployments if the reinstalled cnDBTier cluster is
cnDBTier
cluster4:
$ kubectl -n cluster4 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster4 scale deployment --replicas=0
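After scaling down, the deployments can be checked to confirm that the replication services report zero replicas. The following is a minimal sketch, assuming the cluster1 namespace; the 'repl' pattern also matches the multichannel replication group naming mentioned in the note above:
# Confirm that every replication service deployment reports 0/0 in the READY column.
$ kubectl -n cluster1 get deployments | egrep 'repl'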
- Update the replication service IP address of the cnDBTier cluster
that is reinstalled in the remote cnDBTier clusters if FQDN is not used for
"
remotesiteip
" config in the custom_values.yaml
file. Upgrade the DB replication service deployments in the remote cnDBTier clusters.- If the cnDBTier cluster1 is reinstalled, then update the
custom_values.yaml
file in the remote cnDBTier clusters and upgrade DB replication service deployments:- Run the following commands to update the IP address
of the replication services in the remote cnDBTier
clusters:
$ kubectl -n cluster1 get svc | grep cluster1-cluster2-replication-svc | awk '{ print $4 }' 10.75.01.02 $ kubectl -n cluster1 get svc | grep cluster1-cluster3-replication-svc | awk '{ print $4 }' 10.75.01.03 $ kubectl -n cluster1 get svc | grep cluster1-cluster4-replication-svc | awk '{ print $4 }' 10.75.01.04
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster2 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster1 site IP address in
cnDBTier
cluster2:
replication: matesitename: "cluster1" remotesiteip: "10.75.01.02" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster2:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster2 occndbtier -f occndbtier/custom_values.yaml
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster3 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster1 site IP details in
cnDBTier
cluster3:
replication: matesitename: "cluster1" remotesiteip: "10.75.01.03" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster3:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster3 occndbtier -f occndbtier/custom_values.yaml
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster4 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster1 site IP details in
cnDBTier
cluster4:
replication: matesitename: "cluster1" remotesiteip: "10.75.01.04" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster4:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster4 occndbtier -f occndbtier/custom_values.yaml
- If the cnDBTier cluster2 is reinstalled, then update the
custom_values.yaml
file in the remote cnDBTier clusters and upgrade DB replication service deployments:- Update the IP address of the replication services
in the remote cnDBTier
clusters:
$ kubectl -n cluster2 get svc | grep cluster2-cluster1-replication-svc | awk '{ print $4 }' 10.75.02.01 $ kubectl -n cluster2 get svc | grep cluster2-cluster3-replication-svc | awk '{ print $4 }' 10.75.02.03 $ kubectl -n cluster2 get svc | grep cluster2-cluster4-replication-svc | awk '{ print $4 }' 10.75.02.04
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster1 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster2 site IP address in
cnDBTier
cluster1:
replication: matesitename: "cluster2" remotesiteip: "10.75.02.01" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster1:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster1 occndbtier -f occndbtier/custom_values.yaml
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster3 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster2 site IP address in
cnDBTier
cluster3:
replication: matesitename: "cluster2" remotesiteip: "10.75.02.03" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster3:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster3 occndbtier -f occndbtier/custom_values.yaml
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster4 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster2 site IP address in
cnDBTier
cluster4:
replication: matesitename: "cluster2" remotesiteip: "10.75.02.04" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster4:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster4 occndbtier -f occndbtier/custom_values.yaml
- If cnDBTier cluster3 is reinstalled, then update the
custom_values.yaml
file in the remote cnDBTier clusters and upgrade DB replication service deployments:- Update the IP address of the replication services
in the remote cnDBTier
clusters:
$ kubectl -n cluster3 get svc | grep cluster3-cluster1-replication-svc | awk '{ print $4 }' 10.75.03.01 $ kubectl -n cluster3 get svc | grep cluster3-cluster2-replication-svc | awk '{ print $4 }' 10.75.03.02 $ kubectl -n cluster3 get svc | grep cluster3-cluster4-replication-svc | awk '{ print $4 }' 10.75.03.04
- Perform the following steps to update the
custom_values.yaml
file for remote cnDBTier cluster1 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster3 site IP address in
cnDBTier
cluster1:
replication: matesitename: "cluster3" remotesiteip: "10.75.03.01" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster1:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster1 occndbtier -f occndbtier/custom_values.yaml
- Perform the following steps to update the
custom_values.yaml
file for remote cnDBTier cluster2 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster3 site IP details in
cnDBTier
cluster2:
replication: matesitename: "cluster3" remotesiteip: "10.75.03.02" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster2:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster2 occndbtier -f occndbtier/custom_values.yaml
- Perform the following steps to update the
custom_values.yaml
file for remote cnDBTier cluster4 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster3 site IP details in
cnDBTier
cluster4:
replication: matesitename: "cluster3" remotesiteip: "10.75.03.04" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster4:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster4 occndbtier -f occndbtier/custom_values.yaml
- If the cnDBTier cluster4 is reinstalled, then update the
custom_values.yaml
file in remote cnDBTier clusters and upgrade DB replication service deployments:- Update the IP address of the replication services
in the remote cnDBTier
clusters:
$ kubectl -n cluster4 get svc | grep cluster4-cluster1-replication-svc | awk '{ print $4 }' 10.75.04.01 $ kubectl -n cluster4 get svc | grep cluster4-cluster2-replication-svc | awk '{ print $4 }' 10.75.04.02 $ kubectl -n cluster4 get svc | grep cluster4-cluster3-replication-svc | awk '{ print $4 }' 10.75.04.03
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster1 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster4 site IP address in
cnDBTier
cluster1:
replication: matesitename: "cluster4" remotesiteip: "10.75.04.01" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster1:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster1 occndbtier -f occndbtier/custom_values.yaml
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster2 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster4 site IP address in
cnDBTier
cluster2:
replication: matesitename: "cluster4" remotesiteip: "10.75.04.02" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster2:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster2 occndbtier -f occndbtier/custom_values.yaml
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster3 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster4 site IP address in
cnDBTier
cluster3:
replication: matesitename: "cluster4" remotesiteip: "10.75.04.03" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster3:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster3 occndbtier -f occndbtier/custom_values.yaml
- Create the required NF-specific user accounts and grants to match NF users and
grants of the good site in the reinstalled cnDBTier if the user accounts do not
exist. For a sample procedure, see Creating NF Users.
Note:
For more details about creating NF-specific user account and grants, refer to the NF-specific fault recovery guide. - If cnDBTier Cluster is reinstalled and fault recovery is performed
- If a cnDBTier cluster is reinstalled and fault recovery is performed in a four-site setup, then update the REPLICATION_WAITSECS_AFTER_NDBRESTORE environment variable in the DB replication service deployments on the reinstalled cnDBTier clusters (a verification sketch follows this procedure).
  Note:
  For cnDBTier with a multichannel replication group, the replication service deployment name includes "repl" instead of "replication". Check the deployment name and perform "egrep" on "repl" or on the name that is configured.
  $ for replsvcdeploy in $(kubectl -n <namespace of cnDBTier cluster> get deploy | grep -i replication | awk '{print $1}'); do echo "updating replsvcdeploy: ${replsvcdeploy}"; kubectl -n <namespace of cnDBTier cluster> set env deployment.apps/${replsvcdeploy} REPLICATION_WAITSECS_AFTER_NDBRESTORE='180000'; done
  For example:
  - If cnDBTier cluster1 is reinstalled and restored, then
update the REPLICATION_WAITSECS_AFTER_NDBRESTORE environment
variable in the DB replication service deployments of cnDBTier
cluster1:
$ for replsvcdeploy in $(kubectl -n cluster1 get deploy | grep -i replication | awk '{print $1}'); do echo "updating replsvcdeploy: ${replsvcdeploy}"; kubectl -n cluster1 set env deployment.apps/${replsvcdeploy} REPLICATION_WAITSECS_AFTER_NDBRESTORE='180000'; done
- If cnDBTier cluster2 is reinstalled and restored, then
update the REPLICATION_WAITSECS_AFTER_NDBRESTORE environment
variable in the DB replication service deployments of cnDBTier
cluster2:
$ for replsvcdeploy in $(kubectl -n cluster2 get deploy | grep -i replication | awk '{print $1}'); do echo "updating replsvcdeploy: ${replsvcdeploy}"; kubectl -n cluster2 set env deployment.apps/${replsvcdeploy} REPLICATION_WAITSECS_AFTER_NDBRESTORE='180000'; done
- If cnDBTier cluster3 is reinstalled and restored, then
update the REPLICATION_WAITSECS_AFTER_NDBRESTORE environment
variable in the DB replication service deployments of cnDBTier
cluster3:
$ for replsvcdeploy in $(kubectl -n cluster3 get deploy | grep -i replication | awk '{print $1}'); do echo "updating replsvcdeploy: ${replsvcdeploy}"; kubectl -n cluster3 set env deployment.apps/${replsvcdeploy} REPLICATION_WAITSECS_AFTER_NDBRESTORE='180000'; done
- If cnDBTier cluster4 is reinstalled and restored, then
update the REPLICATION_WAITSECS_AFTER_NDBRESTORE environment
variable in the DB replication service deployments of cnDBTier
cluster4:
$ for replsvcdeploy in $(kubectl -n cluster4 get deploy | grep -i replication | awk '{print $1}'); do echo "updating replsvcdeploy: ${replsvcdeploy}"; kubectl -n cluster4 set env deployment.apps/${replsvcdeploy} REPLICATION_WAITSECS_AFTER_NDBRESTORE='180000'; done
- Scale up the DB replication service deployments in the reinstalled cnDBTier cluster (a verification sketch follows this procedure).
  Note:
  For cnDBTier with a multichannel replication group, the replication service deployment name includes "repl" instead of "replication". Check the deployment name and perform "egrep" on "repl" or on the name that is configured.
  - Log in to the Bastion Host of the cnDBTier cluster and scale up the DB replication service deployments:
    $ kubectl -n <namespace of cnDBTier cluster> get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n <namespace of cnDBTier cluster> scale deployment --replicas=1
  For example:
  - Run the following command if the reinstalled cnDBTier
cluster is cnDBTier
cluster1:
$ kubectl -n cluster1 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster1 scale deployment --replicas=1
- Run the following command if the reinstalled cnDBTier
cluster is cnDBTier
cluster2:
$ kubectl -n cluster2 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster2 scale deployment --replicas=1
- Run the following command if the reinstalled cnDBTier
cluster is cnDBTier
cluster3:
$ kubectl -n cluster3 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster3 scale deployment --replicas=1
- Run the following command if the reinstalled cnDBTier
cluster is cnDBTier
cluster4:
$ kubectl -n cluster4 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster4 scale deployment --replicas=1
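As referenced in the NF user creation step earlier in this procedure, the following is a minimal sketch of creating an NF-specific user account and grants on the reinstalled cnDBTier cluster. The pod name ndbmysqld-0, the container name mysqlndbcluster, the user nfuser, the schema nfdb, and the grant list are illustrative assumptions only; use the exact user names, passwords, and grants defined for your NF on the good site and in the NF-specific fault recovery guide.
$ kubectl -n cluster1 exec -ti ndbmysqld-0 -c mysqlndbcluster -- mysql -h 127.0.0.1 -uroot -p
mysql> -- nfuser, nfdb, and the grants below are illustrative placeholders
mysql> CREATE USER IF NOT EXISTS 'nfuser'@'%' IDENTIFIED BY '<nfuser password>';
mysql> GRANT SELECT, INSERT, UPDATE, DELETE ON nfdb.* TO 'nfuser'@'%';
mysql> FLUSH PRIVILEGES;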
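The following is a hedged verification sketch for the REPLICATION_WAITSECS_AFTER_NDBRESTORE update, assuming cnDBTier cluster1 was the reinstalled cluster; for a multichannel replication group, substitute "repl" in the grep pattern. It lists the environment variable on each replication service deployment:
$ for replsvcdeploy in $(kubectl -n cluster1 get deploy | grep -i replication | awk '{print $1}'); do echo "${replsvcdeploy}:"; kubectl -n cluster1 set env deployment.apps/${replsvcdeploy} --list | grep REPLICATION_WAITSECS_AFTER_NDBRESTORE; done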
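Similarly, a minimal sketch (again assuming cnDBTier cluster1 and the "replication" naming) to confirm that the scaled-up replication service deployments become ready before continuing:
$ kubectl -n cluster1 get deployments | egrep 'replication'
$ kubectl -n cluster1 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster1 rollout status deployment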
7.4.8 Fault Recovery APIs
This section provides information about the cnDBTier fault recovery APIs that are used in the various stages of fault recovery procedures.
Table 7-4 Fault Recovery APIs
Fault Recovery API | REST URL | HTTP Method | Request Payload | Response Code | Response Payload |
---|---|---|---|---|---|
Mark cluster as failed in unhealthy cluster | http://base-uri/db-tier/gr-recovery/site/{siteName}/failed | POST | NA | | |
Mark cluster as failed in healthy cluster | http://base-uri/db-tier/gr-recovery/remotesite/{siteName}/failed | POST | NA | | |
Start recovering database from healthy cluster | http://base-uri/db-tier/gr-recovery/site/{siteName}/start | POST | NA | | |
Start recovering database by selecting healthy cluster | http://base-uri/db-tier/gr-recovery/site/{siteName}/grbackupsite/{backupSiteName}/start | POST | NA | | |
Monitor the fault recovery state | http://base-uri/db-tier/gr-recovery/site/{siteName}/status | GET | NA | | |
Note:
The value of <base-uri> in the REST URL is <db-replication-svc LoadBalancer IP>:<db-replication-svc LoadBalancer PORT>.
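For reference, the following is a minimal sketch of invoking the fault recovery APIs from Table 7-4 over HTTP, assuming cluster1 as {siteName}, cluster2 as {backupSiteName}, and $IP and $PORT exported as shown in the steps below. Each API must be called on the replication service of the cluster indicated by the fault recovery procedure (for example, the remotesite variant is called on the healthy cluster), and the response codes and payloads depend on the cnDBTier release and deployment.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/start
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/grbackupsite/cluster2/start
$ curl -i -X GET http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/status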
- When HTTP is enabled in a cnDBTier cluster:
- Run the following command to get the replication service
LoadBalancer IP for
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication
service LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Use the LoadBalancer IP and LoadBalancer Port obtained in the
previous steps to call the API services.
For example, you can invoke the fault recovery status API by replacing $IP and $PORT with the LoadBalancer IP and LoadBalancer Port obtained in the previous steps:
$ curl -i -X GET http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/status
- When HTTPS is enabled in a cnDBTier cluster:
- Create the key PEM file (file.key.pem) and cert PEM file (file.crt.pem) using the p12 certificate (replicationcertificate.p12). Use the same p12 certificate that was used to enable HTTPS while installing cnDBTier (a hedged openssl sketch follows this list).
- Run the following command to get the replication service
LoadBalancer IP for
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication
service LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Use the LoadBalancer IP and LoadBalancer Port obtained in the
previous steps to call the API services.
For example, you can invoke the fault recovery status API by replacing $IP and $PORT with the LoadBalancer IP and LoadBalancer Port obtained in the previous steps:
$ curl -k --cert file.crt.pem --cert-type PEM --key file.key.pem --key-type PEM --pass password https://$IP:$PORT/db-tier/gr-recovery/site/cluster1/status
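As a hedged sketch of the PEM extraction step referenced in this list, the key and certificate PEM files can be derived from the p12 certificate with openssl; the file names match the example above, and the PEM pass phrase you set for the key is the value supplied to --pass in the curl command:
$ openssl pkcs12 -in replicationcertificate.p12 -nocerts -out file.key.pem
$ openssl pkcs12 -in replicationcertificate.p12 -clcerts -nokeys -out file.crt.pem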