7 Fault Recovery
Note:
The fault recovery procedures can be used by Oracle customers as long as Oracle Customer Service personnel are involved or consulted.
7.1 Restoring Single Node Failure
This section provides fault recovery procedures for restoring a single MySQL Network Database (NDB) cluster node failure (Management node, Data node, appSQL (non-georeplication) node, or SQL (georeplication) node) in a Cloud Native Database Tier (cnDBTier) environment.
Prerequisites
- The fault recovery procedures described in this section can be applied to one of the following scenarios. For example, consider a MySQL NDB cluster with 3 Management nodes, 4 Data nodes (2 data node groups, each with 2 data nodes), and 2 SQL nodes:
- One of the management nodes fails, while others are all in a running state.
- Two management nodes fail, while others are all in a running state.
- One of the data nodes fails, while others are all in a running state.
- One of the SQL (georeplication) or appSQL (non-georeplication) nodes fails, while others are all in a running state.
- One management node, one data node, one SQL (georeplication) node, and one appSQL (non-georeplication) node fails, while others are all in a running state.
- One management node, one Data node in each of the Data node group, and one SQL (georeplication) or appSQL (non-georeplication) node fails, while others are all in a running state.
- Ensure that the Oracle Communications Cloud Native Environment (OCCNE) cluster has a running Bastion Host for MySQL NDB cluster restoration.
7.1.1 Restoring Single Database Node Failure
Following is the fault recovery procedure for restoring a single Cloud Native Database Tier (cnDBTier) cluster node failure (Management node, Data node, SQL (georeplication) node, or appSQL (non-georeplication) node) in the cnDBTier environment.
- Log in to the ndbmgmd-0 in
Cluster1:
$ kubectl -n Cluster1 exec -ti pod/ndbmgmd-0 -c mysqlndbcluster /bin/bash
- Log in to the NDB cluster management
client:
[mysql@ndbmgmd-0 ~]$ ndb_mgm
- Run the following command to verify that all nodes are listed with
no error
message:
-- NDB Cluster -- Management Client -- ndb_mgm> show
Sample output:
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=1    @10.233.116.101  (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0, *)
id=2    @10.233.70.106  (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0)
id=3    @10.233.96.102  (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1)
id=4    @10.233.119.205  (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1)

[ndb_mgmd(MGM)] 2 node(s)
id=49   @10.233.77.54  (mysql-8.4.2 ndb-8.4.2)
id=50   @10.233.112.126  (mysql-8.4.2 ndb-8.4.2)

[mysqld(API)]   8 node(s)
id=70   @10.233.90.189  (mysql-8.4.2 ndb-8.4.2)
id=71   @10.233.71.30  (mysql-8.4.2 ndb-8.4.2)
id=72   @10.233.118.58  (mysql-8.4.2 ndb-8.4.2)
id=73   @10.233.100.65  (mysql-8.4.2 ndb-8.4.2)
id=222 (not connected, accepting connect from any host)
id=223 (not connected, accepting connect from any host)
id=224 (not connected, accepting connect from any host)
id=225 (not connected, accepting connect from any host)
Note:
Node IDs 222 to 225 in the sample output are shown as "not connected" as these are added as empty slot IDs that are used for georeplication recovery. You can ignore these node IDs.
Note:
For appSQL nodes that are not in georeplication, follow the procedure only up to Step 3.
The following recovery scenario is an example explaining a data node pod failure and its corrupted PVC:
- Delete the corrupted PVC of the node that must be restored. For example, if the corrupted PVC belongs to the second data node pod:
Note:
Since the pod is not yet deleted, deleting the PVC using the following command freezes the session. To avoid this, open another terminal and continue with the following steps.
$ kubectl -n <namespace> delete pvc pvc-ndbmtd-ndbmtd-<id, could be 0, 1, 2, 3>
Where <namespace> is the namespace of the failed data node.
For example:
$ kubectl -n Cluster1 delete pvc pvc-ndbmtd-ndbmtd-1
Sample output:
persistentvolumeclaim "pvc-ndbmtd-ndbmtd-1" deleted
- Delete the corrupted pod of the node that must be restored, for example,
the second data node
pod:
$ kubectl -n <namespace> delete pod/ndbmtd-<id, could be 0, 1, 2, 3>
For example:$ kubectl -n Cluster1 delete pod/ndbmtd-1
Sample output:pod "ndbmtd-1" deleted
This step may take some time to complete. Once the process exits with no errors, the pod is in the Pending state and its PVC is not created automatically. This is because the HPA tries to create the pod without recreating the PVC.
$ kubectl get pod -n Cluster1
Sample output:NAME READY STATUS RESTARTS AGE pod/mysql-cluster-Cluster1-Cluster2-replication-svc-7b5cb67c9fqd7b4 1/1 Running 0 36m pod/mysql-cluster-Cluster1-Cluster3-replication-svc-86445d8cb42pz5x 1/1 Running 0 2m9s pod/mysql-cluster-db-monitor-svc-57d688947-bbqp7 1/1 Running 1 69m pod/ndbmgmd-0 2/2 Running 0 69m pod/ndbmgmd-1 2/2 Running 0 68m pod/ndbmtd-0 3/3 Running 0 69m pod/ndbmtd-1 3/3 Running 0 68m pod/ndbmysqld-0 3/3 Running 0 69m pod/ndbmysqld-1 3/3 Running 0 2m9s pod/ndbmysqld-2 3/3 Running 0 68m pod/ndbmysqld-3 3/3 Running 0 68m
- Delete the pod that you want to restore. Once you delete the pod,
Kubernetes brings it up
successfully:
$ kubectl -n <namespace> delete pod/ndbmtd-<id, could be 0, 1, 2, 3>
For example:$ kubectl -n Cluster1 delete pod/ndbmtd-1
Sample output:pod "ndbmtd-1" deleted
- Wait until the pod is up and running and verify the status of the pod:
$ kubectl get pod -n Cluster1
Sample output:NAME READY STATUS RESTARTS AGE pod/mysql-cluster-Cluster1-Cluster2-replication-svc-7b5cb67c9fqd7b4 1/1 Running 0 36m pod/mysql-cluster-Cluster1-Cluster3-replication-svc-86445d8cb42pz5x 1/1 Running 0 2m9s pod/mysql-cluster-db-monitor-svc-57d688947-bbqp7 1/1 Running 1 69m pod/ndbmgmd-0 2/2 Running 0 69m pod/ndbmgmd-1 2/2 Running 0 68m pod/ndbmtd-0 3/3 Running 0 69m pod/ndbmtd-1 3/3 Running 0 68m pod/ndbmysqld-0 3/3 Running 0 69m pod/ndbmysqld-1 3/3 Running 0 2m9s pod/ndbmysqld-2 3/3 Running 0 68m pod/ndbmysqld-3 3/3 Running 0 68m
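The readiness check can also be scripted. The following is a minimal bash sketch, assuming the example pod ndbmtd-1, namespace Cluster1, and container mysqlndbcluster used above; adjust the names and timeout for your deployment:
# Wait (up to 10 minutes) for the restored data node pod to become Ready.
kubectl -n Cluster1 wait --for=condition=Ready pod/ndbmtd-1 --timeout=600s

# Confirm the data node has rejoined the NDB cluster: its node ID should be
# listed with an IP address and node group, not as "not connected".
kubectl -n Cluster1 exec ndbmgmd-0 -c mysqlndbcluster -- ndb_mgm -e show | grep -A 5 'ndbd(NDB)'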
- Perform the following to restore a SQL node after failure:
If the restored SQL node was involved in active replication, then try switching from the original standby SQL node to a newly restored SQL node that is on standby now, and verify that the replication channel can be successfully established.
- Stop replication on original standby SQL node so that switch
over of active replication channel happens to newly restored SQL
node:
$ kubectl -n Cluster1 exec -it ndbmysqld-1 -- mysql -h 127.0.0.1 -u<username> -p<password> -e "STOP REPLICA;"
For example:kubectl -n Cluster1 exec -it ndbmysqld-1 -- mysql -h 127.0.0.1 -uroot -pNextGenCne -e "STOP REPLICA;"
- Verify active replication on the newly restored SQL
node:
$ kubectl -n Cluster1 exec -it ndbmysqld-0 -- mysql -h127.0.0.1 -u<username> -p<password> -e "SHOW REPLICA STATUS\G"
Example:kubectl -n Cluster1 exec -it ndbmysqld-0 -- mysql -h127.0.0.1 -uroot -pNextGenCne -e "SHOW REPLICA STATUS\G"
Sample output:Defaulted container "mysqlndbcluster" out of: mysqlndbcluster, init-sidecar mysql: [Warning] Using a password on the command line interface can be insecure. *************************** 1. row *************************** Replica_IO_State: Waiting for source to send event Source_Host: 10.75.180.77 Source_User: occnerepluser Source_Port: 3306 Connect_Retry: 60 Source_Log_File: mysql-bin.000005 Read_Source_Log_Pos: 62625 Relay_Log_File: mysql-relay-bin.000002 Relay_Log_Pos: 2191 Relay_Source_Log_File: mysql-bin.000005 Replica_IO_Running: Yes Replica_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Source_Log_Pos: 62625 Relay_Log_Space: 2401 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Source_SSL_Allowed: No Source_SSL_CA_File: Source_SSL_CA_Path: Source_SSL_Cert: Source_SSL_Cipher: Source_SSL_Key: Seconds_Behind_Source: 0 Source_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Source_Server_Id: 2000 Source_UUID: 89219509-8fce-11ec-89b7-56d8e6d44947 Source_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates Source_Retry_Count: 86400 Source_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Source_SSL_Crl: Source_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 Replicate_Rewrite_DB: Channel_Name: Source_TLS_Version: Source_public_key_path: Get_Source_public_key: 0 Network_Namespace:
Note:
For management nodes or SQL nodes, follow this procedure by replacing every occurrence of 'ndbmtd' in the commands with 'ndbmgmd' or 'ndbmysqld' respectively.
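If you only need a quick pass or fail check of the replication channel on the restored SQL node, the following sketch can be used instead of reading the full SHOW REPLICA STATUS output. It assumes the same ndbmysqld-0 pod, Cluster1 namespace, mysqlndbcluster container, and credentials as in the example above:
# Print only the replica thread states and lag; both *_Running fields must be "Yes".
kubectl -n Cluster1 exec ndbmysqld-0 -c mysqlndbcluster -- \
  mysql -h127.0.0.1 -u<username> -p<password> -e "SHOW REPLICA STATUS\G" \
  | grep -E 'Replica_IO_Running|Replica_SQL_Running|Seconds_Behind_Source'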
7.1.2 Recovering a Single Node Replication Service
This section provides the procedure to recover a single node replication service in case of PVC corruption.
- Perform the following steps to scale down the replication service pod:
- Run the following command to get the list of deployments in
cluster1:
$ kubectl get deployment --namespace=cluster1
Sample output:NAME READY UP-TO-DATE AVAILABLE AGE mysql-cluster-cluster1-cluster2-replication-svc 1/1 1 1 11m mysql-cluster-db-backup-manager-svc 1/1 1 1 11m mysql-cluster-db-monitor-svc 1/1 1 1 11m
- Scale down the replication service of cluster1 with respect to
cluster2:
$ kubectl scale deployment mysql-cluster-cluster1-cluster2-replication-svc --namespace=cluster1 --replicas=0
Sample output:deployment.apps/mysql-cluster-cluster1-cluster2-replication-svc scaled
- Perform the following steps to delete the corrupted PVC of the db-replication-svc
pod:
- Run the following command to get the corrupted
PVC:
$ kubectl get pvc -n cluster1
Sample output:NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE pvc-cluster1-cluster2-replication-svc Bound pvc-4f1e7afa-724d-470a-b1a5-dbe1257e7a48 8Gi RWO occne-dbtier-sc 12m pvc-ndbappmysqld-ndbappmysqld-0 Bound pvc-307f3d29-01d8-48a8-b580-15db8edc1121 2Gi RWO occne-dbtier-sc 12m pvc-ndbappmysqld-ndbappmysqld-1 Bound pvc-358b8d50-b53c-427f-87a3-ac9c139581b6 2Gi RWO occne-dbtier-sc 10m pvc-ndbmgmd-ndbmgmd-0 Bound pvc-965061dd-3d88-40fe-bf25-4260a61d0fa4 1Gi RWO occne-dbtier-sc 11m pvc-ndbmgmd-ndbmgmd-1 Bound pvc-8601a9b3-769c-41ca-aeb8-c5b5725693e3 1Gi RWO occne-dbtier-sc 11m pvc-ndbmtd-ndbmtd-0 Bound pvc-8bfab5e0-4a24-432f-98a4-a64080ccb33c 3Gi RWO occne-dbtier-sc 11m pvc-ndbmtd-ndbmtd-1 Bound pvc-196f7e87-824c-46d0-b0c9-8aad56806a34 3Gi RWO occne-dbtier-sc 11m pvc-ndbmysqld-ndbmysqld-0 Bound pvc-6ea3cd11-8cbe-4d2a-864d-baf7d22f4295 2Gi RWO occne-dbtier-sc 11m pvc-ndbmysqld-ndbmysqld-1 Bound pvc-c978f6de-0477-4a68-8ebe-3bcb899d27bb 2Gi RWO occne-dbtier-sc 10m
- Run the following command to delete the corrupted
PVC:
$ kubectl delete pvc pvc-cluster1-cluster2-replication-svc -n cluster1
Sample output:persistentvolumeclaim "pvc-cluster1-cluster2-replication-svc" deleted
- Perform the following steps to check if the corrupted PVC is deleted:
- Get the list of PVCs in cluster1 and verify that the deleted PVC is not present in the output:
$ kubectl get pvc -n cluster1
Sample output:NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE pvc-ndbappmysqld-ndbappmysqld-0 Bound pvc-307f3d29-01d8-48a8-b580-15db8edc1121 2Gi RWO occne-dbtier-sc 12m pvc-ndbappmysqld-ndbappmysqld-1 Bound pvc-358b8d50-b53c-427f-87a3-ac9c139581b6 2Gi RWO occne-dbtier-sc 11m pvc-ndbmgmd-ndbmgmd-0 Bound pvc-965061dd-3d88-40fe-bf25-4260a61d0fa4 1Gi RWO occne-dbtier-sc 12m pvc-ndbmgmd-ndbmgmd-1 Bound pvc-8601a9b3-769c-41ca-aeb8-c5b5725693e3 1Gi RWO occne-dbtier-sc 12m pvc-ndbmtd-ndbmtd-0 Bound pvc-8bfab5e0-4a24-432f-98a4-a64080ccb33c 3Gi RWO occne-dbtier-sc 12m pvc-ndbmtd-ndbmtd-1 Bound pvc-196f7e87-824c-46d0-b0c9-8aad56806a34 3Gi RWO occne-dbtier-sc 12m pvc-ndbmysqld-ndbmysqld-0 Bound pvc-6ea3cd11-8cbe-4d2a-864d-baf7d22f4295 2Gi RWO occne-dbtier-sc 12m pvc-ndbmysqld-ndbmysqld-1 Bound pvc-c978f6de-0477-4a68-8ebe-3bcb899d27bb 2Gi RWO occne-dbtier-sc 10m
- Run the following command to get the PV details and verify that the PV
associated with the corrupted PVC is not present in the
output:
$ kubectl get pv |grep cluster1 | grep repl
Sample output:pvc-ecc6d691-ca31-41c3-930c-c092d73452e8 8Gi RWO Delete Bound cluster2/pvc-cluster2-cluster1-replication-svc occne-dbtier-sc 10m
- Run the following command to upgrade cnDBTier with the modified custom_values.yaml file:
$ helm upgrade mysql-cluster occndbtier -f occndbtier/custom_values.yaml -n cluster1
When the upgrade is complete, the new db-replication-svc pod comes up with the new PVC.
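To confirm that the upgrade recreated the replication service with a fresh PVC, a quick check such as the following sketch can be run; the deployment and PVC names are taken from the cluster1/cluster2 example used in this procedure:
# The replication service deployment should report 1/1 READY again.
kubectl -n cluster1 get deployment mysql-cluster-cluster1-cluster2-replication-svc

# The replication service PVC should be recreated and in the Bound state.
kubectl -n cluster1 get pvc pvc-cluster1-cluster2-replication-svc

# The new db-replication-svc pod should be Running.
kubectl -n cluster1 get pods | grep replication-svc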
- Perform a Helm test to ensure that all the cnDBTier services are
running
smoothly:
$ helm test mysql-cluster -n ${OCCNE_NAMESPACE}
Sample output:NAME: mysql-cluster LAST DEPLOYED: Mon May 20 09:07:56 2024 NAMESPACE: cluster1 STATUS: deployed REVISION: 2 TEST SUITE: mysql-cluster-node-connection-test Last Started: Mon May 20 09:15:39 2024 Last Completed: Mon May 20 09:16:16 2024 Phase: Succeeded
Check the status of cnDBTier Cluster1 by running the following command:
$ kubectl -n ${OCCNE_NAMESPACE} exec -it ndbmgmd-0 -- ndb_mgm -e show
Sample output:Defaulted container "mysqlndbcluster" out of: mysqlndbcluster, db-infra-monitor-svc Connected to Management Server at: localhost:1186 Cluster Configuration --------------------- [ndbd(NDB)] 2 node(s) id=1 @10.233.74.160 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0, *) id=2 @10.233.79.154 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0) [ndb_mgmd(MGM)] 2 node(s) id=49 @10.233.78.169 (mysql-8.4.2 ndb-8.4.2) id=50 @10.233.102.220 (mysql-8.4.2 ndb-8.4.2) [mysqld(API)] 8 node(s) id=56 @10.233.72.151 (mysql-8.4.2 ndb-8.4.2) id=57 @10.233.84.206 (mysql-8.4.2 ndb-8.4.2) id=70 @10.233.79.153 (mysql-8.4.2 ndb-8.4.2) id=71 @10.233.73.138 (mysql-8.4.2 ndb-8.4.2) id=222 (not connected, accepting connect from any host) id=223 (not connected, accepting connect from any host) id=224 (not connected, accepting connect from any host) id=225 (not connected, accepting connect from any host)
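The post-recovery checks can be combined into one short sketch. It assumes the ${OCCNE_NAMESPACE} variable and pod names used above; the empty georeplication slot IDs (for example, 222 to 225 in the samples) are expected to remain "not connected" and can be ignored:
# Run the cnDBTier Helm test suite for the release.
helm test mysql-cluster -n "${OCCNE_NAMESPACE}"

# List any nodes that are not connected; only the empty georeplication
# slot IDs should appear in this output.
kubectl -n "${OCCNE_NAMESPACE}" exec ndbmgmd-0 -c mysqlndbcluster -- ndb_mgm -e show | grep 'not connected'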
7.2 Restoring Database From Backup
This section provides the procedure to restore the database from a backup using the MySQL ndb_restore utility.
7.2.1 Restoring Database from Backup with ndb_restore
This procedure restores the database nodes of a cnDBTier cluster from a backup using the MySQL ndb_restore utility.
Note:
- To restore a single site cnDBTier cluster deployment, download the NDB backup to the Bastion Host. If the backup is already downloaded to the Bastion Host, you can use the same backup to restore the cnDBTier cluster in case of fatal errors.
- The cnDBTier backup that is used for the restore must be taken using the same cnDBTier version as that of the cnDBTier site that is being restored.
- For restoring the failed cnDBTier clusters in two site, three site, and four site cnDBTier deployment models, perform the procedures as described in Restoring Georeplication (GR) Failure section.
- db-backup-manager-svc is designed to automatically restart in case of errors. Therefore, when the backup-manager-svc encounters a temporary error during the georeplication recovery process, it may undergo several restarts. When cnDBTier reaches a stable state, the db-backup-manager-svc pod operates normally without any further restarts.
- You can locate the scripts used in this section in the following location: <path where CSAR package of cnDBTier is extracted>/Artifacts/Scripts/dr-procedure.
Downloading the Latest DB Backup Before Restoration
Create a database backup of cnDBTier Cluster1. If the cnDBTier is UP, copy the backup directories and files to a local directory in your Bastion Host. For more information on how to create a backup, see Creating On-demand Database Backup.
- Check if the backup is already existing in the cnDBTier cluster that needs
to be downloaded to the Bastion
Host:
$ kubectl -n cluster1 exec -it ndbmtd-0 -- ls -lrt /var/ndbbackup/dbback/BACKUP
Example:$ kubectl -n cluster1 exec -it ndbmtd-0 -- ls -lrt /var/ndbbackup/dbback/BACKUP
Sample output:Defaulting container name to mysqlndbcluster. Use 'kubectl describe pod/ndbmtd-0 -n cluster1' to see all of the containers in this pod. total 8 drwxr-sr-x. 6 mysql mysql 4096 Feb 19 17:50 BACKUP-217221233 drwxr-s---. 6 mysql mysql 4096 Feb 19 18:33 BACKUP-217221332
- Download the backup from the cnDBTier cluster data nodes to the Bastion
Host.
- Run the following command to download the backup files to the Bastion
Host and compress the files as a single tar ball file. Select the backup ID from step
1 that needs to be
downloaded.
$ SKIP_NDB_APPLY_STATUS=1 CNDBTIER_NAMESPACE=<namespace of cndbtier cluster on which backup is created> BACKUP_DIR=<path where backup files are copied> DATA_NODE_COUNT=<ndb data node count> BACKUP_ID=<backup id> BACKUP_ENCRYPTION_ENABLE=false ./download_backup.sh <backup tar ball path>
Example:$ SKIP_NDB_APPLY_STATUS=1 CNDBTIER_NAMESPACE=cluster1 BACKUP_DIR=/var/ndbbackup DATA_NODE_COUNT=4 BACKUP_ID=217221233 BACKUP_ENCRYPTION_ENABLE=false ./download_backup.sh backup_217221233.tar.gz
Sample output:Defaulting container name to mysqlndbcluster. tar: Removing leading `/' from member names Defaulting container name to mysqlndbcluster. tar: Removing leading `/' from member names Defaulting container name to mysqlndbcluster. tar: Removing leading `/' from member names Defaulting container name to mysqlndbcluster. tar: Removing leading `/' from member names ./ ./ndbmtd-0/ ./ndbmtd-0/BACKUP-217221233/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.1.log ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233-0.1.Data ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.1.ctl ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.1.log ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233-0.1.Data ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.1.ctl ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.1.log ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233-0.1.Data ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.1.ctl ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.1.log ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233-0.1.Data ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.1.ctl ./ndbmtd-1/ ./ndbmtd-1/BACKUP-217221233/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.2.log ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233-0.2.Data ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.2.ctl ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.2.log ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233-0.2.Data ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.2.ctl ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.2.log ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233-0.2.Data ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.2.ctl ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.2.log ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233-0.2.Data ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.2.ctl ./ndbmtd-2/ ./ndbmtd-2/BACKUP-217221233/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.3.ctl ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.3.log ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233-0.3.Data ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.3.ctl ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.3.log ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233-0.3.Data 
./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.3.ctl ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.3.log ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233-0.3.Data ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.3.ctl ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.3.log ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233-0.3.Data ./ndbmtd-3/ ./ndbmtd-3/BACKUP-217221233/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233-0.4.Data ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.4.log ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.4.ctl ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233-0.4.Data ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.4.log ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.4.ctl ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233-0.4.Data ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.4.log ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.4.ctl ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233-0.4.Data ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.4.log ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.4.ctl Cluster backup 217221233 downloaded and compressed to backup_217221233.tar.gz successfully.
- Note down the location of the backup tar file and backup ID. This path is required later for database restoration.
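Before moving on, you can sanity check the downloaded tar ball. The following sketch, using the backup_217221233.tar.gz file from the example above, lists the per data node folders inside the archive and records a checksum for later verification (sha256sum is assumed to be available on the Bastion Host):
# List the top-level per data node folders contained in the backup tar ball.
tar -tzf backup_217221233.tar.gz | grep -E '^\./ndbmtd-[0-9]+/$'

# Optionally record a checksum of the tar ball to verify it later.
sha256sum backup_217221233.tar.gz > backup_217221233.tar.gz.sha256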
Restoring Database Schema and Tables
Note:
- Ensure that the cnDBTier backup that is used for the restore is taken using the same cnDBTier version as that of the cnDBTier site that is being restored.
- You can ignore the temporary errors observed in the NDB database that is restored if the restore completes successfully.
- Log in to the Bastion Host of the cnDBTier cluster to restore the database. Check the backup tar file downloaded in the previous section; the backup file must be available before restoring the database.
- If the backup is fetched from a remote server, then perform the following steps to
convert the backup file from zip format to the format required by the restore script:
- Copy the backup zip file to the folder containing the restore script.
- Run the following command to unzip the backup
file:
$ unzip backup_<backup_id>_<Encrypted/Unencrypted>.zip
For example:$ unzip backup_708241812_Encrypted.zip
Sample output:Archive: backup_708241812_Encrypted.zip inflating: backup_708241812_dn_1.tar.gz inflating: backup_708241812_dn_1.tar.gz.sha256 inflating: backup_708241812_dn_2.tar.gz inflating: backup_708241812_dn_2.tar.gz.sha256
- Untar the tar file that belongs to the first data node (the file name containing
dn_1
is the file for the first data node):$ tar -xzvf backup_<backup_id>_dn_1.tar.gz -C ./
For example:$ tar -xzvf backup_708241812_dn_1.tar.gz -C ./
Sample output:BACKUP-708241812/ BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/ BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812.1.ctl BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812-0.1.Data BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812.1.log BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/ BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812.1.ctl BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812-0.1.Data BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812.1.log BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/ BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812.1.ctl BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812-0.1.Data BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812.1.log BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/ BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812.1.ctl BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812-0.1.Data BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812.1.log
- Create a directory named
ndbmtd-0
and move the folder extracted in the previous step to thendbmtd-0
folder:$ mkdir ndbmtd-0 $ mv BACKUP-<backup_id> ./ndbmtd-0/
For example:$ mkdir ndbmtd-0 $ mv BACKUP-708241812 ./ndbmtd-0/
- Untar the tar file that belongs to the second data node (the file name containing
dn_2
is the file for the second data node):$ tar -xzvf backup_<backup_id>_dn_2.tar.gz -C ./
For example:$ tar -xzvf backup_708241812_dn_2.tar.gz -C ./
Sample output:BACKUP-708241812/ BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/ BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812.2.log BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812.2.ctl BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812-0.2.Data BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/ BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812.2.log BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812.2.ctl BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812-0.2.Data BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/ BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812.2.log BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812.2.ctl BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812-0.2.Data BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/ BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812.2.log BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812.2.ctl BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812-0.2.Data
- Create a directory named
ndbmtd-1
and move the folder extracted in the previous step to thendbmtd-1
folder:$ mkdir ndbmtd-1 $ mv BACKUP-<backup_id> ./ndbmtd-1/
For example:$ mkdir ndbmtd-1 $ mv BACKUP-708241812 ./ndbmtd-1/
- Repeat steps e and f for the remaining tar files of respective data nodes.
- Create a tar file to contain all the folders created in the previous
steps:
$ tar -czvf backup_<backup_id>.tar.gz ndbmtd-*
For example:tar -czvf backup_708241812.tar.gz ndbmtd-*
Sample output:ndbmtd-0/ ndbmtd-0/BACKUP-708241812/ ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/ ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812.1.ctl ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812-0.1.Data ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812.1.log ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/ ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812.1.ctl ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812-0.1.Data ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812.1.log ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/ ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812.1.ctl ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812-0.1.Data ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812.1.log ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/ ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812.1.ctl ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812-0.1.Data ndbmtd-0/BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812.1.log ndbmtd-1/ ndbmtd-1/BACKUP-708241812/ ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/ ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812.2.log ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812.2.ctl ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-1-OF-4/BACKUP-708241812-0.2.Data ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/ ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812.2.log ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812.2.ctl ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-4-OF-4/BACKUP-708241812-0.2.Data ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/ ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812.2.log ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812.2.ctl ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-2-OF-4/BACKUP-708241812-0.2.Data ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/ ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812.2.log ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812.2.ctl ndbmtd-1/BACKUP-708241812/BACKUP-708241812-PART-3-OF-4/BACKUP-708241812-0.2.Data
- Use the created file (
backup_<backup_id>.tar.gz
) in the following steps to perform a restore.
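The repackaging steps above (unzip, untar each data node archive, move it under an ndbmtd-<n> folder, and re-tar) can also be scripted. The following is a minimal bash sketch, assuming the file naming shown in the examples (backup_<backup_id>_Encrypted.zip and backup_<backup_id>_dn_<n>.tar.gz); adjust the backup ID, zip file name, and data node count for your deployment:
BACKUP_ID=708241812          # backup ID from the zip file name
DN_COUNT=2                   # number of data node archives inside the zip

unzip backup_${BACKUP_ID}_Encrypted.zip

# Unpack each data node archive into its own ndbmtd-<n> folder
# (dn_1 maps to ndbmtd-0, dn_2 maps to ndbmtd-1, and so on).
for i in $(seq 1 "${DN_COUNT}"); do
  tar -xzvf backup_${BACKUP_ID}_dn_${i}.tar.gz -C ./
  mkdir -p ndbmtd-$((i-1))
  mv BACKUP-${BACKUP_ID} ./ndbmtd-$((i-1))/
done

# Repack all per data node folders into the tar ball used by the restore script.
tar -czvf backup_${BACKUP_ID}.tar.gz ndbmtd-*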
- Reinstall the cnDBTier cluster, if the cluster fails due to fatal errors. For more information about installing the cnDBTier cluster, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
- Disable the cnDBTier replication services on the cnDBTier cluster, if they exist. For a single site deployment, this service is disabled.
Log in to the Bastion Host of the cnDBTier cluster and scale down the DB replication service deployment:
$ kubectl -n <namespace of cnDBTier cluster> get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n <namespace of cnDBTier cluster> scale deployment --replicas=0
Example:$ kubectl -n cluster1 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster1 scale deployment --replicas=0
Sample output:deployment.apps/mysql-cluster-cluster1-cluster2-replication-svc scaled
- Run the following command to disable DB backup manager service in cnDBTier
cluster:
Log in to the Bastion Host of cnDBTier Cluster and scale down the DB backup manager service deployment.
$ kubectl -n <namespace of cnDBTier cluster> get deployments | egrep 'db-backup-manager-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n <namespace of cnDBTier cluster> scale deployment --replicas=0
Example:$ kubectl -n cluster1 get deployments | egrep 'db-backup-manager-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster1 scale deployment --replicas=0
Sample output:deployment.apps/mysql-cluster-db-backup-manager-svc scaled
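Before proceeding, you can confirm that both services are scaled down. A minimal check, using the same cluster1 namespace as in the examples:
# Both deployments should show 0/0 READY after the scale down.
kubectl -n cluster1 get deployments | egrep 'replication|db-backup-manager-svc'

# No replication or backup manager pods should remain.
kubectl -n cluster1 get pods | egrep 'replication-svc|db-backup-manager-svc' || echo "No replication or backup manager pods running"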
- Wait until cnDBTier cluster is up and running. Check the status of the
cnDBTier cluster by running the following
commands:
$ kubectl -n <namespace of cnDBTier Cluster> get pods
$ kubectl -n <namespace of cnDBTier Cluster> exec -it ndbmgmd-0 -- ndb_mgm -e show
Example: Checking the cluster status by accessing the pods running in the cluster$ kubectl -n cluster1 get pods
Sample output:NAME READY STATUS RESTARTS AGE mysql-cluster-db-monitor-svc-777dc5d7d7-pmzxz 1/1 Running 0 6h19m ndbappmysqld-0 2/2 Running 0 6h19m ndbappmysqld-1 2/2 Running 0 6h14m ndbmgmd-0 2/2 Running 0 6h19m ndbmgmd-1 2/2 Running 0 6h18m ndbmtd-0 3/3 Running 0 6h19m ndbmtd-1 3/3 Running 0 6h17m ndbmtd-2 3/3 Running 0 6h17m ndbmtd-3 3/3 Running 0 6h16m ndbmysqld-0 3/3 Running 0 6h19m ndbmysqld-1 3/3 Running 0 6h13m
Example: Checking the status of cluster from the management pod$ kubectl -n cluster1 exec -it ndbmgmd-0 -- ndb_mgm -e show
Sample output:Connected to Management Server at: localhost:1186 Cluster Configuration --------------------- [ndbd(NDB)] 4 node(s) id=1 @10.233.116.101 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0, *) id=2 @10.233.70.106 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0) id=3 @10.233.96.102 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) id=4 @10.233.119.205 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) [ndb_mgmd(MGM)] 2 node(s) id=49 @10.233.77.54 (mysql-8.4.2 ndb-8.4.2) id=50 @10.233.112.126 (mysql-8.4.2 ndb-8.4.2) [mysqld(API)] 8 node(s) id=56 @10.233.90.189 (mysql-8.4.2 ndb-8.4.2) id=57 @10.233.71.30 (mysql-8.4.2 ndb-8.4.2) id=71 @10.233.118.58 (mysql-8.4.2 ndb-8.4.2) id=72 @10.233.100.65 (mysql-8.4.2 ndb-8.4.2) id=222 (not connected, accepting connect from any host) id=223 (not connected, accepting connect from any host) id=224 (not connected, accepting connect from any host) id=225 (not connected, accepting connect from any host)
Note:
Node IDs 222 to 225 in the sample output are shown as "not connected" as these are added as empty slot IDs that are used for georeplication recovery. You can ignore these node IDs.
- Perform the following steps to restore the NDB database automatically:
- Run the following command to pick a backup to restore the NDB
cluster:
$ SKIP_NDB_APPLY_STATUS=1 CNDBTIER_NAMESPACE=<namespace> BACKUP_DIR=<backup_DIR> BACKUP_ID=<backup_ID> BACKUP_ENCRYPTION_ENABLE=<true/false> BACKUP_ENCRYPTION_PASSWORD=<backup encryption password> ./cndbtier_restore.sh <backup_tar_path>
where:
- <namespace>: is the namespace of the cnDBTier cluster to restore
- <backup_DIR>: is the path where backup files are copied
- <backup_ID>: is the backup ID obtained in Downloading Latest DB Backup Before Restoration
- <backup encryption password>: if BACKUP_ENCRYPTION_ENABLE is set to true, then this variable gives the backup encryption password
- <backup_tar_path>: is the backup tar ball path generated in Downloading Latest DB Backup Before Restoration
For example:
$ SKIP_NDB_APPLY_STATUS=1 CNDBTIER_NAMESPACE=cluster1 BACKUP_DIR="/var/ndbbackup" BACKUP_ID=217221233 BACKUP_ENCRYPTION_ENABLE=true BACKUP_ENCRYPTION_PASSWORD="NextGenCne" ./cndbtier_restore.sh backup_217221233.tar.gz
Sample output:Extracting backup part files to /tmp/.cndbtier-backup-MVkgJJperv ... ./ ./ndbmtd-0/ ./ndbmtd-0/BACKUP-217221233/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.1.log ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233-0.1.Data ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.1.ctl ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.1.log ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233-0.1.Data ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.1.ctl ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.1.log ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233-0.1.Data ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.1.ctl ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/ ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.1.log ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233-0.1.Data ./ndbmtd-0/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.1.ctl ./ndbmtd-1/ ./ndbmtd-1/BACKUP-217221233/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.2.log ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233-0.2.Data ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.2.ctl ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.2.log ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233-0.2.Data ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.2.ctl ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.2.log ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233-0.2.Data ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.2.ctl ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/ ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.2.log ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233-0.2.Data ./ndbmtd-1/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.2.ctl ./ndbmtd-2/ ./ndbmtd-2/BACKUP-217221233/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.3.ctl ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.3.log ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233-0.3.Data ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.3.ctl ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.3.log ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233-0.3.Data ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.3.ctl ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.3.log ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233-0.3.Data 
./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/ ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.3.ctl ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.3.log ./ndbmtd-2/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233-0.3.Data ./ndbmtd-3/ ./ndbmtd-3/BACKUP-217221233/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233-0.4.Data ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.4.log ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-4-OF-4/BACKUP-217221233.4.ctl ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233-0.4.Data ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.4.log ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-1-OF-4/BACKUP-217221233.4.ctl ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233-0.4.Data ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.4.log ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-3-OF-4/BACKUP-217221233.4.ctl ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/ ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233-0.4.Data ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.4.log ./ndbmtd-3/BACKUP-217221233/BACKUP-217221233-PART-2-OF-4/BACKUP-217221233.4.ctl Defaulting container name to mysqlndbcluster. Defaulting container name to mysqlndbcluster. Defaulting container name to mysqlndbcluster. Defaulting container name to mysqlndbcluster. mysql: [Warning] Using a password on the command line interface can be insecure. mysql: [Warning] Using a password on the command line interface can be insecure. Nodeid = 1 Backup Id = 217221233 backup path = /var/ndbbackup/dbback/BACKUP/BACKUP-217221233 Found backup 217221233 with 4 backup parts ................ ................ ................ ................ ................ ................ 2023-11-07 18:04:06 [rebuild_indexes] Rebuilding indexes 2023-11-07 18:04:06 [rebuild_indexes] Rebuilding indexes 2023-11-07 18:04:06 [rebuild_indexes] Rebuilding indexes 2023-11-07 18:04:06 [rebuild_indexes] Rebuilding indexes Rebuilding index `PRIMARY` on table `DBTIER_REPLICATION_CHANNEL_INFO` ... OK (0 s) Rebuilding index `PRIMARY` on table `DBTIER_BACKUP_COMMAND_QUEUE` ... OK (0 s) Rebuilding index `PRIMARY` on table `DBTIER_BACKUP_INFO` ... OK (0 s) Rebuilding index `PRIMARY` on table `DBTIER_REPL_SITE_INFO` ... OK (0 s) Rebuilding index `PRIMARY` on table `DBTIER_SITE_INFO` ... OK (0 s) Rebuilding index `PRIMARY` on table `DBTIER_INITIAL_BINLOG_POSTION` ... OK (0 s) Rebuilding index `PRIMARY` on table `DBTIER_BACKUP_TRANSFER_INFO` ... OK (0 s) Rebuilding index `PRIMARY` on table `City` ... OK (0 s) Create foreign keys Create foreign keys done [INFO]: Value of ndbmysqldcount: 2. statefulset.apps/ndbmysqld scaled [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndbmysqld pods are up. [INFO]: Wait till all ndb pods are up. 
[INFO]: Scaled up all replication ndbmysqld pods. pod "ndbappmysqld-0" deleted pod "ndbappmysqld-1" deleted Cluster restore completed.
- Run the following command to pick a backup to restore the NDB
cluster:
- If user accounts do not exist, create the required NF-specific user accounts
and grants to match NF users and grants of the good site in the reinstalled cnDBTier. For
sample procedure, see Creating NF Users.
Note:
For more details about creating NF-specific user accounts and grants, refer to the NF-specific fault recovery guide.
- Clean up the backup_info database tables and the mysql.ndb_apply_status table once the restore is complete:
$ kubectl -n <namespace of cnDBTier Cluster> exec -it ndbmysqld-0 -- mysql -h 127.0.0.1 -uroot -p
Sample output:Enter Password:
$ mysql> DELETE FROM backup_info.DBTIER_BACKUP_COMMAND_QUEUE;
$ mysql> DELETE FROM backup_info.DBTIER_BACKUP_INFO;
$ mysql> DELETE FROM backup_info.DBTIER_BACKUP_TRANSFER_INFO;
$ mysql> DELETE FROM mysql.ndb_apply_status;
$ mysql> DELETE FROM replication_info.DBTIER_INITIAL_BINLOG_POSTION;
When georeplication failure occurs on all sites, also clean up the replication_info database tables, as the backup is taken from other clusters and can be either a multi-channel or single-channel backup:
$ mysql> DELETE FROM replication_info.DBTIER_REPLICATION_CHANNEL_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_REPL_ERROR_SKIP_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_REPL_EVENT_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_REPL_SITE_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_REPL_SVC_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_SITE_INFO;
Example to remove all the entries from the backup_info.DBTIER_BACKUP_INFO table and the mysql.ndb_apply_status table:
$ kubectl -n cluster1 exec -it ndbmysqld-0 -- mysql -h 127.0.0.1 -uroot -p
Enter Password:
$ mysql> DELETE FROM backup_info.DBTIER_BACKUP_COMMAND_QUEUE;
$ mysql> DELETE FROM backup_info.DBTIER_BACKUP_INFO;
$ mysql> DELETE FROM backup_info.DBTIER_BACKUP_TRANSFER_INFO;
$ mysql> DELETE FROM mysql.ndb_apply_status;
Example to clean up the replication_info database tables:
$ mysql> DELETE FROM replication_info.DBTIER_REPLICATION_CHANNEL_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_REPL_ERROR_SKIP_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_REPL_EVENT_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_REPL_SITE_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_REPL_SVC_INFO;
$ mysql> DELETE FROM replication_info.DBTIER_SITE_INFO;
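The cleanup statements can also be run non-interactively from the Bastion Host. The following is a sketch that pipes the backup_info and mysql.ndb_apply_status cleanup into the ndbmysqld-0 pod; the namespace, container, and -p<password> placeholder follow the examples above. Append the replication_info DELETE statements only when georeplication failure occurred on all sites:
kubectl -n cluster1 exec -i ndbmysqld-0 -c mysqlndbcluster -- \
  mysql -h 127.0.0.1 -uroot -p<password> <<'EOF'
DELETE FROM backup_info.DBTIER_BACKUP_COMMAND_QUEUE;
DELETE FROM backup_info.DBTIER_BACKUP_INFO;
DELETE FROM backup_info.DBTIER_BACKUP_TRANSFER_INFO;
DELETE FROM mysql.ndb_apply_status;
DELETE FROM replication_info.DBTIER_INITIAL_BINLOG_POSTION;
EOF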
- Reenable the cnDBTier replication service in cnDBTier Cluster:
- Log in to the Bastion Host of cnDBTier Cluster and scale up the DB
replication service deployment if deployment
exists:
$ kubectl -n <namespace of cnDBTier cluster> get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n <namespace of cnDBTier cluster> scale deployment --replicas=1
Example:$ kubectl -n cluster1 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster1 scale deployment --replicas=1
Sample output:deployment.apps/mysql-cluster-cluster1-cluster2-replication-svc scaled
- Log in to Bastion Host of cnDBTier cluster and scale up the DB backup
manager service
deployment:
$ kubectl -n <namespace of cnDBTier cluster> get deployments | egrep 'db-backup-manager-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n <namespace of cnDBTier cluster> scale deployment --replicas=1
For example:$ kubectl -n cluster1 get deployments | egrep 'db-backup-manager-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster1 scale deployment --replicas=1
Sample output:deployment.apps/mysql-cluster-db-backup-manager-svc scaled
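For georeplication deployments, a quick way to confirm that the services are back and the replication channel is re-established is shown in the following sketch; the pod names, namespace, and credentials follow the cluster1 examples above:
# The replication and backup manager pods should be Running again.
kubectl -n cluster1 get pods | egrep 'replication-svc|db-backup-manager-svc'

# The replica threads on the replication SQL node should both report "Yes".
kubectl -n cluster1 exec ndbmysqld-0 -c mysqlndbcluster -- \
  mysql -h127.0.0.1 -uroot -p<password> -e "SHOW REPLICA STATUS\G" \
  | grep -E 'Replica_IO_Running|Replica_SQL_Running'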
7.3 Creating On-demand Database Backup
This section provides the procedure to create on-demand database backup using cnDBTier backup service.
Prerequisites
The cnDBTier cluster must be in a healthy state, that is, every database node must be in the Running state.
- Log in to the Bastion Host of the cnDBTier cluster.
- Perform the following to create the on-demand backup:
- Run the following command to get
the
db-backup-manager-svc
pod name from cnDBTier cluster:$ kubectl -n <namespace of cnDBTier cluster> get pods | grep "db-backup-manager-svc" | awk '{print $1}'
For example:$ kubectl -n cluster1 get pods | grep "db-backup-manager-svc" | awk '{print $1}'
Sample output:mysql-cluster-db-backup-manager-svc-b49488f8f-lbpbb
- Run the following command to get
the
db-backup-manager-svc
service name from cnDBTier cluster:$ kubectl -n <namespace of cnDBTier cluster> get svc | grep "db-backup-manager-svc" | awk '{print $1}'
For example:kubectl -n cluster1 get svc | grep "db-backup-manager-svc" | awk '{print $1}'
Sample output:mysql-cluster-db-backup-manager-svc
- Log in to the
db-backup-manager-svc
pod from cnDBTier cluster:$ kubectl -n <namespace of cnDBTier cluster> exec -it <db-backup-manager-svc pod> -- bash
For example:$ kubectl -n cluster1 exec -it mysql-cluster-db-backup-manager-svc-b49488f8f-lbpbb -- bash
- Run the following REST API cURL command to create the on-demand backup:
$ curl -X POST http://<db-backup-manager-svc svc>:8080/db-tier/on-demand/backup/initiate
For example:$ curl -X POST http://mysql-cluster-db-backup-manager-svc:8080/db-tier/on-demand/backup/initiate
Sample output:{"backup_encryption_flag":"True","backup_id":"410231044","status":"BACKUP_INITIATED"}
Note down the latest backup_id in the output (for example, 410231044 in this case), which is required later for Network Database (NDB) cluster restoration.
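The initiation call and the capture of the returned backup_id can be combined as in the following sketch. It assumes the service name retrieved in the earlier step and uses a simple sed expression (a hypothetical helper, not part of cnDBTier) to pull the backup_id out of the JSON response; alternatively, note the backup_id from the response manually:
# Initiate an on-demand backup and keep the JSON response.
RESPONSE=$(curl -s -X POST http://mysql-cluster-db-backup-manager-svc:8080/db-tier/on-demand/backup/initiate)
echo "${RESPONSE}"

# Extract the backup_id field from the response.
BACKUP_ID=$(echo "${RESPONSE}" | sed -n 's/.*"backup_id":"\([0-9]*\)".*/\1/p')
echo "Initiated backup with backup_id=${BACKUP_ID}"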
- Perform the following to verify the status of the
on-demand backup:
- Run the following command to get
the
db-backup-manager-svc
pod name from cnDBTier cluster:$ kubectl -n <namespace of cnDBTier cluster> get pods | grep "db-backup-manager-svc" | awk '{print $1}'
For example:$ kubectl -n cluster1 get pods | grep "db-backup-manager-svc" | awk '{print $1}'
Sample output:mysql-cluster-db-backup-manager-svc-b49488f8f-lbpbb
- Run the following command to get
the
db-backup-manager-svc
service name from cnDBTier cluster:$ kubectl -n <namespace of cnDBTier cluster> get svc | grep "db-backup-manager-svc" | awk '{print $1}'
For example:kubectl -n cluster1 get svc | grep "db-backup-manager-svc" | awk '{print $1}'
Sample output:mysql-cluster-db-backup-manager-svc
- Log in to the
db-backup-manager-svc
pod:$ kubectl -n <namespace of cnDBTier cluster> exec -it <db-backup-manager-svc pod> -- bash
For example:$ kubectl -n cluster1 exec -it mysql-cluster-db-backup-manager-svc-b49488f8f-lbpbb -- bash
- Run the following cURL command to verify the status of the on-demand
backup:
Replace
BACKUP_ID_INITIATED
in the following command with the backup ID retrieved from step 2d.$ curl -X GET http://<db-backup-manager-svc svc>:8080/db-tier/on-demand/backup/<BACKUP_ID_INITIATED>/status
For example:$ curl -X GET http://mysql-cluster-db-backup-manager-svc:8080/db-tier/on-demand/backup/410231044/status
Sample output:{"backup_status":"BACKUP_COMPLETED", "transfer_status":"BACKUP_TRANSFER_REQUESTED"}
Note:
The "transfer_status" parameter in the response payload indicates the backup transfer status from db-backup-manager-svc to db-replication-svc, and not to the remote server.
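The status check can be repeated until the backup completes. The following is a minimal polling sketch, assuming the same service name and the BACKUP_ID value captured when the backup was initiated:
# Poll the backup status every 10 seconds until it reports BACKUP_COMPLETED.
while true; do
  STATUS=$(curl -s -X GET "http://mysql-cluster-db-backup-manager-svc:8080/db-tier/on-demand/backup/${BACKUP_ID}/status")
  echo "${STATUS}"
  echo "${STATUS}" | grep -q '"backup_status":"BACKUP_COMPLETED"' && break
  sleep 10
done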
7.4 Restoring Georeplication (GR) Failure
This section provides the procedures to restore georeplication failures in cnDBTier clusters using cnDBTier fault recovery APIs or CNC Console. Georeplication failures can occur in the following scenarios:
- Georeplication failure between healthy clusters, when binlog entries (database commits) are not replicated between clusters due to a network outage or latency.
- Georeplication failure between clusters, when one or more clusters have a fatal error and need to be reinstalled.
Note:
- All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using a backup that is from a cnDBTier cluster with a lower version.
- You cannot restore empty databases.
The following terms are used in the georeplication recovery procedures:
- cnDBTier Cluster1: the first cluster in a two-cluster, three-cluster, or four-cluster georeplication setup.
- cnDBTier Cluster2: the second cluster in a two-cluster, three-cluster, or four-cluster georeplication setup.
- cnDBTier Cluster3: the third cluster in a three-cluster or four-cluster georeplication setup.
- cnDBTier Cluster4: the fourth cluster in a four-cluster georeplication setup.
- Bastion Host: the host that is used for installing cnDBTier clusters, where kubectl and helm are installed and configured to access the Kubernetes cluster and Helm repository.
Georeplication Channels Between cnDBTier Clusters
The following table shows the sample cnDBTier cluster names, and the corresponding namespace and login credentials. Before starting the fault recovery procedures, update the following table with the actual values for all the cnDBTier clusters to refer while recovering the failed cnDBTier clusters.
Table 7-1 Cluster Details
cnDBTier Cluster | Cluster Name | Namespace | MySQL Root Password |
---|---|---|---|
cnDBTier cluster1 | Cluster1 | Cluster1 | SamplePassword |
cnDBTier cluster2 | Cluster2 | Cluster2 | SamplePassword |
cnDBTier cluster3 | Cluster3 | Cluster3 | SamplePassword |
cnDBTier cluster4 | Cluster4 | Cluster4 | SamplePassword |
7.4.1 Restoring GR Failures Using cnDBTier GRR APIs
This section describes the procedures to restore Georeplication (GR) failures using cnDBTier Georeplication Recovery (GRR) APIs. For more information about the GRR APIs, see Fault Recovery APIs.
7.4.1.1 Two-Cluster Georeplication Failure
This section provides the procedures to recover georeplication failure in two-cluster georeplication deployments.
7.4.1.1.1 Resolving GR Failure Between cnDBTier Clusters in a Two-Cluster Replication
This section describes the procedure to resolve a georeplication failure between cnDBTier clusters in a two-cluster replication using cnDBTier georeplication recovery APIs.
Prerequisites
- The failed cnDBTier cluster has a replication delay with respect to the other cnDBTier cluster that impacts the NF functionality, or has fatal errors that require it to be reinstalled.
- All cnDBTier clusters (cnDBTier cluster1 and cnDBTier cluster2) are in a healthy state, that is, all database nodes (including management nodes, data nodes, and API nodes) are in a Running state, if there is only a replication delay and no fatal errors exist in the cnDBTier cluster that needs to be restored.
- NF or application traffic is diverted from the failed cnDBTier Cluster to the working cnDBTier Cluster.
- cURL is installed in the environment from where the commands are run.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using a backup that is from a cnDBTier cluster with a lower version.
To resolve this georeplication failure, restore the cnDBTier cluster to the latest DB backup using automated backup and restore, and then re-establish the replication channels.
- Check the cnDBTier cluster status in both cnDBTier clusters. Follow Verifying Cluster Status Using cnDBTier to check the status of cnDBTier cluster1 and cnDBTier cluster2.
- Run the following commands to mark the gr_state of the failed cnDBTier cluster as FAILED, both in the failed cnDBTier cluster (if it is accessible) and in one of the healthy cnDBTier clusters:
- Run the following command to get the replication service LoadBalancer IP for the cluster that must be marked as failed:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service
LoadBalancer Port for the cluster that must be marked as
failed:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in the failed cnDBTier cluster, if it is accessible. HTTP response code 200 indicates that the cluster's gr_state is marked as failed successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/failed
- Mark the cnDBTier cluster as failed in one of the other healthy
remote cnDBTier cluster:
- Get the replication service LoadBalancer IP for the
healthy
cluster:
$ export IP=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer port of the
healthy
cluster:
$ export PORT=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in one of the healthy cnDBTier clusters. HTTP response code 200 indicates that the cluster's gr_state is marked as failed successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/<name of the failed cluster>/failed
Note:
For more information about georeplication recovery API responses, error codes, and curl commands for HTTPS enabled replication service, see Fault Recovery APIs.
For example:
- If cnDBTier cluster1 has failed, mark the cnDBTier cluster as failed in the unhealthy cnDBTier cluster cluster1:
- Get the replication service LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark the cnDBTier
cluster cluster1 as failed in unhealthy cnDBTier cluster
cluster1 if it is
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the cnDBTier cluster1 as failed in one of the
other healthy remote cnDBTier cluster2:
- Get the replication service LoadBalancer IP
for the healthy
cluster.
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port for the healthy
cluster:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark cluster1 as failed in the healthy cnDBTier cluster cluster2. HTTP response code 200 indicates that the cluster's gr_state is marked as failed successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- If cnDBTier cluster2 has failed, mark the cnDBTier cluster as failed in the unhealthy cnDBTier cluster cluster2:
- Get the replication service LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark the cnDBTier
cluster cluster2 as failed in unhealthy cnDBTier cluster
cluster2 if it is
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the cnDBTier cluster2 as failed in one of the
other healthy remote cnDBTier cluster1:
- Get the replication service LoadBalancer IP
for the healthy
cluster:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port for the healthy
cluster:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark cluster2 as failed in the healthy cnDBTier cluster cluster1. HTTP response code 200 indicates that the cluster's gr_state is marked as failed successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
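The IP and port lookups and the two curl calls above follow the same pattern for every cluster. The following bash sketch consolidates them into a single illustrative helper; the function name and the FAILED_NS, FAILED_NAME, and HEALTHY_NS variables are placeholders used only for this example and are not part of the cnDBTier API.
# Illustrative helper: fetch the replication service LoadBalancer IP and port for a namespace.
get_repl_endpoint() {
  local ns="$1"
  IP=$(kubectl get svc -n "$ns" | grep repl | awk '{print $4}' | head -n 1)
  PORT=$(kubectl get svc -n "$ns" | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
}

FAILED_NS=cluster1      # namespace of the failed cluster (example value)
FAILED_NAME=cluster1    # cnDBTier cluster name of the failed cluster (example value)
HEALTHY_NS=cluster2     # namespace of one healthy cluster (example value)

# Mark the cluster as failed in the failed cluster itself, if it is reachable.
get_repl_endpoint "$FAILED_NS"
curl -i -X POST "http://$IP:$PORT/db-tier/gr-recovery/site/$FAILED_NAME/failed"

# Mark the same cluster as failed in one of the healthy clusters.
get_repl_endpoint "$HEALTHY_NS"
curl -i -X POST "http://$IP:$PORT/db-tier/gr-recovery/remotesite/$FAILED_NAME/failed"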
- Run the following commands for restoring the cnDBTier cluster depending
on the status of cnDBTier cluster:
Follow step a, if the cnDBTier cluster has fatal errors such as PVC corruption, PVC not accessible, and other fatal errors.
Follow step b and c, if the cnDBTier cluster does not have any fatal errors and database restore is needed because of the georeplication failure.
- If the cnDBTier cluster that needs to be restored has fatal errors, or if the cnDBTier cluster status is DOWN, then reinstall the cnDBTier cluster to restore the database from the remote cnDBTier cluster.
Follow Reinstalling cnDBTier Cluster for installing the cnDBTier Cluster by configuring the remote cluster IP address of the replication service of remote cnDBTier Cluster for restoring the database.
For example:- If cnDBTier cluster1 needs to be restored, then uninstall cnDBTier cluster1 and reinstall cnDBTier cluster1 using the above procedures, which restores the database from the remote cnDBTier cluster2 by configuring the replication service IP address of cnDBTier cluster2 as the remote cluster in cnDBTier cluster1.
- If cnDBTier cluster2 needs to be restored, then uninstall cnDBTier cluster2 and reinstall cnDBTier cluster2 using the above procedures, which restores the database from the remote cnDBTier cluster1 by configuring the replication service IP address of cnDBTier cluster1 as the remote cluster in cnDBTier cluster2.
- Create the required NF-specific user accounts and grants to
match NF users and grants of the good cluster in the reinstalled cnDBTier.
For sample procedure, see Creating NF Users.
Note:
For more details about creating NF-specific user account and grants, refer to the NF-specific fault recovery guide. - If the cnDBTier cluster that needs to be restored is UP, does
not have any fatal errors, and if the georeplication has failed or a large
replication delay exists, then run the following commands to restore the
database from remote cnDBTier Cluster:
- Get the replication service LoadBalancer IP of the
cluster where georeplication needs to be
restored:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer port of the cluster where georeplication needs to be restored:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start the GR procedure on
the failed cluster for restoring the cluster from remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of the failed cluster>/start
Note:
For more information about georeplication recovery API responses, error codes, and curl commands for HTTPS enabled replication service, see Fault Recovery APIs.For example,- If cnDBTier cluster1 needs to be restored,
then run the following commands to start the
georeplication procedure on the failed cluster
(cluster1) for restoring the cluster from remote
cnDBTier Cluster:
- Get the replication service
LoadBalancer IP for
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start
the georeplication procedure on the failed
cluster(cluster1) for restoring the cluster from
remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/start
- If cnDBTier cluster2 needs to be restored,
then run the following commands to start the
georeplication procedure on the failed cluster
(cluster2) for restoring the cluster from remote
cnDBTier Cluster:
- Get the replication service
LoadBalancer IP for
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer port for
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start
the georeplication procedure on the failed cluster
(cluster2) for restoring the cluster from remote
cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/start
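The start call in step b and c above can also be wrapped with a check on the HTTP status code so that a failure to start the recovery is caught immediately. The sketch below is illustrative only (the variable names are placeholders) and assumes an HTTP, not HTTPS, replication service endpoint; for HTTPS-enabled deployments, see Fault Recovery APIs.
FAILED_NS=cluster1      # namespace of the failed cluster (example value)
FAILED_NAME=cluster1    # cnDBTier cluster name of the failed cluster (example value)

IP=$(kubectl get svc -n "$FAILED_NS" | grep repl | awk '{print $4}' | head -n 1)
PORT=$(kubectl get svc -n "$FAILED_NS" | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)

# Start georeplication recovery and capture only the HTTP status code.
STATUS=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
  "http://$IP:$PORT/db-tier/gr-recovery/site/$FAILED_NAME/start")
if [ "$STATUS" = "200" ]; then
  echo "Georeplication recovery started on $FAILED_NAME"
else
  echo "Unexpected HTTP status $STATUS; see Fault Recovery APIs for the error codes"
fi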
- Wait until the georeplication recovery is complete. You can check the
cluster status using Verifying Cluster Status Using cnDBTier procedure. When the cnDBTier cluster is UP, continue monitoring
the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
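If you want to keep an eye on the pods (including the backup-manager-service restarts mentioned in the note above) while the recovery runs, a simple kubectl watch is enough; the namespace below is an example value.
$ kubectl get pods -n cluster1 -w
$ kubectl get pods -n cluster1 | grep backup-manager-service   # check the pod and its restart count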
7.4.1.2 Three-Cluster Georeplication Failure
This section provides the procedures to recover georeplication failure in three-cluster georeplication deployments.
7.4.1.2.1 Resolving GR Failure Between cnDBTier Clusters in a Three-Cluster Replication
This section describes the procedure to resolve a georeplication failure between cnDBTier clusters in a three-cluster replication using cnDBTier georeplication recovery APIs.
- The failed cnDBTier cluster has a replication delay, with respect to the other cnDBTier clusters, that impacts the NF functionality, or has fatal errors that require the cluster to be reinstalled.
- If there is only a replication delay and no fatal errors exist in the cnDBTier cluster that needs to be restored, then the cnDBTier clusters are in a healthy state, that is, all database nodes (including management, data, and API nodes) are in Running state.
- Other two cnDBTier clusters (that is, first working cnDBTier cluster and second working cnDBTier cluster) are in a healthy state, that is, all database nodes (including management node, data node, and api node) are in Running state, and the replication channels between them are running correctly.
- NF or application traffic is diverted from the failed cnDBTier Cluster to any of the working cnDBTier Cluster.
- CURL is installed in the environment from where commands are run.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using a backup that is from a cnDBTier cluster with a lower version.To resolve this georeplication failure in one cluster, restore the failed cnDBTier cluster to the latest DB backup using automated backup and restore, and then reestablish the replication channels between cnDBTier cluster1, cnDBTier cluster2, and cnDBTier cluster3.
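Because all clusters must be on the same cnDBTier version, it can be useful to compare the container image tags in each namespace before starting the recovery. The loop below is an illustrative kubectl check (the namespaces are example values), not a cnDBTier API call.
# List the unique container images (and therefore version tags) used in each cluster.
for ns in cluster1 cluster2 cluster3; do
  echo "== $ns =="
  kubectl get pods -n "$ns" -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}' | sort -u
done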
Procedure:
- Check the status of the cnDBTier cluster status in three cnDBTier clusters. Follow Verifying Cluster Status Using cnDBTier to check the status of the cnDBTier cluster1, cnDBTier cluster2, and cnDBTier cluster3.
- Run the following commands to mark the gr_state of the failed cnDBTier cluster as FAILED in the failed cnDBTier cluster (if it is accessible) and in one of the healthy cnDBTier clusters:- Run the following command to get the replication service
LoadBalancer IP for the cluster that must be marked as
failed:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service
LoadBalancer Port for the cluster that must be marked as
failed:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in the failed cnDBTier
cluster if it is accessible. http response code 200 indicates that the
cluster's gr_state is marked as failed
successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/failed
- Mark the cnDBTier cluster as failed in one of the other
healthy remote cnDBTier cluster:
- Get the replication service LoadBalancer IP of the
healthy
cluster.
$ export IP=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer port of
the healthy
cluster:
$ export PORT=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in one of the
healthy cnDBTier clusters. http response code 200 indicates that
the cluster's gr_state is marked as failed
successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/<name of failed cluster>/failed
Note:
For more information about georeplication recovery API responses, error codes, and curl commands for HTTPS enabled replication service, see Fault Recovery APIs.
For example:- If cnDBTier cluster1 is failed, then perform the following:
- Mark the cnDBTier cluster as failed in cnDBTier cluster cluster1:
- Get the replication service LoadBalancer
IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark the
cnDBTier cluster cluster1 as failed if it is
accessible. http response code 200 indicates that
the cluster's gr_state is marked as failed
successfully:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the unhealthy cnDBTier cluster as failed
in one of the healthy cnDBTier cluster:
- Get the loadbalancer IP of healthy
cnDBTier cluster
cluster2.
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark
cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- If cnDBTier cluster2 is failed, then perform the
following:
- Mark the cnDBTier cluster as failed in cnDBTier cluster cluster2:
- Get the replication service LoadBalancer
IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark the
cnDBTier cluster cluster2 as failed if it is
accessible. http response code 200 indicates that
the cluster's gr_state is marked as failed
successfully:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the unhealthy cnDBTier cluster as failed
in one of the healthy cnDBTier cluster:
- Get the loadbalancer IP of healthy
cnDBTier cluster
cluster1.
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark
cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- If cnDBTier cluster3 is failed, then perform the
following:
- Mark the cnDBTier cluster as failed in the unhealthy cnDBTier cluster if it is accessible:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark the
cnDBTier cluster cluster3 as failed in the unhealthy
cnDBTier cluster if it is accessible. http response
code 200 indicates that the cluster's gr_state is
marked as failed
successfully:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- Mark the unhealthy cnDBTier cluster3 as failed
in one of the healthy cnDBTier cluster:
- Get the loadbalancer IP of healthy
cnDBTier cluster
cluster1.
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark
cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
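Because the failed cluster is marked as failed in its own namespace only if it is accessible, you can probe its replication service before issuing that call. The sketch below is illustrative (variable names are placeholders); it skips the local call when the service does not answer and always marks the cluster as failed in a healthy cluster.
FAILED_NS=cluster1      # namespace of the failed cluster (example value)
FAILED_NAME=cluster1    # cnDBTier cluster name of the failed cluster (example value)
HEALTHY_NS=cluster2     # namespace of one healthy cluster (example value)

IP=$(kubectl get svc -n "$FAILED_NS" | grep repl | awk '{print $4}' | head -n 1)
PORT=$(kubectl get svc -n "$FAILED_NS" | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)

# Mark the failed cluster in its own namespace only if its replication service answers.
if curl -s -o /dev/null --connect-timeout 5 "http://$IP:$PORT"; then
  curl -i -X POST "http://$IP:$PORT/db-tier/gr-recovery/site/$FAILED_NAME/failed"
else
  echo "Replication service of $FAILED_NAME is not reachable; skipping the local call"
fi

# Always mark the failed cluster in one of the healthy clusters.
IP=$(kubectl get svc -n "$HEALTHY_NS" | grep repl | awk '{print $4}' | head -n 1)
PORT=$(kubectl get svc -n "$HEALTHY_NS" | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
curl -i -X POST "http://$IP:$PORT/db-tier/gr-recovery/remotesite/$FAILED_NAME/failed"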
- Run the following commands for restoring the cnDBTier cluster
depending on the status of cnDBTier cluster.
Follow step a, if the cnDBTier cluster has fatal errors such as PVC corruption, PVC not accessible, and other fatal errors.
Follow step b and c, if the cnDBTier cluster does not have any fatal errors and database restore is needed because of the georeplication failure.
- If the cnDBTier cluster that needs to be restored has fatal errors, or if the cnDBTier cluster status is DOWN, then reinstall the cnDBTier cluster to restore the database from the remote cnDBTier cluster.
Follow Reinstalling cnDBTier Cluster for installing the cnDBTier Cluster by configuring the remote cluster IP address of the replication service of remote cnDBTier Cluster for restoring the database.
For example:- If cnDBTier cluster1 needs to be restored, then uninstall and reinstall cnDBTier cluster1 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the replication service IP addresses of cnDBTier cluster2 and cnDBTier cluster3 as remote clusters in cnDBTier cluster1.
- If cnDBTier cluster2 needs to be restored, then uninstall and reinstall cnDBTier cluster2 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the replication service IP addresses of cnDBTier cluster1 and cnDBTier cluster3 as remote clusters in cnDBTier cluster2.
- If cnDBTier cluster3 needs to be restored, then uninstall and reinstall cnDBTier cluster3 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the replication service IP addresses of cnDBTier cluster1 and cnDBTier cluster2 as remote clusters in cnDBTier cluster3.
- Create the required NF-specific user accounts and grants to
match NF users and grants of the good cluster in the reinstalled
cnDBTier. For sample procedure, see Creating NF Users.
Note:
For more details about creating NF-specific user account and grants, refer to the NF-specific fault recovery guide. - If cnDBTier cluster that needs to be restored is UP, and
does not have any fatal errors, then run the following commands to
restore the database from remote cnDBTier Clusters:
- Get the replication service LoadBalancer IP of the
cluster where georeplication needs to be
restored:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer port of the cluster where georeplication needs to be restored:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start the GR procedure
on the failed cluster for restoring the cluster from remote
cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/start
or
Run the following command to start the GR procedure on the failed cluster for restoring the cluster by selecting the backup cluster name where backup is taken:$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/grbackupsite/<cluster name where backup is initiated>/start
Note:
For more information about georeplication recovery API responses, error codes, and curl commands for HTTPS enabled replication service, see Fault Recovery APIs.For example,- If cnDBTier cluster1 needs to be
restored, then run the following commands to start
the georeplication procedure on the failed cluster
(cluster1) for restoring the cluster from remote
cnDBTier Cluster:
- Get the replication service
LoadBalancer IP for
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start
the georeplication procedure on the failed cluster
(cluster1) for restoring the cluster from remote
cnDBTier
cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/start
- If cnDBTier cluster2 needs to be
restored, then run the following commands to start
the georeplication procedure on the failed cluster
(cluster2) for restoring the cluster from remote
cnDBTier Cluster:
- Get the replication service
LoadBalancer IP for
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer port for
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start
the georeplication procedure on the failed cluster
(cluster2) for restoring the cluster from remote
cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/start
- If cnDBTier cluster3 needs to be
restored, then run the following commands to start
the georeplication procedure on the failed cluster
(cluster3) for restoring the cluster from remote
cnDBTier Cluster:
- Get the replication service
LoadBalancer IP for
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer port for
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to
start the georeplication procedure on the failed
cluster (cluster3) for restoring the cluster from
remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/start
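When more than one remote cluster is available, the alternative grbackupsite endpoint shown above lets you choose the cluster whose backup is used for the restore. A minimal illustrative sketch (the variable names and values are placeholders):
FAILED_NS=cluster1      # namespace of the failed cluster (example value)
FAILED_NAME=cluster1    # name of the failed cluster (example value)
BACKUP_NAME=cluster3    # cluster where the backup is initiated (example value)

IP=$(kubectl get svc -n "$FAILED_NS" | grep repl | awk '{print $4}' | head -n 1)
PORT=$(kubectl get svc -n "$FAILED_NS" | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)

# Start recovery, explicitly selecting the cluster whose backup is used.
curl -i -X POST "http://$IP:$PORT/db-tier/gr-recovery/site/$FAILED_NAME/grbackupsite/$BACKUP_NAME/start"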
- Wait until the georeplication recovery is complete.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.- Perform the Verifying Cluster Status Using cnDBTier procedure to check the cluster status.
- When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by performing the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure.
7.4.1.2.2 Restoring Two cnDBTier Clusters in a Three-Cluster Replication
This section describes the procedure to restore two cnDBTier clusters having fatal errors in a three-cluster replication using cnDBTier georeplication recovery APIs.
- The first failed cnDBTier cluster and the second failed cnDBTier cluster have fatal errors, or the georeplication between them has failed, and both clusters need to be restored.
- The working cnDBTier cluster is in a healthy state, that is, all database nodes (including management, data, and API nodes) are in Running state.
- Only one cnDBTier cluster is restored at a time. For example, the first failed cnDBTier cluster is restored first, and then the second failed cnDBTier cluster is restored.
- NF or application traffic is diverted from the first failed cnDBTier cluster and second failed cnDBTier Cluster to the working cnDBTier Cluster.
- CURL is installed in the environment from where commands are run.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using the backup that is from a cnDBTier cluster with a lower version.To resolve this georeplication failure in two cnDBTier clusters (that is, first failed cnDBTier cluster and second failed cnDBTier cluster), restore failed cnDBTier Clusters to the latest DB backup using automated backup and restore. Re-establish the replication channels between cnDBTier cluster1, cnDBTier cluster2 and cnDBTier cluster3.
- Check the status of the cnDBTier cluster status in three cnDBTier clusters. Follow Verifying Cluster Status Using cnDBTier to check the status of the cnDBTier cluster1, cnDBTier cluster2, and cnDBTier cluster3.
- Run the following commands to mark the gr_state of the failed cnDBTier clusters (that is, the first failed cnDBTier cluster and the second failed cnDBTier cluster) as FAILED in all unhealthy cnDBTier clusters that are accessible and in one of the healthy cnDBTier clusters:
- Run the following command to get the replication service
LoadBalancer IP from one of the healthy
cluster:
$ export IP=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service
LoadBalancer Port from one of the healthy
cluster:
$ export PORT=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the unhealthy cnDBTier cluster as failed in one of the
healthy cnDBTier clusters. http response code 200 indicates that the
cluster's gr_state is marked as failed
successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/<name of failed cluster>/failed
- If failed cnDBTier clusters are accessible, mark the
unhealthy cnDBTier clusters as failed:
- Get the replication service LoadBalancer IP for the
cluster which must be marked as
failed.
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer port for
the cluster which must be marked as
failed:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in cnDBTier
clusters if it is accessible. http response code 200 indicates
that the cluster's gr_state is marked as failed
successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/failed
Note:
For more information about georeplication recovery API responses, error codes, and curl commands for HTTPS enabled replication service, see Fault Recovery APIs.
For example:- If the failed cnDBTier clusters are cluster2 and
cluster3, then perform the following:
- Mark the failed cnDBTier clusters cluster2 and
cluster3 as failed in the healthy cnDBTier cluster
cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster2 as failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- If the failed cnDBTier clusters cluster2 and cluster3
are accessible, then perform the following:
- Mark the cnDBTier cluster as failed in cnDBTier
cluster2:
- Get the replication service LoadBalancer
IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster2 is failed, mark the
cnDBTier cluster2 as failed in cnDBTier clusters if
it is
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the cnDBTier cluster as failed in cnDBTier
cluster3:
- Get the replication service
LoadBalancer IP for
cluster3.
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster3 is the failed
cluster, then mark the cnDBTier cluster3 as failed
in cnDBTier clusters that are
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- If failed cnDBTier clusters are cluster1 and cluster3,
then perform the following:
- Mark the failed cnDBTier clusters cluster1 and
cluster3 as failed in the healthy cnDBTier cluster
cluster2:
- Get the replication service
LoadBalancer IP for
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- If cnDBTier cluster1 and cluster3 have replication delay, perform the following:
- Mark the cnDBTier cluster as failed in cnDBTier
cluster1:
- Get the replication service LoadBalancer
IP for
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the cnDBTier cluster as failed in cnDBTier
cluster3:
- Get the replication service LoadBalancer
IP for
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port for
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster3 is failed, mark
the cnDBTier cluster3 as failed in unhealthy
cnDBTier cluster cluster3 if it is
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- If failed cnDBTier clusters are cluster1 and cluster2,
then perform the following:
- Mark the failed cnDBTier clusters cluster1 and
cluster2 as failed in the healthy cnDBTier cluster
cluster3:
- Get the replication service
LoadBalancer IP for
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- If cnDBTier cluster1 and cluster2 have replication delay, perform the following:
- Mark the cnDBTier cluster as failed in cnDBTier
cluster1:
- Get the replication service LoadBalancer
IP for
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the cnDBTier cluster as failed in cnDBTier
cluster2:
- Get the replication service LoadBalancer
IP for
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port for
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster2 is failed, mark the
cnDBTier cluster2 as failed in unhealthy cnDBTier
cluster cluster2 if it is
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
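In the two-failed-cluster case, both failed clusters are marked as failed through the single healthy cluster, so the same IP and port can be reused for the two remotesite calls. An illustrative sketch (the namespace and cluster names are example values):
HEALTHY_NS=cluster1     # namespace of the healthy cluster (example value)

IP=$(kubectl get svc -n "$HEALTHY_NS" | grep repl | awk '{print $4}' | head -n 1)
PORT=$(kubectl get svc -n "$HEALTHY_NS" | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)

# Mark both failed clusters as failed in the healthy cluster.
for failed in cluster2 cluster3; do
  curl -i -X POST "http://$IP:$PORT/db-tier/gr-recovery/remotesite/$failed/failed"
done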
- Uninstall first failed cnDBTier cluster if first failed cnDBTier cluster has fatal errors. Follow Uninstalling cnDBTier Cluster procedure for uninstalling the first failed cnDBTier cluster.
- Uninstall second failed cnDBTier cluster if second failed cnDBTier cluster has fatal errors. Follow Uninstalling cnDBTier Cluster procedure for uninstalling the second failed cnDBTier cluster.
- Run the following commands to reinstall the first failed cnDBTier cluster and wait until the database restore from the remote cluster is complete.
Follow step a, if the cnDBTier cluster has fatal errors such as PVC corruption, PVC not accessible, and other fatal errors.
Follow step b and c, if the cnDBTier cluster does not have any fatal errors and database restore is needed because of the georeplication failure.
- If the cnDBTier cluster that needs restoration contains
fatal errors, or if the cnDBTier cluster status is DOWN, then reinstall
the cnDBTier cluster to restore the database from the remote cnDBTier
cluster.
Follow Reinstalling cnDBTier Cluster for installing the cnDBTier Cluster by configuring the remote cluster IP address of the replication service of remote cnDBTier Cluster for restoring the database.
For example:- If the first failed cnDBTier cluster is cnDBTier cluster1, then uninstall cnDBTier cluster1 and reinstall cnDBTier cluster1 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the replication service IP addresses of the working cnDBTier clusters as remote clusters in cnDBTier cluster1.
- If the first failed cnDBTier cluster is cnDBTier cluster2, then uninstall cnDBTier cluster2 and reinstall cnDBTier cluster2 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the replication service IP addresses of the working cnDBTier clusters as remote clusters in cnDBTier cluster2.
- If the first failed cnDBTier cluster is cnDBTier cluster3, then uninstall cnDBTier cluster3 and reinstall cnDBTier cluster3 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the replication service IP addresses of the working cnDBTier clusters as remote clusters in cnDBTier cluster3.
- Create the required NF-specific user accounts and grants to
match NF users and grants of the good cluster in the reinstalled
cnDBTier. For sample procedure, see Creating NF Users.
Note:
For more details about creating NF-specific user account and grants, refer to the NF-specific fault recovery guide. - If cnDBTier cluster which needs to be restored is UP, and
does not have any fatal errors, then perform the following:
- Configure the remote site IPs and then upgrade the cnDBTier cluster by following the Upgrading cnDBTier procedures.
- Run the following commands to restore the database
from remote cnDBTier Clusters:
- Get the replication service LoadBalancer IP
of the cluster where georeplication needs to be
restored:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
PORT of the cluster where georeplication needs to be
restored:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start the GR
procedure on the failed cluster for restoring the
cluster from remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/start
or
Run the following command to start the GR procedure on the failed cluster for restoring the cluster by selecting the backup cluster name where backup is taken:$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/grbackupsite/<cluster name where backup is initiated>/start
For example:- If cnDBTier cluster1 needs to be
restored, run the following commands to start the GR
procedure on the failed cluster (cluster1) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to
start the GR procedure on the failed cluster
(cluster1) for restoring the cluster from remote
cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/start
- If cnDBTier cluster2 needs to be
restored, run the following commands to start the GR
procedure on the failed cluster (cluster2) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start
the GR procedure on the failed cluster (cluster2)
for restoring the cluster from remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/start
- If cnDBTier cluster3 needs to be
restored, run the following commands to start the GR
procedure on the failed cluster (cluster3) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to
start the GR procedure on the failed cluster
(cluster3) for restoring the cluster from remote
cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/start
- Wait until database is restored in the first failed cnDBTier
cluster and georeplication recovery procedure is completed.
Check the cluster status using Verifying Cluster Status Using cnDBTier procedure after the cnDBTier cluster is UP and continue monitoring the georeplication recovery status.
Follow Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure to monitor the georeplication recovery status.
- Run the following commands to reinstall the second failed cnDBTier cluster and wait until the database restore from the remote cluster is complete.
Follow step a, if the cnDBTier cluster has fatal errors such as PVC corruption, PVC not accessible, and other fatal errors.
Follow step b and c, if the cnDBTier cluster does not have any fatal errors and database restore is needed because of the georeplication failure.
- If the second failed cnDBTier cluster that needs to be restored has fatal errors, or if the second failed cnDBTier cluster status is DOWN, then reinstall the second failed cnDBTier cluster to restore the database from the remote cnDBTier clusters.
Follow Reinstalling cnDBTier Cluster for installing the cnDBTier Cluster by configuring the remote cluster IP address of the replication service of remote cnDBTier Cluster for restoring the database.
For example:- If the second failed cnDBTier cluster is cnDBTier cluster1, then uninstall cnDBTier cluster1 and reinstall cnDBTier cluster1 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the replication service IP addresses of the working cnDBTier clusters as remote clusters in cnDBTier cluster1.
- If the second failed cnDBTier cluster is cnDBTier cluster2, then uninstall cnDBTier cluster2 and reinstall cnDBTier cluster2 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the replication service IP addresses of the working cnDBTier clusters as remote clusters in cnDBTier cluster2.
- If the second failed cnDBTier cluster is cnDBTier cluster3, then uninstall cnDBTier cluster3 and reinstall cnDBTier cluster3 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the replication service IP addresses of the working cnDBTier clusters as remote clusters in cnDBTier cluster3.
- Create the required NF-specific user accounts and grants to
match NF users and grants of the good cluster in the reinstalled
cnDBTier. For sample procedure, see Creating NF Users.
Note:
For more details about creating NF-specific user account and grants, refer to the NF-specific fault recovery guide. - If the second failed cnDBTier cluster that needs to be
restored is UP, and does not have any fatal errors, then perform the
following:
- Configure the remote site IPs and then upgrade the cnDBTier cluster by following the Upgrading cnDBTier procedures.
- Run the following commands to restore the database
from remote cnDBTier Clusters:
- Get the replication service LoadBalancer IP
of the cluster where georeplication needs to be
restored:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
PORT of the cluster where georeplication needs to be
restored:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start the GR
procedure on the failed cluster for restoring the
cluster from remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/start
or
Run the following command to start the GR procedure on the failed cluster for restoring the cluster by selecting the backup cluster name where backup is taken:$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/grbackupsite/<cluster name where backup is initiated>/start
For example:- If cnDBTier cluster1 needs to be
restored, run the following commands to start the GR
procedure on the failed cluster (cluster1) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start
the GR procedure on the failed cluster (cluster1)
for restoring the cluster from remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/start
- If cnDBTier cluster2 needs to be
restored, run the following commands to start the GR
procedure on the failed cluster (cluster2) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start
the GR procedure for restoring the cluster on the
failed cluster (cluster2) from remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/start
- If cnDBTier cluster3 needs to be
restored, run the following commands to start the GR
procedure on the failed cluster (cluster3) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to
start the GR procedure on the failed cluster
(cluster3) for restoring the cluster from remote
cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/start
- Wait until the georeplication recovery is completed.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.- Perform the Verifying Cluster Status Using cnDBTier procedure to check the cluster status.
- When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by performing the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure.
7.4.1.3 Four-Cluster Georeplication Failure
This section provides the procedures to recover georeplication failure in four-cluster georeplication deployments.
7.4.1.3.1 Resolving GR Failure Between cnDBTier Clusters in a Four-Cluster Replication
This section describes the procedure to resolve a georeplication failure between cnDBTier clusters in a four-cluster replication using cnDBTier georeplication recovery APIs.
- The failed cnDBTier cluster has a replication delay, with respect to the other cnDBTier clusters, that impacts the NF functionality, or has fatal errors that require the cluster to be reinstalled.
- If there is only a replication delay and no fatal errors exist in the cnDBTier cluster that must be restored, then the cnDBTier clusters are in a healthy state, that is, all database nodes (including management, data, and API nodes) are in Running state.
- Other three cnDBTier clusters (that is, first working cnDBTier cluster, second working cnDBTier cluster and third working cnDBTier cluster) are in a healthy state, that is, all database nodes (including management node, data node, and api node) are in Running state, and the replication channels between them are running correctly.
- NF or application traffic is diverted from the failed cnDBTier cluster to any of the working cnDBTier clusters (first, second, or third working cnDBTier cluster).
- CURL is installed in the environment from where commands are run.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using the backup that is from a cnDBTier cluster with a lower version.To resolve this georeplication failure in a single failed cnDBTier cluster, restore the failed cnDBTier Cluster to the latest DB backup using automated backup and restore. Re-establish the replication channels between cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
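Before starting the procedure, you can quickly confirm the pod status in all four namespaces. The loop below is an illustrative kubectl check (the namespaces are example values); it lists only the pods whose status is not Running or Completed.
# List any pods that are not Running/Completed in each cluster's namespace.
for ns in cluster1 cluster2 cluster3 cluster4; do
  echo "== $ns =="
  kubectl get pods -n "$ns" --no-headers | grep -Ev 'Running|Completed' || echo "all pods Running"
done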
Procedure:
- Check the status of the cnDBTier cluster status in four cnDBTier clusters. Follow Verifying Cluster Status Using cnDBTier to check the status of the cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
- Run the following commands to mark the gr_state of the failed cnDBTier cluster as FAILED in any working cnDBTier cluster and also in the failed cnDBTier cluster, if it is accessible:
- Run the following command to get the replication service
LoadBalancer IP for the cluster that must be marked as
failed:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service
LoadBalancer Port for the cluster that must be marked as
failed:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in the failed cnDBTier
cluster if it is accessible. http response code 200 indicates that the
cluster's gr_state is marked as failed
successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/failed
- Mark the unhealthy cnDBTier cluster as failed in one of the
healthy clusters:
- Get the replication service LoadBalancer IP for the
healthy
cluster:
$ export IP=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer port of the
healthy
cluster:
$ export PORT=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in one of the
healthy cnDBTier clusters. http response code 200 indicates that
the cluster's gr_state is marked as failed
successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/<name of failed cluster>/failed
Note:
For more information about georeplication recovery API responses, error codes, and curl commands for HTTPS enabled replication service, see Fault Recovery APIs.
For example:- If cnDBTier cluster1 is the failed cluster, then perform
the following:
- Mark the cnDBTier cluster1 as failed in cnDBTier cluster cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark the
cnDBTier cluster cluster1 as failed. http response
code 200 indicates that the cluster's gr_state is
marked as failed
successfully:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the unhealthy cnDBTier cluster1 as failed
in one of the healthy cluster cnDBTier cluster2:
- Get the loadbalancer IP of healthy
cnDBTier cluster
cluster2.
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark
cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- If cnDBTier cluster2 is the failed cluster, then perform
the following:
- Mark the cnDBTier cluster as failed in cnDBTier cluster cluster2:
- Get the replication service LoadBalancer
IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster2 is unhealthy, run
the following command to mark the cnDBTier cluster
cluster2 as failed if it is accessible. http
response code 200 indicates that the cluster's
gr_state is marked as failed
successfully:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the unhealthy cnDBTier cluster as failed in
one of the healthy cnDBTier clusters:
- Get the loadbalancer IP of healthy
cnDBTier cluster
cluster1.
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark
cluster2 as failed in the healthy cnDBTier cluster
cluster1:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- If cnDBTier cluster3 is the failed cluster, then
perform the following:
- Mark the cnDBTier cluster cluster3 as failed in the unhealthy cnDBTier cluster if it is accessible:
- Get the replication service LoadBalancer
IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster3 is unhealthy, run
the following command to mark the cnDBTier cluster
cluster3 as failed if it is accessible. http
response code 200 indicates that the cluster's
gr_state is marked as failed
successfully:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- Mark the unhealthy cnDBTier cluster3 as failed
in one of the healthy cnDBTier clusters:
- Get the loadbalancer IP of healthy
cnDBTier cluster
cluster1.
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark
cluster3 as failed in healthy cnDBTier cluster
cluster1:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- If cnDBTier cluster4 is unhealthy, then perform the
following:
- Mark cluster4 as failed in the unhealthy cnDBTier cluster cluster4 if it is accessible:
- Get the replication service LoadBalancer IP of cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster4 is unhealthy but still accessible, run the following command to mark cnDBTier cluster4 as failed. An HTTP response code of 200 indicates that the cluster's gr_state is marked as failed successfully:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/failed
- Mark the unhealthy cnDBTier cluster4 as failed
in one of the healthy cnDBTier clusters:
- Get the LoadBalancer IP of the healthy cnDBTier cluster cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to mark
cluster4 as failed in healthy cnDBTier cluster
cluster1:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster4/failed
- Run the following commands to restore the cnDBTier cluster, depending on the status of the cnDBTier cluster.
Follow step a if the cnDBTier cluster has fatal errors such as PVC corruption, inaccessible PVCs, or other fatal errors.
Follow steps b and c if the cnDBTier cluster does not have any fatal errors and a database restore is needed because of the georeplication failure.
- If the cnDBTier cluster that needs to be restored has fatal errors, or if the cnDBTier cluster status is DOWN, reinstall the cnDBTier cluster to restore the database from the remote cnDBTier clusters.
Follow Reinstalling cnDBTier Cluster to install the cnDBTier cluster, configuring the replication service IP addresses of the remote cnDBTier clusters so that the database is restored from them.
For example:
- If cnDBTier cluster1 needs to be restored, uninstall and reinstall cnDBTier cluster1 using the above procedures. This restores the database from the remote cnDBTier clusters by configuring the replication service IP addresses of cnDBTier cluster2 and cnDBTier cluster3 in cnDBTier cluster1.
- If cnDBTier cluster2 needs to be restored, uninstall and reinstall cnDBTier cluster2 using the above procedures. This restores the database from the remote cnDBTier clusters by configuring the replication service IP addresses of cnDBTier cluster1 and cnDBTier cluster3 in cnDBTier cluster2.
- If cnDBTier cluster3 needs to be restored, uninstall and reinstall cnDBTier cluster3 using the above procedures. This restores the database from the remote cnDBTier clusters by configuring the replication service IP addresses of cnDBTier cluster1 and cnDBTier cluster2 in cnDBTier cluster3.
- If cnDBTier cluster4 needs to be restored, uninstall and reinstall cnDBTier cluster4 using the above procedures. This restores the database from the remote cnDBTier clusters by configuring the replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster4.
- Create the required NF-specific user accounts and grants in the reinstalled cnDBTier cluster to match the NF users and grants of the good cluster. For a sample procedure, see Creating NF Users.
Note:
For more details about creating NF-specific user accounts and grants, refer to the NF-specific fault recovery guide.
- If the cnDBTier cluster that needs to be restored is UP and does not have any fatal errors, then perform the following:
- Configure the remote site IPs and then upgrade the cnDBTier cluster by following the Upgrading cnDBTier procedures.
- Run the following commands to restore the database
from remote cnDBTier Clusters:
- Get the replication service LoadBalancer IP
of the cluster where georeplication needs to be
restored:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
PORT of the cluster where georeplication needs to be
restored:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start the GR
procedure on the failed cluster for restoring the
cluster from remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/start
Alternatively, run the following command to start the GR procedure on the failed cluster and restore it from a specific backup by selecting the cluster name where the backup is taken (a consolidated sketch covering both forms of this call is provided at the end of this procedure):
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/grbackupsite/<cluster name where backup is initiated>/start
For example:
- If cnDBTier cluster1 needs to be restored, run the following commands to start the GR procedure on the failed cluster (cluster1) for restoring the cluster from the remote cnDBTier cluster:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start
the GR procedure on the failed cluster (cluster1)
for restoring the cluster from remote cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/start
Sample output:
Defaulted container "mysqlndbcluster" out of: mysqlndbcluster, init-sidecar
HTTP/1.1 200
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Length: 0
Date: Tue, 07 Nov 2023 12:33:00 GMT
- If cnDBTier cluster2 needs to be
restored, run the following commands to start the
GR procedure on the failed cluster (cluster2) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to
start the GR procedure on the failed cluster
(cluster2) for restoring the cluster from remote
cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/start
Sample output:
Defaulted container "mysqlndbcluster" out of: mysqlndbcluster, init-sidecar
HTTP/1.1 200
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Length: 0
Date: Tue, 07 Nov 2023 12:33:00 GMT
- If cnDBTier cluster3 needs to be
restored, run the following commands to start the
GR procedure on the failed cluster (cluster3) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to start the GR procedure on the failed cluster (cluster3) for restoring the cluster from the remote cnDBTier cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/start
Sample output:
Defaulted container "mysqlndbcluster" out of: mysqlndbcluster, init-sidecar
HTTP/1.1 200
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Length: 0
Date: Tue, 07 Nov 2023 12:33:00 GMT
- If cnDBTier cluster4 needs to be
restored, run the following commands to start the
GR procedure on the failed cluster (cluster4) for
restoring the cluster from remote cnDBTier
Cluster:
- Get the replication service
LoadBalancer IP of
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer Port of cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to
start the GR procedure on the failed cluster
(cluster4) for restoring the cluster from remote
cnDBTier
Cluster:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/start
Sample output:
Defaulted container "mysqlndbcluster" out of: mysqlndbcluster, init-sidecar
HTTP/1.1 200
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Length: 0
Date: Tue, 07 Nov 2023 12:33:00 GMT
- Wait until the georeplication recovery is completed. You can check the cluster status using the Verifying Cluster Status Using cnDBTier procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
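The start call shown above has two forms: one that lets cnDBTier pick the backup from the first available healthy cluster, and one that names the cluster whose backup is used. The following bash sketch is only a convenience wrapper around the same kubectl and curl commands shown in this procedure; it is not part of cnDBTier, the function name start_gr_recovery is illustrative, and it assumes that the namespace name matches the cluster name, as in the examples above.
# Sketch: start georeplication recovery on a failed cluster, optionally
# selecting the cluster whose backup is used for the restore.
start_gr_recovery() {
  local failed_ns="$1"       # namespace (and cluster name) of the failed cluster, for example cluster1
  local backup_cluster="$2"  # optional: cluster name where the backup is initiated

  # Replication service LoadBalancer IP and port of the failed cluster,
  # extracted the same way as in the steps above.
  local ip port
  ip=$(kubectl get svc -n "$failed_ns" | grep repl | awk '{print $4}' | head -n 1)
  port=$(kubectl get svc -n "$failed_ns" | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)

  if [ -n "$backup_cluster" ]; then
    curl -i -X POST "http://$ip:$port/db-tier/gr-recovery/site/$failed_ns/grbackupsite/$backup_cluster/start"
  else
    curl -i -X POST "http://$ip:$port/db-tier/gr-recovery/site/$failed_ns/start"
  fi
}
For example, start_gr_recovery cluster1 restores cluster1 from the first available healthy cluster, while start_gr_recovery cluster1 cluster2 restores cluster1 from the backup taken on cluster2.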
7.4.1.3.2 Restoring Two cnDBTier Clusters in a Four-Cluster Replication
This section describes the procedure to reinstall two cnDBTier clusters that have fatal errors in a four-cluster replication setup using cnDBTier georeplication recovery APIs.
- The first failed cnDBTier cluster and the second failed cnDBTier cluster have a replication delay that impacts the NF functionality with respect to the other cnDBTier clusters, or have fatal errors that require them to be reinstalled.
- If there is only a replication delay and no fatal errors exist in the cnDBTier clusters that need to be restored, then the cnDBTier clusters are in a healthy state, that is, all database nodes (including management, data, and API nodes) are in the Running state.
- The other two cnDBTier clusters (that is, the first working cnDBTier cluster and the second working cnDBTier cluster) are in a healthy state, that is, all database nodes (including management, data, and API nodes) are in the Running state, and the replication channels between them are running correctly.
- Only one cnDBTier cluster is restored at a time. That is, the first failed cnDBTier cluster is restored first, followed by the second failed cnDBTier cluster.
- NF or application traffic is diverted from the failed cnDBTier clusters to any of the working cnDBTier clusters.
- curl is installed in the environment from which the commands are run.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using a backup from a cnDBTier cluster with a lower version.
To resolve this georeplication failure in the first and second failed cnDBTier clusters, restore the failed cnDBTier clusters to the latest database backup using automated backup and restore, and reestablish the replication channels between cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
Procedure:
- Check the status of all four cnDBTier clusters. Follow Verifying Cluster Status Using cnDBTier to check the status of cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
- Run the following commands to mark the gr_state of the first failed cnDBTier cluster and the second failed cnDBTier cluster as FAILED in any of the healthy cnDBTier clusters and in all unhealthy cnDBTier clusters that are accessible (a consolidated sketch of this pattern is provided at the end of this step):
- Run the following command to get the replication service LoadBalancer IP from one of the healthy clusters:
$ export IP=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service LoadBalancer Port from one of the healthy clusters:
$ export PORT=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the unhealthy cnDBTier cluster as failed in one of the healthy cnDBTier clusters. An HTTP response code of 200 indicates that the cluster's gr_state is marked as failed successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/<name of failed cluster>/failed
- Mark the unhealthy cnDBTier clusters as failed in the unhealthy cnDBTier clusters themselves, if they are accessible:
- Get the replication service LoadBalancer IP for the
cluster which must be marked as
failed:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer port for
the cluster which must be marked as
failed:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in the cnDBTier clusters that are accessible. An HTTP response code of 200 indicates that the cluster's gr_state is marked as failed successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/failed
Note:
For more information about georeplication recovery API responses and error codes, see Fault Recovery APIs.
For example:
- If cnDBTier cluster1 and cluster2 have fatal errors or replication delay:
- Mark the failed cnDBTier clusters cluster1 and
cluster2 as failed in healthy cnDBTier cluster cluster3:
- Get the replication service LoadBalancer
IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- If cnDBTier cluster1 and cluster2 have
replication delay:
- Mark cluster1 as failed in
cnDBTier cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark cluster2 as failed in
cnDBTier cluster2:
- Get the replication service
LoadBalancer IP for
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster2 is the
failed cluster, mark the cnDBTier cluster2 as
failed in cnDBTier clusters that are
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the failed cnDBTier clusters cluster1 and
cluster3 as failed in healthy cnDBTier cluster cluster2:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- If cnDBTier cluster1 and cluster3 have
replication delay:
- Mark cluster1 as failed in cnDBTier cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark cluster3 as failed in cnDBTier cluster3:
- Get the replication service
LoadBalancer IP for
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster3 is the
failed cluster, mark the cnDBTier cluster3 as
failed in cnDBTier clusters that are
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- Mark the failed cnDBTier clusters cluster1 and
cluster4 as failed in healthy cnDBTier cluster cluster2:
- Get the replication service LoadBalancer
IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster4/failed
- If cnDBTier cluster1 and cluster4 have
replication delay:
- Mark the cnDBTier cluster as
failed in cnDBTier cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the cnDBTier cluster as
failed in cluster4:
- Get the replication service
LoadBalancer IP for
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster4 is the failed
cluster, mark the cnDBTier cluster4 as failed in
cnDBTier clusters that are
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/failed
- Mark the failed cnDBTier clusters cluster2 and
cluster3 as failed in healthy cnDBTier cluster cluster1:
- Get the replication service LoadBalancer
IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- If cnDBTier cluster2 and cluster3 have
replication delay:
- Mark the cnDBTier cluster as
failed in cnDBTier cluster2:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cluster2 is the failed cluster,
mark the cnDBTier cluster2 as failed in the
cnDBTier clusters that are
accessible
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster3:
- Get the replication service
LoadBalancer IP for
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster3 is the
failed cluster, mark the cnDBTier cluster3 as
failed in cnDBTier clusters that are
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- Mark the failed cnDBTier clusters cluster2 and
cluster4 as failed in healthy cnDBTier cluster cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster4/failed
- If cnDBTier cluster2 and cluster4 have
replication delay:
- Mark the cluster as failed in
cnDBTier cluster2:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster2 is the failed
cluster, mark the cnDBTier cluster2 as failed in
cluster2 if it is
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the cluster as failed in
cnDBTier cluster4:
- Get the replication service
LoadBalancer IP for
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster4 is the
failed cluster, mark the cnDBTier cluster4 as
failed in cluster4 if it is
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/failed
- Mark the failed cnDBTier clusters cluster3 and
cluster4 as failed in healthy cnDBTier cluster cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer
Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster4/failed
- If cnDBTier cluster3 and cluster4 have
replication delay:
- Mark the cluster as failed in
cnDBTier cluster3:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster3 is the
failed cluster, mark the cnDBTier cluster3 as
failed in cnDBTier clusters that are
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- Mark the cluster as failed in
cnDBTier cluster4:
- Get the replication service
LoadBalancer IP for
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port for
cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- If cnDBTier cluster4 is the failed
cluster, mark the cnDBTier cluster4 as failed in
cnDBTier clusters that are
accessible:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/failed
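The per-cluster examples above repeat the same pattern: fetch the replication service LoadBalancer IP and port of a healthy cluster, then POST the remotesite/<cluster>/failed endpoint once per failed cluster. The following bash sketch, referenced at the start of this step, wraps that pattern; the function name mark_failed_from_healthy is illustrative, it is not part of cnDBTier, and it assumes that the namespace name matches the cluster name, as in the examples above.
# Sketch: mark one or more failed clusters as FAILED from a single healthy cluster.
# Usage: mark_failed_from_healthy <healthy namespace> <failed cluster> [<failed cluster> ...]
mark_failed_from_healthy() {
  local healthy_ns="$1"; shift

  # Replication service LoadBalancer IP and port of the healthy cluster.
  local ip port
  ip=$(kubectl get svc -n "$healthy_ns" | grep repl | awk '{print $4}' | head -n 1)
  port=$(kubectl get svc -n "$healthy_ns" | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)

  # POST the remotesite/<cluster>/failed endpoint once per failed cluster.
  local cluster
  for cluster in "$@"; do
    curl -i -X POST "http://$ip:$port/db-tier/gr-recovery/remotesite/$cluster/failed"
  done
}
For example, mark_failed_from_healthy cluster3 cluster1 cluster2 marks cluster1 and cluster2 as failed in the healthy cnDBTier cluster cluster3.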
- Uninstall the first failed cnDBTier cluster if it has fatal errors.
Follow the Uninstalling cnDBTier Cluster procedure to uninstall the first failed cnDBTier cluster.
- Uninstall the second failed cnDBTier cluster if it has fatal errors.
Follow the Uninstalling cnDBTier Cluster procedure to uninstall the second failed cnDBTier cluster.
- Restore the first failed cnDBTier cluster and wait until the database restore from the remote cnDBTier cluster is completed. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in a Four-Cluster Replication to restore the first failed cnDBTier cluster.
- Wait until the database is restored in the first failed cnDBTier cluster and the georeplication recovery procedure is completed.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
You can check the status of the first failed cnDBTier cluster using the Verifying Cluster Status Using cnDBTier procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure.
- Reinstall the second failed cnDBTier cluster and wait until the database restore from the remote cnDBTier cluster is completed. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in a Four-Cluster Replication to restore the second failed cnDBTier cluster.
- Wait until the database is restored in the second failed cnDBTier cluster and the georeplication recovery procedure is completed.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery (a quick pod check is sketched after this procedure).
Check the status of the second failed cnDBTier cluster using the Verifying Cluster Status Using cnDBTier procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure.
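As noted above, the backup-manager-service pod may restart during the georeplication recovery but must be up once the recovery completes. The following commands are a minimal sketch of that check, assuming the pod name contains backup-manager-service and the namespace matches the restored cluster (replace cluster1 with the namespace of the restored cluster):
$ kubectl get pods -n cluster1 | grep backup-manager-service
$ kubectl get pods -n cluster1 -w
The first command confirms that the backup-manager-service pod is in the Running state; the second optionally watches all pods in the restored cluster until they stabilize.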
7.4.1.3.3 Restoring Three cnDBTier Clusters in a Four-Cluster Replication
This section describes the procedure to reinstall three cnDBTier clusters that have fatal errors in a four-cluster replication setup using cnDBTier georeplication recovery APIs.
- The first failed, second failed, and third failed cnDBTier clusters have a replication delay that impacts the NF functionality with respect to the other cnDBTier clusters, or have fatal errors that require them to be reinstalled.
- If there is only a replication delay and no fatal errors exist in the cnDBTier clusters that need to be restored, then the cnDBTier clusters are in a healthy state, that is, all database nodes (including management, data, and API nodes) are in the Running state.
- The working cnDBTier cluster (that is, the first working cnDBTier cluster) is in a healthy state, that is, all database nodes (including management, data, and API nodes) are in the Running state, and the replication channels are running correctly.
- Only one cnDBTier cluster is restored at a time. That is, the first failed cnDBTier cluster is restored first, followed by the second failed cnDBTier cluster, and finally the third failed cnDBTier cluster.
- NF or application traffic is diverted from the failed cnDBTier clusters (first, second, and third failed cnDBTier clusters) to the working cnDBTier cluster.
- curl is installed in the environment from which the commands are run.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using a backup from a cnDBTier cluster with a lower version.
To resolve this georeplication failure in the first, second, and third failed cnDBTier clusters, restore the failed cnDBTier clusters to the latest database backup using automated backup and restore. Reestablish the replication channels between cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
Procedure
- Check the status of all four cnDBTier clusters. Follow Verifying Cluster Status Using cnDBTier to check the status of cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
- Run the following commands to mark the gr_state of the first failed, second failed, and third failed cnDBTier clusters as FAILED in the working cnDBTier cluster and in the unhealthy cnDBTier clusters, if they are accessible:
- If the failed cnDBTier clusters need to be reinstalled, update the gr_state of the failed cnDBTier clusters in one of the healthy cnDBTier clusters if the unhealthy clusters are not accessible and have fatal errors:
- Get the replication service LoadBalancer IP of the healthy cluster:
$ export IP=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer Port of
the healthy
cluster:
$ export PORT=$(kubectl get svc -n <namespace of healthy cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in the cnDBTier clusters that are healthy. An HTTP response code of 200 indicates that the cluster's gr_state is marked as failed successfully.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/<name of failed cluster>/failed
- If the failed cnDBTier clusters are not reinstalled, update the gr_state in the failed cnDBTier clusters if they are accessible and do not have any fatal errors:
- Get the replication service LoadBalancer IP for the cluster that must be marked as failed:
$ export IP=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service LoadBalancer Port for
the cluster which should be marked as
failed:
$ export PORT=$(kubectl get svc -n <namespace of failed cluster> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster as failed in whichever cnDBTier clusters are accessible. An HTTP response code of 200 indicates that the cluster's gr_state is marked as failed successfully:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/failed
For example:
- If cnDBTier cluster2, cluster3, and cluster4 have fatal errors or replication delay:
- Mark the failed cnDBTier clusters cluster2, cluster3, and cluster4 as failed in the healthy cnDBTier cluster cluster1:
- Get the replication service LoadBalancer
IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster4/failed
- If cnDBTier cluster2, cluster3, and
cluster4 have replication delay:
- Mark the cnDBTier cluster as failed in cnDBTier cluster2:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster3:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster4:
- Get the replication service
LoadBalancer IP of
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/failed
- If cnDBTier cluster1, cluster3 and cluster4 have fatal
errors or replication delay:
- Mark the failed cnDBTier clusters cluster1, cluster3, and cluster4 as failed in the healthy cnDBTier cluster cluster2:
- Get the replication service LoadBalancer
IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster4/failed
- If cnDBTier cluster1, cluster3, and
cluster4 have replication delay:
- Mark the cnDBTier cluster as failed in cnDBTier cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster3:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster4:
- Get the replication service
LoadBalancer IP of
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/failed
- If cnDBTier cluster1, cluster2 and cluster4 have fatal
errors or replication delay:
- Mark the failed cnDBTier clusters cluster1, cluster2, and cluster4 as failed in the healthy cnDBTier cluster cluster3:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster4/failed
- If cnDBTier cluster1, cluster2, and
cluster4 have replication delay:
- Mark the cnDBTier cluster as failed in cnDBTier cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster2:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster4:
- Get the replication service
LoadBalancer IP of
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster4 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/failed
- If cnDBTier cluster1, cluster2 and cluster3 have fatal
errors or replication delay:
- Mark the failed cnDBTier clusters cluster1, cluster2, and cluster3 as failed in the healthy cnDBTier cluster cluster4:
- Get the replication service
LoadBalancer IP of
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster2/failed
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster3/failed
- If cnDBTier cluster1, cluster2, and
cluster3 have replication delay:
- Mark the cnDBTier cluster as failed in cnDBTier cluster1:
- Get the replication service
LoadBalancer IP of
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster1 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster2:
- Get the replication service
LoadBalancer IP of
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster2 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/failed
- Mark the cnDBTier cluster as
failed in cnDBTier cluster3:
- Get the replication service
LoadBalancer IP of
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Get the replication service
LoadBalancer Port of
cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Mark the cnDBTier cluster3 as
failed:
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/failed
Note:
For more information about georeplication recovery API responses and error codes, see Fault Recovery APIs.
- Uninstall the first failed cnDBTier cluster if it has fatal errors.
Follow the Uninstalling cnDBTier Cluster procedure to uninstall the first failed cnDBTier cluster.
- Uninstall the second failed cnDBTier cluster if it has fatal errors.
Follow the Uninstalling cnDBTier Cluster procedure to uninstall the second failed cnDBTier cluster.
- Uninstall the third failed cnDBTier cluster if it has fatal errors.
Follow the Uninstalling cnDBTier Cluster procedure to uninstall the third failed cnDBTier cluster.
- Restore the first failed cnDBTier cluster and wait until the database restore from the remote cnDBTier cluster is completed. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in a Four-Cluster Replication to restore the first failed cnDBTier cluster.
- Wait until the database is restored in the first failed cnDBTier cluster and the georeplication recovery procedure is completed.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
Check the status of the first failed cnDBTier cluster using the Verifying Cluster Status Using cnDBTier procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure.
- Reinstall the second failed cnDBTier cluster and wait until the database restore from the remote cnDBTier cluster is completed. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in a Four-Cluster Replication to restore the second failed cnDBTier cluster.
- Wait until the database is restored in the second failed cnDBTier cluster and the georeplication recovery procedure is completed.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
You can check the status of the second failed cnDBTier cluster using the Verifying Cluster Status Using cnDBTier procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure.
- Reinstall the third failed cnDBTier cluster and wait until the database restore from the remote cnDBTier cluster is completed. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in a Four-Cluster Replication to restore the third failed cnDBTier cluster.
- Wait until the database is restored in the third failed cnDBTier cluster and the georeplication recovery procedure is completed.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
Check the status of the third failed cnDBTier cluster using the Verifying Cluster Status Using cnDBTier procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using cnDBTier APIs procedure. A minimal scripted check of the API success responses used in this procedure is sketched below.
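Each mark-failed and start call in this procedure is considered successful when it returns HTTP response code 200. If you prefer to script that check rather than read the curl -i headers, the following sketch captures only the status code; it assumes the IP and PORT variables are already exported as in the steps above:
$ curl -s -o /dev/null -w "%{http_code}\n" -X POST http://$IP:$PORT/db-tier/gr-recovery/site/<name of failed cluster>/failed
A printed value of 200 indicates that the cluster's gr_state is marked as failed successfully; any other value should be investigated before continuing with the recovery.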
7.4.1.4 All Cluster Georeplication Failure
This section describes the procedure to resolve georeplication failures on all clusters.
Procedure:
- Uninstall all the failed cnDBTier clusters one after the other.
Follow Uninstalling cnDBTier Cluster procedure for uninstalling the failed cnDBTier clusters.
- Reinstall the standalone cnDBTier cluster. For installation procedure, see Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide.
- Restore the DB in the standalone cnDBTier cluster from the backup taken from the other cluster. For procedure, see Restoring Database from Backup with ndb_restore.
- Add mate clusters one cluster after the other. For cluster addition procedures, see the "Adding Georedundant cnDBTier Site" section in Oracle Communications Cloud Native Core, cnDBTier User Guide.
7.4.2 Restoring Georeplication Failures Using CNC Console
This chapter describes the procedures to restore Georeplication (GR) failures using CNC Console.
7.4.2.1 Two-Cluster Georeplication Failure
This section provides the recovery scenarios in two-cluster georeplication deployments.
7.4.2.1.1 Resolving GR Failure Between cnDBTier Clusters in Two-Cluster Replication using CNC Console
This section describes the procedure to resolve a georeplication failure between cnDBTier clusters in a two-cluster replication using CNC Console.
- The failed cnDBTier cluster has a replication delay impacting the NF functionality, with respect to other cnDBTier cluster or has some fatal errors for which the cluster needs to be reinstalled.
- If there is only a replication delay and there are no fatal errors in the cnDBTier cluster that needs to be restored, then all the cnDBTier clusters are in a healthy state, that is, all database nodes (including management, data, and API nodes) are in the Running state.
- NF or application traffic is diverted from the failed cnDBTier cluster to the working cnDBTier cluster.
- If cnDBTier cluster fails while enabling encryption or changing the encryption secret in any cluster, then before starting the georeplication recovery, ensure that either encryption is disabled on all the clusters or encryption is enabled with the same encryption key across all accessible cnDBTier clusters.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using a backup from a cnDBTier cluster with a lower version.
To resolve this georeplication failure, restore the cnDBTier cluster to the latest database backup using automated backup and restore, and then reestablish the replication channels.
- Check the status of both cnDBTier clusters. Follow Verifying cnDBTier Cluster Status Using CNC Console to check the status of cnDBTier cluster1 and cnDBTier cluster2.
- Perform the following steps to update the cluster status as FAILED:
- Log in to CNC Console GUI of the active cluster.
- Expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Update Cluster As Failed.
- From the Cluster Names drop-down, select
the cluster that you want to mark as failed and click Update
Cluster.
The drop-down displays the clusters that are part of the replication setup (cluster1 and cluster2). When you select a cluster and click Update Cluster, the selected cluster name is updated in the Failed Cluster Names field.
For example, if you selected cluster1 to be marked as failed, then on clicking Update Cluster, cluster1 is updated in the Failed Cluster Names field.
The following image shows a sample Update Cluster As Failed page where cluster1 is marked as a failed cluster:
Figure 7-1 Update Cluster As Failed
- Perform this step to restore the cnDBTier cluster depending on the
status of the cnDBTier cluster:
- Follow step a, if the cnDBTier cluster has fatal errors such as PVC corruption, PVC not accessible, and other fatal errors.
- Follow steps b and c, if the cnDBTier cluster does not have any fatal errors and a database restore is needed because of the georeplication failure.
- If the cnDBTier cluster that needs to be restored has fatal errors, or if the cnDBTier cluster status is DOWN, then reinstall the cnDBTier cluster to restore the database from the remote cnDBTier cluster.
Follow Reinstalling cnDBTier Cluster to install the cnDBTier cluster by configuring the remote cluster IP address of the replication service of the remote cnDBTier cluster for restoring the database.
For example:
- If cnDBTier cluster1 needs to be restored, uninstall and reinstall cnDBTier cluster1 using the above procedures. This restores the database from the remote cnDBTier cluster2 by configuring the remote cluster replication service IP address of cnDBTier cluster2 in cnDBTier cluster1.
- If cnDBTier cluster2 needs to be restored, uninstall and reinstall cnDBTier cluster2 using the above procedures. This restores the database from the remote cnDBTier cluster1 by configuring the remote cluster replication service IP address of cnDBTier cluster1 in cnDBTier cluster2.
- Create the required NF-specific user accounts and grants to
match NF users and grants of the good cluster in the reinstalled cnDBTier.
For sample procedure, see Creating NF Users.
Note:
Verify and create the NF user accounts and grants in all the cnDBTier clusters before initiating the georeplication recovery to ensure that all the NF users exist on all the cnDBTier clusters. For more details about creating NF-specific user accounts and grants, refer to the NF-specific fault recovery guide.
- If the cnDBTier cluster does not have any fatal errors, then perform the following steps to start the georeplication recovery in the cnDBTier cluster:
- On CNC Console, expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Start Georeplication Recovery. The
following image shows a sample Start Georeplication Recovery
page:
Figure 7-2 Start Georeplication Recovery
- From the Failed Cluster Name drop-down, select the failed cluster that you want to restore.
- [Optional]: From the Backup Cluster Name
(Optional) drop-down, select one of the healthy
clusters from which the system can use the backup to restore the
failed cluster. If this option is not selected, the system uses the
backup from the first available healthy cluster.
For example, if cluster1 failed and needs to be restored, then the system displays the other healthy cluster (cluster2) to pick the backup. If no option is selected, the system uses the available healthy cluster by default.
- Click Start Georeplication Recovery.
- Wait until the georeplication recovery is complete. Use the Verifying cnDBTier Cluster Status Using CNC Console procedure to check the status of the cluster. When the cnDBTier
cluster is UP, continue monitoring the georeplication recovery status by following
the Monitoring Georeplication Recovery Status Using CNC Console procedure.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
7.4.2.2 Three-Cluster Georeplication Failure
This section describes the recovery scenarios in three-cluster georeplication deployments.
7.4.2.2.1 Resolving GR Failure Between cnDBTier Clusters in Three-Cluster Replication using CNC Console
This section describes the procedure to resolve a georeplication failure between cnDBTier clusters in a three-cluster replication.
- The failed cnDBTier cluster has a replication delay that impacts the NF functionality with respect to the other cnDBTier clusters, or has fatal errors for which the cluster needs to be reinstalled.
- If there is only a replication delay and there are no fatal errors in the cnDBTier cluster that needs to be restored, then all the cnDBTier clusters are in a healthy state, that is, all database nodes (including management node, data node, and API node) are in the Running state.
- Apart from the failed cluster, the other two cnDBTier clusters (that is, the first working cnDBTier cluster and second working cnDBTier cluster) are in a healthy state. That is, all database nodes (including management node, data node, and API node) are in the Running state, and the replication channels between them run correctly. A verification sketch is provided after the note below.
- NF or application traffic is diverted from the failed cnDBTier cluster to any of the working cnDBTier clusters.
- If cnDBTier cluster fails while enabling encryption or changing the encryption secret in any cluster, then before starting the georeplication recovery, ensure that either encryption is disabled on all the clusters or encryption is enabled with the same encryption key across all accessible cnDBTier clusters.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using the backup that is from a cnDBTier cluster with a lower version.
To resolve this georeplication failure in one cluster, restore the failed cnDBTier cluster to the latest DB backup using automated backup and restore, and then reestablish the replication channels between cnDBTier cluster1, cnDBTier cluster2, and cnDBTier cluster3.
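Before starting the procedure, you can confirm from the Bastion Host of each cluster that the database pods are in the Running state. A minimal sketch, assuming kubectl access to each cluster and a placeholder namespace:
$ kubectl -n <cndbtier-namespace> get pods
# All management (ndbmgmd), data (ndbmtd), SQL (ndbmysqld), and appSQL (ndbappmysqld) pods
# should report STATUS as Running before the georeplication recovery is started.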
Procedure:
- Check the status of all three cnDBTier clusters. Follow Verifying cnDBTier Cluster Status Using CNC Console to check the status of cnDBTier cluster1, cnDBTier cluster2, and cnDBTier cluster3.
- Perform the following steps to update the cluster status as FAILED:
- Log in to CNC Console GUI of the active cluster.
- Expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Update Cluster As Failed.
- From the Cluster Names drop-down,
select the cluster that you want to mark as failed and click
Update Cluster.
The drop-down displays the clusters that are part of the replication setup (cluster1, cluster2, and cluster3). When you select a cluster and click Update Cluster, the selected cluster name is updated in the Failed Cluster Names field.
For example, if you selected cluster1 to be marked as failed, then on clicking Update Cluster, cluster1 is updated in the Failed Cluster Names field.
The following image shows a sample Update Cluster As Failed page where cluster1 is marked as a failed cluster:
Figure 7-3 Update Cluster As Failed
- Perform this step to restore the cnDBTier cluster depending on the
status of the cnDBTier cluster:
- Follow step a if the cnDBTier cluster has fatal errors, such as PVC corruption, PVC not accessible, or other fatal errors.
- Follow steps b and c if the cnDBTier cluster does not have any fatal errors and a database restore is needed because of the georeplication failure.
- If the cnDBTier cluster that needs to be restored has fatal errors, or if the cnDBTier cluster status is DOWN, then reinstall the cnDBTier cluster to restore the database from the remote cnDBTier cluster.
Follow Reinstalling cnDBTier Cluster to install the cnDBTier cluster by configuring the remote cluster IP address of the replication service of the remote cnDBTier cluster for restoring the database.
For example:
- If cnDBTier cluster1 needs to be restored, then uninstall and reinstall cnDBTier cluster1 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the remote cluster replication service IP addresses of cnDBTier cluster2 and cnDBTier cluster3 in cnDBTier cluster1.
- If cnDBTier cluster2 needs to be restored, then uninstall and reinstall cnDBTier cluster2 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the remote cluster replication service IP addresses of cnDBTier cluster1 and cnDBTier cluster3 in cnDBTier cluster2.
- If cnDBTier cluster3 needs to be restored, then uninstall and reinstall cnDBTier cluster3 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the remote cluster replication service IP addresses of cnDBTier cluster1 and cnDBTier cluster2 in cnDBTier cluster3.
- Create the required NF-specific user accounts and grants to match the NF users and grants of the good cluster in the reinstalled cnDBTier cluster. For a sample procedure, see Creating NF Users.
Note:
Verify and create the NF user accounts and grants in all the cnDBTier clusters before initiating the georeplication recovery to ensure that all the NF users exist on all the cnDBTier clusters. For more details about creating NF-specific user accounts and grants, refer to the NF-specific fault recovery guide.
- If the cnDBTier cluster does not have any fatal errors,
then perform the following steps to start the georeplication recovery in
the cnDBTier cluster:
- On CNC Console, expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Start Georeplication Recovery. The
following image shows a sample Start Georeplication Recovery
page:
Figure 7-4 Start Georeplication Recovery
- From the Failed Cluster Name drop-down,
select the failed cluster that you want to restore.
For example, if you want to restore cluster1, then select cluster1.
- [Optional]: From the Backup Cluster Name
(Optional) drop-down, select one of the healthy
clusters from which the system can use the backup to restore the
failed cluster. If this option is not selected, the system uses
the backup from the first available healthy cluster.
For example, if cluster1 failed and needs to be restored, then the system displays the other two healthy clusters (cluster2 and cluster3) to pick the backup. Select one cluster from the available healthy clusters to use the backup. Otherwise, the system uses the backup from the first available healthy cluster by default.
- Click Start Georeplication Recovery.
- Wait until the georeplication recovery is complete.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
- Perform the Verifying cnDBTier Cluster Status Using CNC Console procedure to check the cluster status.
- When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by performing the Monitoring Georeplication Recovery Status Using CNC Console procedure.
7.4.2.2.2 Restoring Two cnDBTier Clusters in Three-Cluster Replication using CNC Console
This section describes the procedure to restore two cnDBTier clusters that have fatal errors in a three-cluster replication setup.
- The first failed cnDBTier cluster and the second failed cnDBTier cluster have fatal errors, or the georeplication between them has failed, and the clusters need to be restored.
- The working cnDBTier cluster is in a healthy state. That is, all database nodes (including management node, data node, and API node) are in the Running state.
- Only one cnDBTier cluster is restored at any time. For example, the first failed cnDBTier cluster is restored initially and then the second cnDBTier cluster is restored.
- NF or application traffic is diverted from the first failed cnDBTier cluster and the second failed cnDBTier cluster to the working cnDBTier cluster.
- If cnDBTier cluster fails while enabling encryption or changing the encryption secret in any cluster, then before starting the georeplication recovery, ensure that either encryption is disabled on all the clusters or encryption is enabled with the same encryption key across all accessible cnDBTier clusters.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using the backup that is from a cnDBTier cluster with a lower version.
To resolve this georeplication failure in two cnDBTier clusters (that is, the first failed cnDBTier cluster and the second failed cnDBTier cluster), restore the failed cnDBTier clusters to the latest DB backup using automated backup and restore. Then, reestablish the replication channels between cnDBTier cluster1, cnDBTier cluster2, and cnDBTier cluster3.
- Check the status of all three cnDBTier clusters. Follow Verifying cnDBTier Cluster Status Using CNC Console to check the status of cnDBTier cluster1, cnDBTier cluster2, and cnDBTier cluster3.
- Perform the following steps to update the cluster status as FAILED:
- Log in to CNC Console GUI of the active cluster.
- Expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Update Cluster As Failed.
- From the Cluster Names drop-down,
select the cluster that you want to mark as failed and click
Update Cluster.
The Cluster Names drop-down displays all the clusters that are part of the replication setup (cluster1, cluster2, and cluster3). When you select a cluster and click Update Cluster, the selected cluster name gets updated in the Failed Cluster Names field.
For example, if you want to mark cluster1 and cluster2 as failed, then perform the following steps:- Select cluster1 from the Cluster Names drop-down and click Update Cluster. The system displays cluster1 in the Failed Cluster Names field.
- Select cluster2 from the Cluster Names drop-down and click Update Cluster. The system displays both cluster1 and cluster2 in the Failed Cluster Names field.
Figure 7-5 Update Cluster1 and Cluster2 As Failed
- Uninstall the first failed cnDBTier cluster if the first failed cnDBTier cluster has fatal errors. Follow Uninstalling cnDBTier Cluster procedure for uninstalling the first failed cnDBTier cluster.
- Uninstall the second failed cnDBTier cluster if second failed cnDBTier cluster has fatal errors. Follow Uninstalling cnDBTier Cluster procedure for uninstalling the second failed cnDBTier cluster.
- Perform this step to reinstall the first failed cnDBTier cluster
depending on the status of the cnDBTier cluster and wait until the database
restore from the remote cluster is complete.
- Follow step a if the cnDBTier cluster has fatal errors, such as PVC corruption, PVC not accessible, or other fatal errors.
- Follow steps b and c if the cnDBTier cluster does not have any fatal errors and a database restore is needed because of the georeplication failure.
- If the cnDBTier cluster that needs restoration contains
fatal errors, or if the cnDBTier cluster status is DOWN, then reinstall
the cnDBTier cluster to restore the database from the remote cnDBTier
cluster.
Follow Reinstalling cnDBTier Cluster to install the cnDBTier cluster by configuring the remote cluster IP address of the replication service of the remote cnDBTier cluster for restoring the database.
For example:
- If the first failed cnDBTier cluster is cnDBTier cluster1, then uninstall cnDBTier cluster1 and reinstall cnDBTier cluster1 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster1.
- If the first failed cnDBTier cluster is cnDBTier cluster2, then uninstall cnDBTier cluster2 and reinstall cnDBTier cluster2 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster2.
- If the first failed cnDBTier cluster is cnDBTier cluster3, then uninstall cnDBTier cluster3 and reinstall cnDBTier cluster3 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster3.
- Create the required NF-specific user accounts and grants to match the NF users and grants of the good cluster in the reinstalled cnDBTier cluster. For a sample procedure, see Creating NF Users.
Note:
Verify and create the NF user accounts and grants in all the cnDBTier clusters before initiating the georeplication recovery to ensure that all the NF users exist on all the cnDBTier clusters. For more details about creating NF-specific user accounts and grants, refer to the NF-specific fault recovery guide.
- If the cnDBTier cluster that needs to be restored is UP and does not have any fatal errors, then perform the following:
- Configure the remote cluster IPs (Step 2a) and then upgrade the cnDBTier cluster by following the Upgrading cnDBTier procedures.
- Perform the following steps to start the
georeplication recovery in the cnDBTier cluster:
- On CNC Console, expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Start Georeplication
Recovery. The following image shows a
sample Start Georeplication Recovery page:
Figure 7-6 Start Georeplication Recovery
- From the Failed Cluster
Name drop-down, select the failed
cluster that you want to restore.
For example, if cluster1 is the first failed cluster that needs to be restored, then select cluster1.
- [Optional]: From the Backup Cluster Name (Optional) drop-down, select one of the healthy clusters from which the system can use the backup to restore the failed cluster. If this option is not selected, the system uses the backup from the first available healthy cluster.
- Click Start Georeplication Recovery.
- Wait until the database is restored in the first failed cnDBTier cluster and the georeplication recovery procedure is complete.
Check the cluster status using Verifying cnDBTier Cluster Status Using CNC Console procedure after the cnDBTier cluster is UP and continue monitoring the georeplication recovery status.
Follow Monitoring Georeplication Recovery Status Using CNC Console procedure to monitor the georeplication recovery status.
- Perform this step to reinstall the second failed cnDBTier cluster
depending on the status of the cnDBTier cluster and wait until the database
restore from the remote cluster is complete.
- Follow step a if the cnDBTier cluster has fatal errors, such as PVC corruption, PVC not accessible, or other fatal errors.
- Follow steps b and c if the cnDBTier cluster does not have any fatal errors and a database restore is needed because of the georeplication failure.
- If the second failed cnDBTier cluster that needs to be restored has fatal errors, or if the second failed cnDBTier cluster status is DOWN, then reinstall the second failed cnDBTier cluster to restore the database from the remote cnDBTier clusters.
Follow Reinstalling cnDBTier Cluster to install the cnDBTier cluster by configuring the remote cluster IP address of the replication service of the remote cnDBTier cluster for restoring the database.
For example:
- If the second failed cnDBTier cluster is cnDBTier cluster1, then uninstall cnDBTier cluster1 and reinstall cnDBTier cluster1 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster1.
- If the second failed cnDBTier cluster is cnDBTier cluster2, then uninstall cnDBTier cluster2 and reinstall cnDBTier cluster2 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster2.
- If the second failed cnDBTier cluster is cnDBTier cluster3, then uninstall cnDBTier cluster3 and reinstall cnDBTier cluster3 using the above procedures, which restores the database from the working cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster3.
- Create the required NF-specific user accounts and grants to match the NF users and grants of the good cluster in the reinstalled cnDBTier cluster. For a sample procedure, see Creating NF Users.
Note:
Verify and create the NF user accounts and grants in all the cnDBTier clusters before initiating the georeplication recovery to ensure that all the NF users exist on all the cnDBTier clusters. For more details about creating NF-specific user accounts and grants, refer to the NF-specific fault recovery guide.
- If the second failed cnDBTier cluster that needs to be
restored is UP, and does not have any fatal errors, then perform the
following:
- Configure the remote cluster IPs and then upgrade the cnDBTier cluster by following the Upgrading cnDBTier procedures.
- Perform the following steps to start the
georeplication recovery in the cnDBTier cluster:
- On CNC Console, expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Start Georeplication
Recovery. The following image shows a
sample Start Georeplication Recovery page:
Figure 7-7 Start Georeplication Recovery
- From the Failed Cluster
Name drop-down, select the failed
cluster that you want to restore.
For example, if cluster2 is the second failed cluster that needs to be restored, then select cluster2.
- [Optional]: From the Backup Cluster Name (Optional) drop-down, select one of the healthy clusters from which the system can use the backup to restore the failed cluster. If this option is not selected, the system uses the backup from the first available healthy cluster.
- Click Start Georeplication Recovery.
- Wait until the georeplication recovery is complete. Verify and
monitor the cluster status by performing the following steps:
- Perform the Verifying cnDBTier Cluster Status Using CNC Console procedure to check the cluster status.
- When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by performing the Monitoring Georeplication Recovery Status Using CNC Console procedure.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
7.4.2.3 Four-Cluster Georeplication Failure
This section describes the recovery scenarios in four-cluster georeplication deployments.
7.4.2.3.1 Resolving GR Failure Between cnDBTier Clusters in Four-Cluster Replication using CNC Console
This section describes the procedure to resolve a georeplication failure between cnDBTier clusters in a four-cluster replication.
- The failed cnDBTier cluster has a replication delay that impacts the NF functionality with respect to the other cnDBTier clusters, or has fatal errors for which the cluster needs to be reinstalled.
- If there is only a replication delay and there are no fatal errors in the cnDBTier cluster that needs to be restored, then all the cnDBTier clusters are in a healthy state, that is, all database nodes (including management node, data node, and API node) are in the Running state.
- Apart from the failed cluster, the other three cnDBTier clusters (that is, the first working cnDBTier cluster, second working cnDBTier cluster, and third working cnDBTier cluster) are in a healthy state. That is, all database nodes (including management node, data node, and API node) are in the Running state, and the replication channels between them are running correctly.
- NF or application traffic is diverted from the failed cnDBTier cluster to any of the working cnDBTier clusters.
- If cnDBTier cluster fails while enabling encryption or changing the encryption secret in any cluster, then before starting the georeplication recovery, ensure that either encryption is disabled on all the clusters or encryption is enabled with the same encryption key across all accessible cnDBTier clusters.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using the backup that is from a cnDBTier cluster with a lower version.
To resolve this georeplication failure in a single failed cnDBTier cluster, restore the failed cnDBTier cluster to the latest DB backup using automated backup and restore. Re-establish the replication channels between cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
Procedure:
- Check the status of all four cnDBTier clusters. Follow Verifying cnDBTier Cluster Status Using CNC Console to check the status of cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
- Perform the following steps to update the cluster status as
FAILED:
- Log in to CNC Console GUI of the active cluster.
- Expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Update Cluster As Failed.
- From the Cluster Names drop-down,
select the cluster that you want to mark as failed and click
Update Cluster.
The drop-down displays the clusters that are part of the replication setup (cluster1, cluster2, cluster3, and cluster4). The selected cluster name is updated in the Failed Cluster Names field. For example, if you selected cluster1 to be marked as failed, then on clicking Update Cluster, cluster1 is updated in the Failed Cluster Names field.
The following image shows a sample Update Cluster As Failed page where cluster1 is marked as a failed cluster:
Figure 7-8 Update Cluster As Failed
- Perform this step to restore the cnDBTier cluster depending on the
status of the cnDBTier cluster:
- Follow step a if the cnDBTier cluster has fatal errors, such as PVC corruption, PVC not accessible, or other fatal errors.
- Follow steps b and c if the cnDBTier cluster does not have any fatal errors and a database restore is needed because of the georeplication failure.
- If the cnDBTier cluster that needs to be restored has fatal errors, or if the cnDBTier cluster status is DOWN, then reinstall the cnDBTier cluster to restore the database from the remote cnDBTier cluster.
Follow Reinstalling cnDBTier Cluster to install the cnDBTier cluster by configuring the remote cluster IP address of the replication service of the remote cnDBTier cluster for restoring the database.
For example:
- If cnDBTier cluster1 needs to be restored, then uninstall and reinstall cnDBTier cluster1 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster1.
- If cnDBTier cluster2 needs to be restored, then uninstall and reinstall cnDBTier cluster2 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster2.
- If cnDBTier cluster3 needs to be restored, then uninstall and reinstall cnDBTier cluster3 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster3.
- If cnDBTier cluster4 needs to be restored, then uninstall and reinstall cnDBTier cluster4 using the above procedures, which restores the database from the remote cnDBTier clusters by configuring the remote cluster replication service IP addresses of the working cnDBTier clusters in cnDBTier cluster4.
- Create the required NF-specific user accounts and grants to match the NF users and grants of the good cluster in the reinstalled cnDBTier cluster. For a sample procedure, see Creating NF Users.
Note:
Verify and create the NF user accounts and grants in all the cnDBTier clusters before initiating the georeplication recovery to ensure that all the NF users exist on all the cnDBTier clusters. For more details about creating NF-specific user accounts and grants, refer to the NF-specific fault recovery guide.
- If the cnDBTier cluster that needs to be restored is UP and
does not have any fatal errors, then perform the following:
- Configure the remote cluster IPs and then upgrade the cnDBTier cluster by following the Upgrading cnDBTier procedures.
- Perform the following steps to start the
georeplication recovery in the cnDBTier cluster:
- On CNC Console, expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Start Georeplication
Recovery. The following image shows a
sample Start Georeplication Recovery page:
Figure 7-9 Start Georeplication Recovery
- From the Failed Cluster
Name drop-down, select the failed
cluster that you want to restore.
For example, if cluster1 is the failed cluster that needs to be restored, then select cluster1.
- [Optional]: From the Backup Cluster Name (Optional) drop-down, select one of the healthy clusters from which the system can use the backup to restore the failed cluster. If this option is not selected, the system uses the backup from the first available healthy cluster.
- Click Start Georeplication Recovery.
- Wait until the georeplication recovery is completed. You can check
the cluster status using Verifying cnDBTier Cluster Status Using CNC Console procedure. When the cnDBTier cluster is UP, continue
monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using CNC Console procedure.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
7.4.2.3.2 Restoring Two cnDBTier Clusters in Four-Cluster Replication using CNC Console
This section describes the procedure to reinstall two cnDBTier clusters that have fatal errors in a four-cluster replication.
- The first failed and second failed cnDBTier clusters have a replication delay impacting the NF functionality with respect to the other cnDBTier clusters, or have fatal errors for which the clusters need to be reinstalled.
- If there is only a replication delay and there are no fatal errors in the cnDBTier cluster that needs to be restored, then all the cnDBTier clusters are in a healthy state, that is, all database nodes (including management node, data node, and API node) are in the Running state.
- Apart from the failed clusters, the other two cnDBTier clusters (that is, the first working cnDBTier cluster and second working cnDBTier cluster) are in a healthy state. That is, all database nodes (including management node, data node, and API node) are in the Running state, and the replication channels between them are running correctly.
- Only one cnDBTier cluster is restored at any time. For example, the first failed cnDBTier cluster is restored initially and then the second cnDBTier cluster is restored.
- NF or application traffic is diverted from the failed cnDBTier cluster to any of the working cnDBTier cluster.
- If cnDBTier cluster fails while enabling encryption or changing the encryption secret in any cluster, then before starting the georeplication recovery, ensure that either encryption is disabled on all the clusters or encryption is enabled with the same encryption key across all accessible cnDBTier clusters.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using the backup that is from a cnDBTier cluster with a lower version.
To resolve this georeplication failure in the first failed cnDBTier cluster and the second failed cnDBTier cluster, restore the failed cnDBTier clusters to the latest DB backup using automated backup and restore. Reestablish the replication channels between cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
Procedure:
- Check the status of all four cnDBTier clusters. Follow Verifying cnDBTier Cluster Status Using CNC Console to check the status of cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
- Perform the following steps to update the cluster status as
FAILED:
- Log in to CNC Console GUI of the active cluster.
- Expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Update Cluster As Failed.
- From the Cluster Names drop-down,
select the cluster that you want to mark as failed and click
Update Cluster.
The Cluster Names drop-down displays all the clusters that are part of the replication setup (cluster1, cluster2, cluster3, and cluster4). When you select a cluster and click Update Cluster, the selected cluster name gets updated in the Failed Cluster Names field.
For example, if you want to mark cluster1 and cluster2 as failed, then perform the following steps:- Select cluster1 from the Cluster Names drop-down and click Update Cluster. The system displays cluster1 in the Failed Cluster Names field.
- Select cluster2 from the Cluster Names drop-down and click Update Cluster. The system displays both cluster1 and cluster2 in the Failed Cluster Names field.
Figure 7-10 Update Cluster1 and Cluster2 As Failed
- Uninstall the first failed cnDBTier cluster if the first failed
cnDBTier cluster has fatal errors.
Follow Uninstalling cnDBTier Cluster procedure for uninstalling the first failed cnDBTier cluster.
- Uninstall the second failed cnDBTier cluster if the second failed
cnDBTier cluster has fatal errors.
Follow Uninstalling cnDBTier Cluster procedure for uninstalling the second failed cnDBTier cluster.
- Restore the first failed cnDBTier cluster and wait until the database restore from the remote cnDBTier cluster is complete. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in Four-Cluster Replication using CNC Console to restore the first failed cnDBTier cluster.
- Wait until database is restored in the first failed cnDBTier
cluster and georeplication recovery procedure is complete.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
Check the cluster status of the first failed cnDBTier cluster using the Verifying cnDBTier Cluster Status Using CNC Console procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using CNC Console procedure.
- Reinstall the second failed cnDBTier cluster and wait until database restore from the remote cnDBTier cluster is complete. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in Four-Cluster Replication using CNC Console to restore the second failed cnDBTier cluster.
- Wait until the database is restored in the second failed cnDBTier cluster and the georeplication recovery procedure is complete.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
Check the cluster status of the second failed cnDBTier cluster using the Verifying cnDBTier Cluster Status Using CNC Console procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using CNC Console procedure.
7.4.2.3.3 Restoring Three cnDBTier Clusters in Four-Cluster Replication using CNC Console
This section describes the procedure to reinstall three cnDBTier clusters that have fatal errors in a four-cluster replication.
- The first, second, and third failed cnDBTier clusters have replication delay impacting the NF functionality, with respect to other cnDBTier clusters or have fatal errors for which the clusters need to be reinstalled.
- If there is only a replication delay and there are no fatal errors in the cnDBTier clusters that need to be restored, then all cnDBTier clusters are in a healthy state. That is, all database nodes (including management node, data node, and API node) are in the Running state.
- Apart from the failed clusters, the working cnDBTier cluster (that is, the first working cnDBTier cluster) is in a healthy state. That is, all database nodes (including management node, data node, and API node) are in the Running state, and the replication channels between them are running correctly.
- Only one cnDBTier cluster is restored at any time. For example, the first failed cnDBTier cluster is restored initially, then the second cnDBTier cluster is restored, and finally the third failed cnDBTier cluster is restored.
- NF or application traffic is diverted from the failed cnDBTier clusters (first failed cnDBTier cluster, second failed cnDBTier cluster, and third failed cnDBTier cluster) to the working cnDBTier cluster.
- If cnDBTier cluster fails while enabling encryption or changing the encryption secret in any cluster, then before starting the georeplication recovery, ensure that either encryption is disabled on all the clusters or encryption is enabled with the same encryption key across all accessible cnDBTier clusters.
Note:
All the georeplication cnDBTier clusters must be on the same cnDBTier version to restore from georeplication failures. However, you can perform georeplication recovery in a cnDBTier cluster with a higher version, using the backup that is from a cnDBTier cluster with a lower version.
To resolve this georeplication failure in the first, second, and third failed cnDBTier clusters, restore the failed cnDBTier clusters to the latest DB backup using automated backup and restore. Reestablish the replication channels between cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
Procedure
- Check the status of all four cnDBTier clusters. Follow Verifying cnDBTier Cluster Status Using CNC Console to check the status of cnDBTier cluster1, cnDBTier cluster2, cnDBTier cluster3, and cnDBTier cluster4.
- Perform the following steps to update the cluster status as
FAILED:
- Log in to CNC Console GUI of the active cluster.
- Expand cnDBTier under the NF menu and select Georeplication Recovery.
- Select Update Cluster As Failed.
- From the Cluster Names drop-down,
select the cluster that you want to mark as failed and click
Update Cluster.
The Cluster Names drop-down displays all the clusters that are part of the replication setup (cluster1, cluster2, cluster3, and cluster4). When you select a cluster and click Update Cluster, the selected cluster name gets updated in the Failed Cluster Names field.
For example, if you want to mark cluster1, cluster2, and cluster3 as failed, then perform the following steps:
- Select cluster1 from the Cluster Names drop-down and click Update Cluster. The system displays cluster1 in the Failed Cluster Names field.
- Select cluster2 from the Cluster Names drop-down and click Update Cluster. The system displays cluster1 and cluster2 in the Failed Cluster Names field.
- Select cluster3 from the Cluster Names drop-down and click Update Cluster. The system displays cluster1, cluster2, and cluster3 in the Failed Cluster Names field.
Figure 7-11 Update Cluster1, Cluster2, and Cluster3 As Failed
- Uninstall the first failed cnDBTier cluster if the first failed
cnDBTier cluster has fatal errors.
Follow Uninstalling cnDBTier Cluster procedure for uninstalling the first failed cnDBTier cluster.
- Uninstall the second failed cnDBTier cluster if the second failed
cnDBTier cluster has fatal errors.
Follow Uninstalling cnDBTier Cluster procedure for uninstalling the second failed cnDBTier cluster.
- Uninstall the third failed cnDBTier cluster if the third failed cnDBTier cluster has fatal errors.
Follow Uninstalling cnDBTier Cluster procedure for uninstalling the third failed cnDBTier cluster.
- Reinstall the first failed cnDBTier cluster and wait until the database restore from the remote cnDBTier cluster is complete. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in Four-Cluster Replication using CNC Console to restore the first failed cnDBTier cluster.
- Wait until the database is restored in the first failed cnDBTier
cluster and georeplication recovery procedure is complete.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
Check the cluster status of the first failed cnDBTier cluster using the Verifying cnDBTier Cluster Status Using CNC Console procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using CNC Console procedure.
- Reinstall the second failed cnDBTier cluster and wait until the database restore from the remote cnDBTier cluster is complete. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in Four-Cluster Replication using CNC Console to restore the second failed cnDBTier cluster.
- Wait until the database is restored in the second failed cnDBTier
cluster and georeplication recovery procedure is complete.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
You can check the cluster status of the second failed cnDBTier cluster using the Verifying cnDBTier Cluster Status Using CNC Console procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using CNC Console procedure.
- Reinstall the third failed cnDBTier cluster and wait until the database restore from the remote cluster is complete. Follow Step 3 of Resolving GR Failure Between cnDBTier Clusters in Four-Cluster Replication using CNC Console to restore the third failed cnDBTier cluster.
- Wait until the database is restored in the third failed cnDBTier
cluster and georeplication recovery procedure is complete.
Note:
The backup-manager-service pod may restart a couple of times during the georeplication recovery. You can ignore these restarts; however, ensure that the backup-manager-service pod is up at the end of the georeplication recovery.
Check the status of the third failed cnDBTier cluster using the Verifying cnDBTier Cluster Status Using CNC Console procedure. When the cnDBTier cluster is UP, continue monitoring the georeplication recovery status by following the Monitoring Georeplication Recovery Status Using CNC Console procedure.
7.4.3 Recovering Georeplication Sites Using dbtrecover
The dbtrecover script automates the georeplication recovery procedures and increases the
probability of successfully completing the recovery process by simplifying the
procedures, reducing human errors, and running a limited number of system tests and
connectivity tests. This section provides information about recovering georeplication
sites using the dbtrecover
bash script.
Note:
The script can be run on all cnDBTier versions starting from 22.1.x.
dbtrecover is an interactive script. It runs the georeplication recovery procedures on one site at a time. It collects the information about the system from the site the script is run from and prompts you to provide details about the site that needs to be recovered. Before prompting, the script displays the system information. You must review the system information before providing the recovery site details and proceeding further.
dbtrecover
synchronizes the database between sites and
recovers georeplication sites by automatically running the georeplication recovery
procedures. For more information about georeplication recovery procedures, see Restoring Georeplication (GR) Failure.
In addition, the script tests the cnDBTier and georeplication systems that are required for a successful georeplication recovery. These tests can be expanded as per your feedback in the future releases.
7.4.3.1 Prerequisites
- The
dbtrecover
script must be properly installed. For more information, see Installing and Using the dbtrecover Script. - The script must be run only from the good site. For information about selecting the good site, see Selecting the Good Site.
- The deployment log level (LOG_LEVEL) of db-backup-manager must be set to INFO and not DEBUG. A quick check is sketched after the note below.
Note:
You can recover only one site at a time using the dbtrecover script. If multiple sites need to be recovered, you must rerun the script to recover the sites one after the other.
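To verify the log level prerequisite listed above, you can inspect the LOG_LEVEL environment variable of the db-backup-manager deployment. This is a minimal sketch that assumes the deployment name follows the pattern shown in the sample output later in this section and that the log level is exposed as an environment variable in the deployment specification (it may instead be set through the Helm values in your installation):
$ kubectl -n <cndbtier-namespace> get deployment <release-prefix>-db-backup-manager-svc -o yaml | grep -A1 'name: LOG_LEVEL'
# The value printed on the following line should be INFO, not DEBUG.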
cnDBTier Version Considerations
The dbtrecover script supports georeplication recovery between sites running
different cnDBTier versions. However, the following conditions must be considered before
performing a georeplication recovery using the script:
- The script supports georeplication recovery between sites only in the following
cases:
- The good site and bad site (the site to be recovered) run the same cnDBTier version.
- The good site runs a lower cnDBTier version. However, the cnDBTier releases on both the good site and the bad site support the same database replication REST API.
- The script doesn't support georeplication recovery between sites in the following
cases:
- The good site runs a cnDBTier version that is higher than the version in the bad site.
- The good site runs a lower cnDBTier version and uses a different database replication REST API from the one used by the bad site.
- cnDBTier versions below 22.4.2
- cnDBTier version 23.1.0
- cnDBTier versions greater than or equal to 22.4.2 and less than 23.1.0
- cnDBTier versions greater than or equal to 23.1.1
For more information about database replication REST APIs, see Oracle Communications Cloud Native Core, cnDBTier User Guide.
- While running the
dbtrecover
script, select the Reinstall (that is, the bad site has a fatal error) option. - Install the new version on the bad site. When the new release is installed, cnDBTier completes the recovery automatically.
Note:
Ensure that you do not reinstall the same version (lower version) on the bad site. This causes the georeplication recovery to fail.
7.4.3.2 Selecting the Good Site
The dbtrecover script must be run only from the good site. The
script uses the good site to take the database backup for performing a georeplication recovery
on the failed site. The following steps detail the method in which the script selects the good
site:
- The script scans through the site IDs from 1 through 4 one by one and prompts you to state if that site requires a recovery. That is, the script first chooses site 1 and prompts you to state if site 1 requires a recovery.
- If site 1 doesn't require a recovery, then site 1 is selected as the good
site. That is,
dbtrecover
must be run on the namespace and Bastion of site 1. - If site 1 requires a recovery, then the system moves to site 2 and prompts you to state if site 2 requires a recovery.
- This process continues until the good site is identified. A simplified sketch of this selection logic is provided after the note below.
Note:
If dbtrecover is run on a wrong site, the script terminates with an error. It also displays the identified good site and directs you to run the script on the good site.
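The selection behavior described above can be summarized by the following simplified sketch. This is an illustration only, not the actual dbtrecover source code; the prompt text mirrors the sample output shown later in this section.
# Illustrative sketch only: the script walks through the site IDs in order and treats the
# first site that does not need a recovery as the good site.
for site_id in 1 2 3 4; do
    read -r -p "Does site ${site_id} need to be recovered (yes/no/exit)? " answer
    if [ "${answer}" = "no" ]; then
        echo "Site ${site_id} is the good site; run dbtrecover from its namespace and Bastion Host."
        break
    fi
done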
7.4.3.3 Installing and Using the dbtrecover Script
This section provides details about installing and using the
dbtrecover
script to perform an automated georeplication recovery.
Installing the dbtrecover Script
cd Artifacts/Scripts/tools
source ./source_me
The system prompts you to enter the namespace and uses the same to set the
DBTIER_NAMESPACE. It also sets the DBTIER_LIB
environment variable with the
path to the directory containing the libraries needed by dbtrecover.
Using the dbtrecover Script
dbtrecover [-h | --help]
dbtrecover [OPTIONS]
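For example, the following invocations combine only the options documented in Configuration Options; they are illustrative and not mandatory:
$ dbtrecover                                      # interactive recovery with the default settings
$ dbtrecover --tests-only                         # run only the sanity tests without recovering a site
$ dbtrecover --start-from-monitoring --no-colors  # resume from the monitoring phase with plain output
$ dbtrecover --connect-timeout=30                 # allow 30 seconds for curl, mysql, mysqladmin, and ssh connections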
7.4.3.4 Phases of dbtrecover
The dbtrecover script performs the georeplication recovery in phases. This section provides information about the different phases involved in dbtrecover.
- Phase 0: Collecting Site Information
The script collects information about the current site on which it is running and all the other mate sites that the current "good" site knows about.
- Phase 1: Running Sanity Checks
The script performs sanity checks to ensure that the georeplication recovery runs successfully.
- Phase 2: Setting Up Georeplication Recovery
The script sets gr_state to FAILED and deletes mysql.ndb_apply_status records.
- Phase 3: Starting Georeplication Recovery
The script starts the georeplication recovery for the first site that needs to be recovered. For example, if two sites (site 1 and site 2) are to be recovered, then the script starts the recovery for site 1. In this phase, the script sets gr_state to STARTDRRECOVER or prompts you to reinstall.
- Phase 4: Starting Georeplication Recovery Monitoring
The script queries the replication_info.DBTIER_SITE_INFO table on the site that is being recovered every second and updates the status of the recovery on gr_state.
Note:
When the dbtrecover script is run with the --start-from-monitoring option, the script starts the georeplication recovery from this phase.
- Phase 5: Run Post-Processing Steps
The script restarts the
mysqld
pods if the version is less than 22.3.3.
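During Phase 4, you can also inspect the recovery state manually. The following is a minimal sketch that assumes the column names shown in the sample dbtrecover output later in this section; the namespace, pod, container, and administrative user are placeholders:
$ kubectl -n <cndbtier-namespace> exec -ti <ndbmysqld-pod> -c <mysql-container> -- mysql -u<admin-user> -p -e "SELECT site_name, site_id, dr_state, dr_backup_site_name, backup_id FROM replication_info.DBTIER_SITE_INFO;"
# dr_state moves through the recovery states (for example, FAILED) and shows COMPLETED when the site is recovered.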
7.4.3.5 Configuration Options
Table 7-2 Configuration Options
Option | Usage | Example
---|---|---
-h | --help | This option is used to print the help message and exit. |
-v | --version | This option is used to print the script's version. |
--debug | This option is used to output DEBUG log messages to standard error (stderr). |
--no-colors | This option is used to print the output in the default terminal font color instead of using the dbtrecover colors. |
--start-from-monitoring | This option is used to skip phase 1, phase 2, and phase 3, and start directly from phase 4 for the first site that is to be recovered. For information about the different phases involved in running dbtrecover, see Phases of dbtrecover. Note: The |
--skip-repl-restart | This option is used to skip restarting the local and remote db-replication-svc main containers. By default, the script restarts these containers. |
--use-ipv4 | This option is used to direct the dbtrecover script to use IPv4. |
--use-ipv6 | This option is used to direct the dbtrecover script to use IPv6. |
--skip-namespace-test | This option is used to skip testing that the namespace in DBTIER_NAMESPACE exists in the current cluster. |
--skip-tests | This option is used to skip the sanity tests. |
--tests-only | This option is used to run only the sanity tests. |
--connect-timeout=<connect_timeout_in_seconds> | This option is used to specify the time (in seconds) to wait before ending a connection attempt. This option is used by the dbtrecover script when running the curl, mysql, mysqladmin, and ssh commands. The default timeout value is 15 seconds. |
Note:
Use the dbtrecover --help command for more examples on running the script.
7.4.3.6 Running dbtrecover Script in a Two-Site Setup
This section provides the steps to run the dbtrecover
script in a
two-site setup.
- Run the following commands to source the
dbtrecover
script file:$ cd /home/youruser/workspace/Artifacts/Scripts/tools $ source ./source_me
Note:
source_me must be sourced only when your current directory contains the source_me file.
- Enter the namespace when the system prompts you. The system uses this
namespace to set DBTIER_NAMESPACE. It also sets the
DBTIER_LIB
environment variable with the path to the directory containing the libraries needed by dbtrecover.
Sample output:Enter cndbtier namespace: dbtier-namespace-2 DBTIER_NAMESPACE = "dbtier-namespace-2" DBTIER_LIB=/home/youruser/workspace/Artifacts/Scripts/tools/lib Adding /home/youruser/workspace/Artifacts/Scripts/tools/bin to PATH PATH=/usr/lib/jvm/java-11-openjdk-11.0.16.0.8-1.0.1.el7_9.x86_64/bin:/usr/dev_infra/platform/bin:/usr/dev_infra/generic/bin:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/usr/local/ade/bin:/opt/gradle/gradle-6.8.3/bin:/opt/helm/helm-v3.5.2-linux-amd64/bin:/home/youruser/workspace/Artifacts/Scripts/tools/bin
- Run the following command to run the dbtrecover script without any
additional configurations:
Note:
You can pass additional parameters to the script to run the script as per your requirement. For information about the available configurations, see Configuration Options.
$ dbtrecover
The script runs in phases to identify the good site, take a backup from the good site, and recover the failed site.
Sample output to show a successful completion of the script:dbtrecover 23.3.0.0.2 Copyright (c) 2023 Oracle and/or its affiliates. All rights reserved. 2023-10-13T09:08:40Z INFO - **************************************************************************************************** 2023-10-13T09:08:40Z INFO - BEGIN PHASE 0: Collecting Site information 2023-10-13T09:08:40Z INFO - **************************************************************************************************** 2023-10-13T09:08:40Z INFO - dbtstart_ts = 1697188120 2023-10-13T09:08:41Z INFO - Using IPv4: LOOPBACK_IP="127.0.0.1" 2023-10-13T09:08:41Z INFO - DBTIER_NAMESPACE = dbtier_namespace-2 2023-10-13T09:08:41Z INFO - DBTIER_NAMESPACE = dbtier_namespace-2 2023-10-13T09:08:41Z INFO - Testing namespace, dbtier_namespace-2, exists... 2023-10-13T09:08:41Z INFO - Should be able to see namespace, dbtier_namespace-2, with "kubectl get ns -o name dbtier_namespace-2" - PASSED 2023-10-13T09:08:41Z INFO - Namespace, dbtier_namespace-2, exists 2023-10-13T09:08:41Z INFO - Getting sts and sts pod info... 2023-10-13T09:08:41Z INFO - Getting MGM sts and sts pod info... 2023-10-13T09:08:42Z INFO - MGM_STS="my-prefix-ndbmgmd" 2023-10-13T09:08:42Z INFO - MGM_REPLICAS="2" 2023-10-13T09:08:42Z INFO - MGM_PODS: my-prefix-ndbmgmd-0 my-prefix-ndbmgmd-1 2023-10-13T09:08:42Z INFO - Getting NDB sts and sts pod info... 2023-10-13T09:08:42Z INFO - NDB_STS="my-prefix-ndbmtd" 2023-10-13T09:08:42Z INFO - NDB_REPLICAS="2" 2023-10-13T09:08:42Z INFO - NDB_PODS: my-prefix-ndbmtd-0 my-prefix-ndbmtd-1 2023-10-13T09:08:42Z INFO - Getting API sts and sts pod info... 2023-10-13T09:08:42Z INFO - API_STS="my-prefix-ndbmysqld" 2023-10-13T09:08:42Z INFO - API_REPLICAS="2" 2023-10-13T09:08:42Z INFO - API_PODS: my-prefix-ndbmysqld-0 my-prefix-ndbmysqld-1 2023-10-13T09:08:42Z INFO - Getting APP sts and sts pod info... 2023-10-13T09:08:42Z INFO - APP_STS="my-prefix-ndbappmysqld" 2023-10-13T09:08:42Z INFO - APP_REPLICAS="2" 2023-10-13T09:08:42Z INFO - APP_PODS: my-prefix-ndbappmysqld-0 my-prefix-ndbappmysqld-1 2023-10-13T09:08:42Z INFO - Getting deployment pod info... 2023-10-13T09:08:42Z INFO - grepping for backup-man (BAK_CHART_NAME)... 2023-10-13T09:08:42Z INFO - BAK_PODS: my-prefix-db-backup-manager-svc-6b96cb4567-xxbs9 2023-10-13T09:08:42Z INFO - BAK_DEPLOY: my-prefix-db-backup-manager-svc 2023-10-13T09:08:42Z INFO - grepping for db-mon (MON_CHART_NAME)... 2023-10-13T09:08:42Z INFO - MON_PODS: my-prefix-db-monitor-svc-7c54bdc95c-xpbk4 2023-10-13T09:08:42Z INFO - MON_DEPLOY: my-prefix-db-monitor-svc 2023-10-13T09:08:42Z INFO - grepping for repl (REP_CHART_NAME)... 2023-10-13T09:08:43Z INFO - REP_PODS: my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t 2023-10-13T09:08:43Z INFO - REP_DEPLOY: my-prefix-lfg-site-2-lfg-site-1-replication-svc 2023-10-13T09:08:43Z INFO - is https enabled: false 2023-10-13T09:08:43Z INFO - Collecting current site information... 2023-10-13T09:08:43Z INFO - Collecting current site name and id... 2023-10-13T09:08:43Z INFO - CURRENT_SITE_NAME = lfg-site-2 2023-10-13T09:08:43Z INFO - CURRENT_SITE_ID = 2 2023-10-13T09:08:43Z INFO - MATE_SITE_DB_REPLICATION_PORT=80 2023-10-13T09:08:43Z INFO - FILE_TRANSFER_PORT_NUMBER=2022 2023-10-13T09:08:44Z INFO - REPLCHANNEL_GROUP_COUNT=1 2023-10-13T09:08:44Z INFO - Current site information collected 2023-10-13T09:08:44Z INFO - Collecting clusterinformation... 
NAME READY STATUS RESTARTS AGE my-prefix-db-backup-manager-svc-6b96cb4567-xxbs9 1/1 Running 6 (27m ago) 10h my-prefix-db-monitor-svc-7c54bdc95c-xpbk4 1/1 Running 0 10h my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t 1/1 Running 9 (8m38s ago) 10h my-prefix-ndbappmysqld-0 2/2 Running 3 (27m ago) 10h my-prefix-ndbappmysqld-1 2/2 Running 3 (27m ago) 10h my-prefix-ndbmgmd-0 1/1 Running 0 10h my-prefix-ndbmgmd-1 1/1 Running 0 10h my-prefix-ndbmtd-0 2/2 Running 0 10h my-prefix-ndbmtd-1 2/2 Running 0 10h my-prefix-ndbmysqld-0 2/2 Running 3 (27m ago) 10h my-prefix-ndbmysqld-1 2/2 Running 3 (27m ago) 10h Connected to Management Server at: localhost:1186 Cluster Configuration --------------------- [ndbd(NDB)] 2 node(s) id=1 @10.233.93.250 (mysql-8.0.31 ndb-8.0.31, Nodegroup: 0, *) id=2 @10.233.121.144 (mysql-8.0.31 ndb-8.0.31, Nodegroup: 0) [ndb_mgmd(MGM)] 2 node(s) id=49 @10.233.121.87 (mysql-8.0.31 ndb-8.0.31) id=50 @10.233.118.121 (mysql-8.0.31 ndb-8.0.31) [mysqld(API)] 8 node(s) id=56 @10.233.118.205 (mysql-8.0.31 ndb-8.0.31) id=57 @10.233.110.113 (mysql-8.0.31 ndb-8.0.31) id=70 @10.233.115.118 (mysql-8.0.31 ndb-8.0.31) id=71 @10.233.102.154 (mysql-8.0.31 ndb-8.0.31) id=222 (not connected, accepting connect from any host) id=223 (not connected, accepting connect from any host) id=224 (not connected, accepting connect from any host) id=225 (not connected, accepting connect from any host) 2023-10-13T09:08:44Z INFO - Number of sites: 2 2023-10-13T09:08:44Z INFO - SITE_NAME: Site 1: lfg-site-1 Site 2: lfg-site-2 2023-10-13T09:08:47Z INFO - SITE_DB_REPL_SVC[1] = SITE_1_DB_REPL_SVC 2023-10-13T09:08:47Z INFO - SITE_1_DB_REPL_SVC: 10.121.27.176 2023-10-13T09:08:47Z INFO - SITE_DB_REPL_SVC[2] = SITE_2_DB_REPL_SVC 2023-10-13T09:08:47Z INFO - SITE_2_DB_REPL_SVC: 10.121.27.177 2023-10-13T09:08:48Z INFO - SERVER_ID = (1000 1001 2000 2001) 2023-10-13T09:08:48Z INFO - SERVER_IPS[1000] = 10.121.27.163 2023-10-13T09:08:48Z INFO - SERVER_IPS[1001] = 10.121.27.164 2023-10-13T09:08:48Z INFO - SERVER_IPS[2000] = 10.121.27.170 2023-10-13T09:08:48Z INFO - SERVER_IPS[2001] = 10.121.27.171 2023-10-13T09:08:49Z INFO - CURRENT_SITE_DBTIER_VERSION=22.4.2 2023-10-13T09:08:49Z INFO - CURRENT_SITE_DBTIER_VERSION_FROM_HELM=22.4.2 2023-10-13T09:08:49Z INFO - Cluster information collected 2023-10-13T09:08:49Z INFO - name = lfg-site-1, server_id = 1000, host = 10.121.27.163 2023-10-13T09:08:49Z INFO - name = lfg-site-1, server_id = 1001, host = 10.121.27.164 2023-10-13T09:08:49Z INFO - name = lfg-site-2, server_id = 2000, host = 10.121.27.170 2023-10-13T09:08:49Z INFO - name = lfg-site-2, server_id = 2001, host = 10.121.27.171 2023-10-13T09:08:49Z INFO - lfg-site-1 replication services IPs: 10.121.27.176 2023-10-13T09:08:49Z INFO - lfg-site-2 replication services IPs: 10.121.27.177 2023-10-13T09:08:49Z INFO - dbtcinfo_ts = 1697188129 2023-10-13T09:08:49Z INFO - Collecting cluster info took: 00 hr. 00 min. 09 sec. Does lfg-site-1 need to be recovered (yes/no/exit)? yes Does lfg-site-1 also need to be reinstalled (i.e. does it have fatal errors?) (yes/no/exit)? no Does lfg-site-2 need to be recovered (yes/no/exit)? no 2023-10-13T09:08:57Z INFO - Recovering from lfg-site-2 (Site 2) 2023-10-13T09:08:57Z INFO - Recovering lfg-site-1 (Site 1) DO YOU WANT TO PROCEED (yes/no/exit)? yes ARE YOU SURE (yes/no/exit)? 
yes 2023-10-13T09:09:00Z INFO - Starting db-replication-svc restarts at 2023-10-13T09:09:00Z 1 mysql 512 java -jar /opt/db_replication_svc/occne-dbreplicationsvc.jar 2023-10-13T09:09:00Z INFO - Sending SIGTERM to pod: my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t, container: lfg-site-2-lfg-site-1-replication-svc (1697188140) 2023-10-13T09:09:00Z INFO - TERM signal sent to my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t 2023-10-13T09:09:00Z INFO - Waiting for my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t container to restart... 1 mysql 0 java -jar /opt/db_replication_svc/occne-dbreplicationsvc.jar-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t lfg-site-2-lfg-site-1-replication-svc 2023-10-13T09:09:48Z INFO - Waiting for condition: is_proc_1_running_in_pod my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t lfg-site-2-lfg-site-1-replication-svc 2023-10-13T09:09:48Z INFO - Condition occurred 2023-10-13T09:09:48Z INFO - Connected again after 48 seconds (1697188188 - 1697188140) 2023-10-13T09:09:48Z INFO - time process has been running: 0 secs 2023-10-13T09:09:48Z INFO - Restarted main container on my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t 2023-10-13T09:09:48Z INFO - Waiting for sshd to restart in my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t container... 13 mysql 0 /usr/sbin/sshd -D -e -f /opt/ssh/sshd_config_pod my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t lfg-site-2-lfg-site-1-replication-svc 2023-10-13T09:09:48Z INFO - Waiting for condition: is_sshd_running_in_pod my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t lfg-site-2-lfg-site-1-replication-svc 2023-10-13T09:09:48Z INFO - Condition occurred 2023-10-13T09:09:49Z INFO - Connected again after 49 seconds (1697188189 - 1697188140) 2023-10-13T09:09:49Z INFO - time process has been running: 1 secs 2023-10-13T09:09:49Z INFO - Restarted sshd in main container on my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t 2023-10-13T09:09:49Z INFO - Killing main container in all remote REPL pods... 2023-10-13T09:09:49Z INFO - Restarting db-replication-svc containers in lfg-site-1 1 mysql 535 java -jar /opt/db_replication_svc/occne-dbreplicationsvc.jar 2023-10-13T09:09:51Z INFO - Sending SIGTERM (1697188140) 2023-10-13T09:09:55Z INFO - TERM signal sent to db-replication-svc on 10.121.27.176 2023-10-13T09:09:55Z INFO - Main container in all mate REPL pods killed 2023-10-13T09:09:55Z INFO - Waiting for mate db-replication-svc containers to restart... 2023-10-13T09:09:55Z INFO - Waiting for db-replication-svc containers in lfg-site-1 2023-10-13T09:09:55Z INFO - Waiting for 10.121.27.176 container to restart... 1 mysql 64 java -jar /opt/db_replication_svc/occne-dbreplicationsvc.jar 2023-10-13T09:11:46Z INFO - Waiting for condition: is_proc_1_running_in 10.121.27.176 2023-10-13T09:11:46Z INFO - Condition occurred 2023-10-13T09:11:50Z INFO - Connected again after 121 seconds (1697188310 - 1697188189) 2023-10-13T09:11:50Z INFO - time process has been running: 68 2023-10-13T09:11:50Z INFO - Restarted main container on 10.121.27.176 2023-10-13T09:11:50Z INFO - Main container in all mate(s) REPL pods restarted 2023-10-13T09:11:50Z INFO - Waiting for db-replication-svc containers in lfg-site-1 to be READY 2023-10-13T09:11:50Z INFO - Waiting for 10.121.27.176 to be READY... 2023-10-13T09:11:51Z INFO - 10.121.27.176 is READY 2023-10-13T09:11:51Z INFO - Waiting for sshd to come up in 10.121.27.176... 
13 mysql 71 /usr/sbin/sshd -D -e -f /opt/ssh/sshd_config 10.121.27.176 2023-10-13T09:11:53Z INFO - Waiting for condition: is_sshd_running_in 10.121.27.176 2023-10-13T09:11:53Z INFO - Condition occurred 2023-10-13T09:11:53Z INFO - sshd in 10.121.27.176 is up 2023-10-13T09:11:53Z INFO - Waiting for db-replication-svc containers in lfg-site-2 to be READY 2023-10-13T09:11:53Z INFO - Waiting for 10.121.27.177 to be READY... 2023-10-13T09:11:53Z INFO - 10.121.27.177 is READY 2023-10-13T09:11:53Z INFO - Waiting for sshd to come up in 10.121.27.177... 13 mysql 129 /usr/sbin/sshd -D -e -f /opt/ssh/sshd_config 10.121.27.177 2023-10-13T09:11:57Z INFO - Waiting for condition: is_sshd_running_in 10.121.27.177 2023-10-13T09:11:57Z INFO - Condition occurred 2023-10-13T09:11:57Z INFO - sshd in 10.121.27.177 is up 2023-10-13T09:11:57Z INFO - Ended db-replication-svc restarts at 2023-10-13T09:11:57Z 2023-10-13T09:11:57Z INFO - restarts took: 00 hr. 02 min. 57 sec. 2023-10-13T09:11:57Z INFO - **************************************************************************************************** 2023-10-13T09:11:57Z INFO - END PHASE 0: Collecting Site information 2023-10-13T09:11:57Z INFO - **************************************************************************************************** 2023-10-13T09:11:57Z INFO - dbtbtest_ts = 1697188317 2023-10-13T09:11:57Z INFO - **************************************************************************************************** 2023-10-13T09:11:57Z INFO - BEGIN PHASE 1: Run sanity checks 2023-10-13T09:11:57Z INFO - **************************************************************************************************** 2023-10-13T09:11:57Z INFO - Running sanity checks... 2023-10-13T09:11:57Z INFO - Testing remote site version... 2023-10-13T09:11:57Z INFO - Should be able to connect with HTTP from lfg-site-2 to lfg-site-1 (10.121.27.176:80) to get version (22.4.2) - PASSED 2023-10-13T09:11:57Z INFO - Version on lfg-site-1 (10.121.27.176:80) should be equal to version on lfg-site-2 (22.4.2, 22.4.2) - PASSED 2023-10-13T09:11:57Z INFO - Tests for remote site version finished 2023-10-13T09:11:57Z INFO - Testing database connectivity to remote NDB Cluster... 2023-10-13T09:11:58Z INFO - Should be able to connect from lfg-site-2 to lfg-site-1, 1000 (10.121.27.163) - PASSED 2023-10-13T09:11:59Z INFO - Should be able to connect from lfg-site-2 to lfg-site-1, 1001 (10.121.27.164) - PASSED 2023-10-13T09:11:59Z INFO - Tests for database connectivity to remote NDB Cluster finished 2023-10-13T09:11:59Z INFO - Testing HTTP connectivity to remote db-replication-svcs... 2023-10-13T09:11:59Z INFO - Should be able to connect with HTTP from lfg-site-2 to lfg-site-1 (10.121.27.176:80) to get status (UP) - PASSED 2023-10-13T09:11:59Z INFO - Tests for HTTP connectivity to remote db-replication-svcs finished 2023-10-13T09:11:59Z INFO - Testing SSH connectivity to remote db-replication-svcs... 2023-10-13T09:12:01Z INFO - Should be able to connect with SSH from lfg-site-2 to lfg-site-1 (10.121.27.176:2022) to get name (my-prefix-lfg-site-1-lfg-site-2-replication-svc-789d45d88bv6vmd) - PASSED 2023-10-13T09:12:01Z INFO - Tests for SSH connectivity to remote db-replication-svcs finished 2023-10-13T09:12:01Z INFO - Testing SSH connectivity to local db-replication-svcs from NDB pod... 
2023-10-13T09:12:01Z INFO - Num of repl IPs in data pod's env variables should match those in DBTIER_REPL_SITE_INFO - PASSED 2023-10-13T09:12:04Z INFO - Should be able to connect with SSH from my-prefix-ndbmtd-0 to 10.121.27.177:2022 to get name (my-prefix-lfg-site-2-lfg-site-1-replication-svc-5466c5cd85nvb6t) - PASSED 2023-10-13T09:12:04Z INFO - Tests for SSH connectivity to local db-replication-svcs from NDB pod finished 2023-10-13T09:12:04Z INFO - Testing local db-backup-svc... 2023-10-13T09:12:14Z INFO - Backup manager (my-prefix-db-backup-manager-svc-6b96cb4567-xxbs9) should not be logging - PASSED 2023-10-13T09:12:24Z INFO - Backup executor (my-prefix-ndbmtd-0) should not be logging - PASSED 2023-10-13T09:12:24Z INFO - Backup executor (my-prefix-ndbmtd-1) should not be logging - PASSED 2023-10-13T09:12:24Z INFO - Tests for local db-backup-svc finished... 2023-10-13T09:12:24Z INFO - Sanity checks completed 2023-10-13T09:12:24Z INFO - **************************************************************************************************** 2023-10-13T09:12:24Z INFO - END PHASE 1: Run sanity checks 2023-10-13T09:12:24Z INFO - **************************************************************************************************** 2023-10-13T09:12:24Z INFO - dbtetest_ts = 1697188344 2023-10-13T09:12:24Z INFO - Sanity tests took: 00 hr. 00 min. 27 sec. 2023-10-13T09:12:24Z INFO - **************************************************************************************************** 2023-10-13T09:12:24Z INFO - BEGIN PHASE 2: Run GRR setup 2023-10-13T09:12:24Z INFO - **************************************************************************************************** 2023-10-13T09:12:24Z INFO - Setting STATE to FAILED in all sites... 2023-10-13T09:12:25Z INFO - lfg-site-1 (1000): STATE set to FAILED site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 FAILED NULL NULL lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:12:26Z INFO - lfg-site-2 (2000): STATE set to FAILED site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 FAILED NULL NULL lfg-site-2 2 COMPLETED lfg-site-1 1013230837 2023-10-13T09:12:26Z INFO - STATE set to FAILED 2023-10-13T09:12:26Z INFO - Waiting 15 seconds for replicas to stop... 2023-10-13T09:12:41Z INFO - Waiting for condition: false 2023-10-13T09:12:41Z INFO - Waited for 15 seconds... 2023-10-13T09:12:41Z INFO - Delete from ndb_apply_status table in all sites... 
2023-10-13T09:12:41Z INFO - lfg-site-1 (1000): deleted records 2023-10-13T09:12:42Z INFO - 2023-10-13T09:12:42Z INFO - lfg-site-2 (2000): deleted records 2023-10-13T09:12:43Z INFO - 2023-10-13T09:12:43Z INFO - Deleted from ndb_apply_status table in all sites 2023-10-13T09:12:43Z INFO - **************************************************************************************************** 2023-10-13T09:12:43Z INFO - END PHASE 2: Run GRR setup 2023-10-13T09:12:43Z INFO - **************************************************************************************************** 2023-10-13T09:12:43Z INFO - **************************************************************************************************** 2023-10-13T09:12:43Z INFO - BEGIN PHASE 3: Run GRR start 2023-10-13T09:12:43Z INFO - **************************************************************************************************** 2023-10-13T09:12:43Z INFO - Verifying replicas associated with lfg-site-1 have stopped 2023-10-13T09:12:44Z INFO - 1000: lfg-site-1, ndbmysqld-0 (56), 10.121.27.163: Replica has stopped 2023-10-13T09:12:45Z INFO - 1001: lfg-site-1, ndbmysqld-1 (57), 10.121.27.164: Replica has stopped 2023-10-13T09:12:46Z INFO - 2000: lfg-site-2, ndbmysqld-0 (56), 10.121.27.170: Replica has stopped 2023-10-13T09:12:47Z INFO - 2001: lfg-site-2, ndbmysqld-1 (57), 10.121.27.171: Replica has stopped 2023-10-13T09:12:47Z INFO - Replicas associated with lfg-site-1 have stopped (or been skipped if appropriate) 2023-10-13T09:12:47Z INFO - Recovering lfg-site-1... 2023-10-13T09:12:48Z INFO - GRR started for lfg-site-1 on 1000 2023-10-13T09:12:48Z INFO - Must use --start-from-monitoring if dbtrecover is interrupted 2023-10-13T09:12:48Z INFO - ANSWERS: dbtrecover --start-from-monitoring < <(dbtanswer "yes" "no" "no" "yes" "yes" ) 2023-10-13T09:12:48Z INFO - IMPORTANT: Please, make sure the directory where dbtrecover and dbtanswer are is in PATH 2023-10-13T09:12:48Z INFO - **************************************************************************************************** 2023-10-13T09:12:48Z INFO - END PHASE 3: Run GRR start 2023-10-13T09:12:48Z INFO - **************************************************************************************************** 2023-10-13T09:12:48Z INFO - **************************************************************************************************** 2023-10-13T09:12:48Z INFO - BEGIN PHASE 4: Start monitoring GRR 2023-10-13T09:12:48Z INFO - **************************************************************************************************** 2023-10-13T09:12:48Z INFO - Monitoring status... 2023-10-13T09:12:48Z INFO - STATE = STARTDRRESTORE 2023-10-13T09:12:49Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 STARTDRRESTORE NULL NULL lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:12:54Z INFO - STATE = REINSTALLED 2023-10-13T09:12:54Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 REINSTALLED NULL NULL lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:13:00Z INFO - STATE = INITIATEBACKUP 2023-10-13T09:13:00Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 INITIATEBACKUP NULL NULL lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:13:05Z INFO - STATE = CHECKBACKUP 2023-10-13T09:13:06Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 CHECKBACKUP lfg-site-2 1013230913 lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:13:07Z INFO - Droping databases with '-' (dash) in their name (10.121.27.163)... 
2023-10-13T09:13:08Z INFO - Disabling error loging while deleting databases... 2023-10-13T09:13:08Z INFO - Re-enabling error loging... 2023-10-13T09:13:08Z INFO - Dropped databases 2023-10-13T09:13:59Z INFO - STATE = COPY_BACKUP 2023-10-13T09:14:00Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 COPY_BACKUP lfg-site-2 1013230913 lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:14:05Z INFO - STATE = CHECK_BACKUP_COPY 2023-10-13T09:14:05Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 CHECK_BACKUP_COPY lfg-site-2 1013230913 lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:15:01Z INFO - STATE = BACKUPCOPIED 2023-10-13T09:15:01Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 BACKUPCOPIED lfg-site-2 1013230913 lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:15:15Z INFO - STATE = BACKUPEXTRACTED 2023-10-13T09:15:15Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 BACKUPEXTRACTED lfg-site-2 1013230913 lfg-site-2 2 COMPLETED NULL NULL 2023-10-13T09:15:16Z INFO - Disabling error loging on lfg-site-1... 2023-10-13T09:15:16Z INFO - Disabling printing FAILED or "" (empty) STATEs... 2023-10-13T09:15:27Z INFO - Waiting for pods to restart... 2023-10-13T09:16:10Z INFO - STATE = RECONNECTSQLNODES 2023-10-13T09:16:10Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 RECONNECTSQLNODES lfg-site-2 1013230913 lfg-site-2 2 COMPLETED lfg-site-1 1013230837 2023-10-13T09:16:12Z INFO - Waiting for pods to restart... 2023-10-13T09:16:21Z INFO - Re-enabling printing FAILED or "" (empty) STATEs again if they occur 2023-10-13T09:16:21Z INFO - Re-enabling error loging... 2023-10-13T09:16:21Z INFO - STATE = BACKUPRESTORE 2023-10-13T09:16:22Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 BACKUPRESTORE lfg-site-2 1013230913 lfg-site-2 2 COMPLETED lfg-site-1 1013230837 2023-10-13T09:16:43Z INFO - STATE = RESTORED 2023-10-13T09:16:43Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 RESTORED lfg-site-2 1013230913 lfg-site-2 2 COMPLETED lfg-site-1 1013230837 2023-10-13T09:16:59Z INFO - STATE = BINLOGINITIALIZED 2023-10-13T09:17:00Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 BINLOGINITIALIZED lfg-site-2 1013230913 lfg-site-2 2 COMPLETED lfg-site-1 1013230837 2023-10-13T09:17:13Z INFO - STATE = RECONFIGURE 2023-10-13T09:17:13Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 RECONFIGURE lfg-site-2 1013230913 lfg-site-2 2 COMPLETED lfg-site-1 1013230837 2023-10-13T09:17:59Z INFO - STATE = COMPLETED 2023-10-13T09:18:00Z INFO - site_name site_id dr_state dr_backup_site_name backup_id lfg-site-1 1 COMPLETED lfg-site-2 1013230913 lfg-site-2 2 COMPLETED lfg-site-1 1013230837 2023-10-13T09:18:00Z INFO - **************************************************************************************************** 2023-10-13T09:18:00Z INFO - END PHASE 4: Start monitoring GRR 2023-10-13T09:18:00Z INFO - **************************************************************************************************** 2023-10-13T09:18:00Z INFO - **************************************************************************************************** 2023-10-13T09:18:00Z INFO - BEGIN PHASE 5: Run post-processing 2023-10-13T09:18:00Z INFO - **************************************************************************************************** 2023-10-13T09:18:00Z INFO - mate replication_svc IP: 10.121.27.176 
2023-10-13T09:18:01Z INFO - lfg-site-1, 10.121.27.176:80, 22.4.2 2023-10-13T09:18:01Z INFO - GRR has automatically restarted mysqld containers (cnDBTier 22.4.2); skipping restart... 2023-10-13T09:18:01Z INFO - **************************************************************************************************** 2023-10-13T09:18:01Z INFO - END PHASE 5: Run post-processing 2023-10-13T09:18:01Z INFO - **************************************************************************************************** 2023-10-13T09:18:01Z INFO - GRR COMPLETED for lfg-site-1 2023-10-13T09:18:01Z INFO - dbtend_ts = 1697188681 2023-10-13T09:18:01Z INFO - GRR took: 00 hr. 05 min. 37 sec. 2023-10-13T09:18:01Z INFO - dbtrecover took: 00 hr. 09 min. 21 sec.
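If the dbtrecover run is interrupted, the sample log above records an ANSWERS line that can be used to resume it from the monitoring phase. The following is a minimal sketch based on that log line only; it assumes the dbtrecover and dbtanswer scripts are in PATH, and the answer sequence shown is specific to this example run:
# Resume the interrupted GRR run from the monitoring phase, replaying the
# answers recorded in the "ANSWERS:" line of the sample log above.
$ dbtrecover --start-from-monitoring < <(dbtanswer "yes" "no" "no" "yes" "yes")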
Note:
After recovering the site, create NF user accounts and grant privileges as per your requirement by following the Creating NF Users procedure.
7.4.4 Verifying cnDBTier Cluster Status
This section provides the procedures to verify cnDBTier cluster status using cnDBTier or CNC Console.
7.4.4.1 Verifying Cluster Status Using cnDBTier
This section provides the steps to verify the cluster status using cnDBTier.
Note:
db-backup-manager-svc is designed to automatically restart in case of errors. Therefore, when the backup-manager-svc encounters a temporary error during the georeplication recovery process, it may undergo several restarts. When cnDBTier reaches a stable state, the db-backup-manager-svc pod operates normally without any further restarts.- Run the following command to check the status of the pods in
cnDBTier
clusters:
$ kubectl -n <namespace of cnDBTier Cluster> get pods
For example:
Run the following command to check the status of cnDBTier cluster1:$ kubectl -n cluster1 get pods
Sample output:NAME READY STATUS RESTARTS AGE mysql-cluster-cluster1-cluster2-replication-svc-86797d648fr6ttm 1/1 Running 0 6m38s mysql-cluster-cluster1-cluster3-replication-svc-7fd6c75c6562gj8 1/1 Running 0 6m39s mysql-cluster-cluster1-cluster4-replication-svc-869b65666bw7q6c 1/1 Running 0 6m38s mysql-cluster-db-backup-manager-svc-64bf559895-qfkj7 1/1 Running 0 6m38s mysql-cluster-db-monitor-svc-5bdfd4fb96-jwqzn 1/1 Running 0 6m38s ndbappmysqld-0 2/2 Running 0 7m ndbappmysqld-1 2/2 Running 0 7m ndbmgmd-0 2/2 Running 0 7m ndbmgmd-1 2/2 Running 0 7m ndbmtd-0 3/3 Running 0 7m ndbmtd-1 3/3 Running 0 6m ndbmtd-2 3/3 Running 0 6m ndbmtd-3 3/3 Running 0 6m ndbmysqld-0 3/3 Running 0 6m ndbmysqld-1 3/3 Running 0 6m ndbmysqld-2 3/3 Running 0 6m ndbmysqld-3 3/3 Running 0 6m ndbmysqld-4 3/3 Running 0 6m ndbmysqld-5 3/3 Running 0 6m
Run the following command to check the status of cnDBTier cluster2:$ kubectl -n cluster2 get pods
Sample output:NAME READY STATUS RESTARTS AGE mysql-cluster-cluster2-cluster1-replication-svc-fb48bfd9-4klzx 1/1 Running 0 10m mysql-cluster-cluster2-cluster3-replication-svc-797599746dshtgr 1/1 Running 0 10m mysql-cluster-cluster2-cluster4-replication-svc-6db8d76495bkf5v 1/1 Running 0 10m mysql-cluster-db-backup-manager-svc-85d4b7f7c5-j7ptp 1/1 Running 0 16m mysql-cluster-db-monitor-svc-5f68dd795f-cv8jc 1/1 Running 0 16m ndbappmysqld-0 2/2 Running 0 16m ndbappmysqld-1 2/2 Running 0 16m ndbmgmd-0 2/2 Running 0 16m ndbmgmd-1 2/2 Running 0 16m ndbmtd-0 3/3 Running 0 16m ndbmtd-1 3/3 Running 0 15m ndbmtd-2 3/3 Running 0 14m ndbmtd-3 3/3 Running 0 14m ndbmysqld-0 3/3 Running 0 16m ndbmysqld-1 3/3 Running 0 13m ndbmysqld-2 3/3 Running 0 12m ndbmysqld-3 3/3 Running 0 12m ndbmysqld-4 3/3 Running 0 11m ndbmysqld-5 3/3 Running 0 11m
Run the following command to check the status of cnDBTier cluster3:$ kubectl -n cluster3 get pods
Sample output:NAME READY STATUS RESTARTS AGE mysql-cluster-cluster3-cluster1-replication-svc-55696b5dbbdp9ts 1/1 Running 0 15m mysql-cluster-cluster3-cluster2-replication-svc-778bd44fb7qgjt5 1/1 Running 0 15m mysql-cluster-cluster3-cluster4-replication-svc-58c66f896b27cxh 1/1 Running 0 15m mysql-cluster-db-backup-manager-svc-57f7d77d8c-pgllm 1/1 Running 0 15m mysql-cluster-db-monitor-svc-784c4c7f9d-p5zvm 1/1 Running 0 15m ndbappmysqld-0 2/2 Running 0 16m ndbappmysqld-1 2/2 Running 0 16m ndbmgmd-0 2/2 Running 0 16m ndbmgmd-1 2/2 Running 0 16m ndbmtd-0 3/3 Running 0 16m ndbmtd-1 3/3 Running 0 15m ndbmtd-2 3/3 Running 0 14m ndbmtd-3 3/3 Running 0 14m ndbmysqld-0 3/3 Running 0 18m ndbmysqld-1 3/3 Running 0 18m ndbmysqld-2 3/3 Running 0 18m ndbmysqld-3 3/3 Running 0 18m ndbmysqld-4 3/3 Running 0 18m ndbmysqld-5 3/3 Running 0 18m
Run the following command to check the status of cnDBTier cluster4:$ kubectl -n cluster4 get pods
Sample output:NAME READY STATUS RESTARTS AGE mysql-cluster-cluster4-cluster1-replication-svc-7dd6987b4fpwvb5 1/1 Running 0 24m mysql-cluster-cluster4-cluster2-replication-svc-5ddbdcdd75j5rj2 1/1 Running 0 24m mysql-cluster-cluster4-cluster3-replication-svc-64b7bbcfcblmz4j 1/1 Running 0 24m mysql-cluster-db-backup-manager-svc-fc49f8c9f-xng57 1/1 Running 0 24m mysql-cluster-db-monitor-svc-5cfd757866-s6lk6 1/1 Running 0 24m ndbappmysqld-0 2/2 Running 0 16m ndbappmysqld-1 2/2 Running 0 16m ndbmgmd-0 2/2 Running 0 12m ndbmgmd-1 2/2 Running 0 12m ndbmtd-0 3/3 Running 0 16m ndbmtd-1 3/3 Running 0 17m ndbmtd-2 3/3 Running 0 18m ndbmtd-3 3/3 Running 0 19m ndbmysqld-0 3/3 Running 0 21m ndbmysqld-1 3/3 Running 0 21m ndbmysqld-2 3/3 Running 0 22m ndbmysqld-3 3/3 Running 0 23m ndbmysqld-4 3/3 Running 0 23m ndbmysqld-5 3/3 Running 0 24m
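The per-cluster checks above can also be combined into a single loop. The following is a minimal sketch, assuming the namespaces cluster1 through cluster4 used in these examples; it prints any pod whose STATUS column is not Running (the READY counts should still be compared against the sample outputs):
# Report pods that are not in the Running state in each cnDBTier namespace;
# no output under a namespace heading means all of its pods report Running.
$ for ns in cluster1 cluster2 cluster3 cluster4; do echo "== ${ns} =="; kubectl -n ${ns} get pods --no-headers | grep -v Running; done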
- Run the following command to check the status of the MySQL NDB
cluster in cnDBTier
cluster:
$ kubectl -n <namespace of cnDBTier Cluster> exec -it ndbmgmd-0 -- ndb_mgm -e show
For example:
Run the following command to check the status of MySQL NDB Cluster in cnDBTier cluster1:$ kubectl -n cluster1 exec -it ndbmgmd-0 -- ndb_mgm -e show
Sample output:Connected to Management Server at: localhost:1186 Cluster Configuration --------------------- [ndbd(NDB)] 4 node(s) id=1 @10.233.124.92 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0) id=2 @10.233.113.109 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0) id=3 @10.233.116.79 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1, *) id=4 @10.233.71.90 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) [ndb_mgmd(MGM)] 2 node(s) id=49 @10.233.78.61 (mysql-8.4.2 ndb-8.4.2) id=50 @10.233.109.99 (mysql-8.4.2 ndb-8.4.2) [mysqld(API)] 12 node(s) id=56 @10.233.120.210 (mysql-8.4.2 ndb-8.4.2) id=57 @10.233.124.93 (mysql-8.4.2 ndb-8.4.2) id=58 @10.233.109.230 (mysql-8.4.2 ndb-8.4.2) id=59 @10.233.71.17 (mysql-8.4.2 ndb-8.4.2) id=60 @10.233.114.197 (mysql-8.4.2 ndb-8.4.2) id=61 @10.233.116.251 (mysql-8.4.2 ndb-8.4.2) id=70 @10.233.71.89 (mysql-8.4.2 ndb-8.4.2) id=71 @10.233.113.110 (mysql-8.4.2 ndb-8.4.2) id=222 (not connected, accepting connect from any host) id=223 (not connected, accepting connect from any host) id=224 (not connected, accepting connect from any host) id=225 (not connected, accepting connect from any host)
Run the following command to check the status of MySQL NDB Cluster in cnDBTier cluster2:$ kubectl -n cluster2 exec -it ndbmgmd-0 -- ndb_mgm -e show
Sample output:Connected to Management Server at: localhost:1186 Cluster Configuration --------------------- [ndbd(NDB)] 4 node(s) id=1 @10.233.116.82 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0, *) id=2 @10.233.120.61 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0) id=3 @10.233.109.100 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) id=4 @10.233.89.68 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) [ndb_mgmd(MGM)] 2 node(s) id=49 @10.233.84.54 (mysql-8.4.2 ndb-8.4.2) id=50 @10.233.114.63 (mysql-8.4.2 ndb-8.4.2) [mysqld(API)] 12 node(s) id=56 @10.233.89.246 (mysql-8.4.2 ndb-8.4.2) id=57 @10.233.116.250 (mysql-8.4.2 ndb-8.4.2) id=58 @10.233.121.201 (mysql-8.4.2 ndb-8.4.2) id=59 @10.233.78.250 (mysql-8.4.2 ndb-8.4.2) id=60 @10.233.84.213 (mysql-8.4.2 ndb-8.4.2) id=61 @10.233.124.95 (mysql-8.4.2 ndb-8.4.2) id=70 @10.233.121.56 (mysql-8.4.2 ndb-8.4.2) id=71 @10.233.84.55 (mysql-8.4.2 ndb-8.4.2) id=222 (not connected, accepting connect from any host) id=223 (not connected, accepting connect from any host) id=224 (not connected, accepting connect from any host) id=225 (not connected, accepting connect from any host)
Run the following command to check the status of MySQL NDB Cluster in cnDBTier cluster3:$ kubectl -n cluster3 exec -it ndbmgmd-0 -- ndb_mgm -e show
Sample output:Connected to Management Server at: localhost:1186 Cluster Configuration --------------------- [ndbd(NDB)] 4 node(s) id=1 @10.233.108.208 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0, *) id=2 @10.233.78.249 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0) id=3 @10.233.39.100 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) id=4 @10.233.36.68 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) [ndb_mgmd(MGM)] 2 node(s) id=49 @10.233.113.87 (mysql-8.4.2 ndb-8.4.2) id=50 @10.233.114.32 (mysql-8.4.2 ndb-8.4.2) [mysqld(API)] 12 node(s) id=56 @10.233.71.16 (mysql-8.4.2 ndb-8.4.2) id=57 @10.233.114.196 (mysql-8.4.2 ndb-8.4.2) id=58 @10.233.84.212 (mysql-8.4.2 ndb-8.4.2) id=59 @10.233.108.210 (mysql-8.4.2 ndb-8.4.2) id=60 @10.233.121.202 (mysql-8.4.2 ndb-8.4.2) id=61 @10.233.109.231 (mysql-8.4.2 ndb-8.4.2) id=70 @10.233.121.37 (mysql-8.4.2 ndb-8.4.2) id=71 @10.233.84.38 (mysql-8.4.2 ndb-8.4.2) id=222 (not connected, accepting connect from any host) id=223 (not connected, accepting connect from any host) id=224 (not connected, accepting connect from any host) id=225 (not connected, accepting connect from any host)
Run the following command to check the status of MySQL NDB Cluster in cnDBTier cluster4:$ kubectl -n cluster4 exec -it ndbmgmd-0 -- ndb_mgm -e show
Sample output:Connected to Management Server at: localhost:1186 Cluster Configuration --------------------- [ndbd(NDB)] 4 node(s) id=1 @10.233.78.248 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0, *) id=2 @10.233.108.209 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 0) id=3 @10.233.109.43 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) id=4 @10.233.89.44 (mysql-8.4.2 ndb-8.4.2, Nodegroup: 1) [ndb_mgmd(MGM)] 2 node(s) id=49 @10.233.89.247 (mysql-8.4.2 ndb-8.4.2) id=50 @10.233.99.32 (mysql-8.4.2 ndb-8.4.2) [mysqld(API)] 12 node(s) id=56 @10.233.109.228 (mysql-8.4.2 ndb-8.4.2) id=57 @10.233.121.200 (mysql-8.4.2 ndb-8.4.2) id=58 @10.233.84.211 (mysql-8.4.2 ndb-8.4.2) id=59 @10.233.124.94 (mysql-8.4.2 ndb-8.4.2) id=60 @10.233.113.89 (mysql-8.4.2 ndb-8.4.2) id=61 @10.233.114.198 (mysql-8.4.2 ndb-8.4.2) id=70 @10.233.121.47 (mysql-8.4.2 ndb-8.4.2) id=71 @10.233.84.48 (mysql-8.4.2 ndb-8.4.2) id=222 (not connected, accepting connect from any host) id=223 (not connected, accepting connect from any host) id=224 (not connected, accepting connect from any host) id=225 (not connected, accepting connect from any host)
Note:
Node IDs 222 to 225 in the sample outputs are shown as "not connected" as these are added as empty slot IDs that are used for georeplication recovery. You can ignore these node IDs. - If all the nodes are connected to the MySQL NDB cluster and all the pods of the cnDBTier cluster are in the Running state, then the cnDBTier cluster is considered to be UP and free of fatal errors.
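For a quick command-line check of the node connection status, the ndb_mgm output can be filtered for disconnected nodes. The following is a minimal sketch, assuming the cluster1 namespace and the empty georeplication slot IDs 222 to 225 shown in the sample outputs; any line it prints indicates a node that is down:
# Show disconnected nodes, ignoring the empty API slots (IDs 222 to 225)
# that are reserved for georeplication recovery.
$ kubectl -n cluster1 exec -it ndbmgmd-0 -- ndb_mgm -e show | grep "not connected" | egrep -v "id=(222|223|224|225)"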
7.4.4.2 Verifying cnDBTier Cluster Status Using CNC Console
This section provides the steps to verify cnDBTier cluster status using CNC console.
- Log in to the CNC Console GUI of the cluster.
For example:
- If you want to verify the status of cluster1, then log in to the CNC Console GUI of cluster1.
- If you want to verify the status of cluster2, then log in to the CNC Console GUI of cluster2.
- Expand cnDBTier under the NF menu.
- Select Local Cluster Status.
The system displays the name and the status of the cluster. For example, if you are logged in to cluster1, the system displays the status of cluster1.
The following image shows a sample Local Cluster Status page displaying the status of cluster1:
Figure 7-12 Verify Cluster Status
7.4.5 Monitoring the Georeplication Recovery Status
This section provides the procedures and examples to monitor georeplication recovery status using cnDBTier APIs or CNC console.
7.4.5.1 Monitoring Georeplication Recovery Status Using cnDBTier APIs
This section provides the procedure and examples to monitor georeplication recovery status using cnDBTier APIs.
Note:
- db-backup-manager-svc is designed to automatically restart in case of errors. Therefore, when the backup-manager-svc encounters a temporary error during the georeplication recovery process, it may undergo several restarts. When cnDBTier reaches a stable state, the db-backup-manager-svc pod operates normally without any further restarts.
- The georeplication recovery process transitions through various states depending on different scenarios and configurations (REINSTALLED → STARTDRRESTORE → INITIATEBACKUP → CHECKBACKUP → COPY_BACKUP → CHECK_BACKUP_COPY → BACKUPCOPIED → BACKUPEXTRACTED → FAILED → RECONNECTSQLNODES → BACKUPRESTORE → RESTORED → BINLOGINITIALIZED → RECONFIGURE → COMPLETED).
- If dr_state displays GRR_FAILED, it indicates that the georeplication recovery failed. In such a case, check the replication service logs for more details and restart the georeplication recovery. - The system raises different alerts in case of backup transfer failure such as BACKUP_TRANSFER_LOCAL_FAILED and BACKUP_TRANSFER_FAILED. For more information about backup transfer status alerts, see the "cnDBTier Alerts" section in Oracle Communications Cloud Native Core, cnDBTier User Guide.
- Run the following command to get the replication service
LoadBalancer IP of the site on which the restore is being
performed.
$ export IP=$(kubectl get svc -n <namespace> | grep repl | awk '{print $4}' | head -n 1 )
where,
<namespace>
is the namespace of the failed site. - Run the following command to get the replication service
LoadBalancer Port of the site on which restore is being
performed:
$ export PORT=$(kubectl get svc -n <namespace> | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
where
<namespace>
is the namespace of the failed site. - Run the following command to get the georeplication restore status.
If the value of gr_state is
COMPLETED
, then the database is restored and the replication channels are reestablished.$ curl -X GET http://$IP:$PORT/db-tier/gr-recovery/site/{sitename}/status
Note:
For more information about georeplication recovery API responses, error codes, and curl commands for HTTPS-enabled replication service, see Fault Recovery APIs.
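Instead of fetching the status manually, the recovery state can be polled until it reaches a terminal value. The following is a minimal sketch, assuming the IP and PORT variables exported in the previous steps and a site name of cluster1 (replace it with the site being recovered); it stops when the returned status reports COMPLETED, or when it reports the GRR_FAILED failure state described in the note above:
# Poll the georeplication recovery status every 30 seconds; on a failure,
# check the replication service logs and restart the recovery.
$ while true; do out=$(curl -s -X GET http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/status); echo "$out"; echo "$out" | grep -q '"gr_state":"COMPLETED"' && break; echo "$out" | grep -q 'GRR_FAILED' && break; sleep 30; done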
Examples for monitoring georeplication recovery status
- Run the following command to get the replication service
LoadBalancer IP for
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service LoadBalancer Port for cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to get the georeplication restore status of
cluster1. If the value of
gr_state
is COMPLETED, then the database is restored and the replication channels are reestablished:$ curl -X GET http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/status
Sample output if georeplication completed successfully:{"localSiteName":"cluster1","grstatus":"COMPLETED","gr_state":"COMPLETED","remotesitesGrInfo":[{"remoteSiteName":"cluster2","replchannel_group_id":"1","gr_state":"COMPLETED"}]}
- Run the following command to get the replication service
LoadBalancer IP for
cluster2:
$ export IP=$(kubectl get svc -n cluster2 | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service LoadBalancer Port for cluster2:
$ export PORT=$(kubectl get svc -n cluster2 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to get the georeplication restore status of
cluster2. If the value of
gr_state
is COMPLETED, then the database is restored and the replication channels are reestablished:$ curl -X GET http://$IP:$PORT/db-tier/gr-recovery/site/cluster2/status
Sample output if georeplication completed successfully:{"localSiteName":"cluster2","grstatus":"COMPLETED","gr_state":"COMPLETED","remotesitesGrInfo":[{"remoteSiteName":"cluster1","replchannel_group_id":"1","gr_state":"COMPLETED"}]}
- Run the following command to get the replication service
LoadBalancer IP for
cluster3:
$ export IP=$(kubectl get svc -n cluster3 | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service LoadBalancer Port for cluster3:
$ export PORT=$(kubectl get svc -n cluster3 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to get the georeplication restore status of
cluster3. If the value of
gr_state
is COMPLETED, then the database is restored and the replication channels are reestablished:$ curl -X GET http://$IP:$PORT/db-tier/gr-recovery/site/cluster3/status
Sample output if georeplication completed successfully:{"localSiteName":"cluster3","grstatus":"COMPLETED","gr_state":"COMPLETED","remotesitesGrInfo":[{"remoteSiteName":"cluster1","replchannel_group_id":"1","gr_state":"COMPLETED"}]}
- Run the following command to get the replication service
LoadBalancer IP for
cluster4:
$ export IP=$(kubectl get svc -n cluster4 | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication service LoadBalancer Port for cluster4:
$ export PORT=$(kubectl get svc -n cluster4 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Run the following command to get the georeplication restore status of
cluster4. If the value of
gr_state
is COMPLETED, then the database is restored and the replication channels are reestablished:$ curl -X GET http://$IP:$PORT/db-tier/gr-recovery/site/cluster4/status
Sample output if georeplication completed successfully:{"localSiteName":"cluster4","grstatus":"COMPLETED","gr_state":"COMPLETED","remotesitesGrInfo":[{"remoteSiteName":"cluster1","replchannel_group_id":"1","gr_state":"COMPLETED"}]}
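The commands above can also be run in a single pass across all sites. The following is a minimal sketch, assuming the namespaces and site names cluster1 through cluster4 used in these examples; it reuses the same IP and port extraction shown in the previous steps:
# Print the georeplication recovery status reported by each site's
# replication service LoadBalancer.
$ for site in cluster1 cluster2 cluster3 cluster4; do IP=$(kubectl get svc -n ${site} | grep repl | awk '{print $4}' | head -n 1); PORT=$(kubectl get svc -n ${site} | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1); echo "== ${site} =="; curl -s -X GET http://${IP}:${PORT}/db-tier/gr-recovery/site/${site}/status; echo; done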
7.4.5.2 Monitoring Georeplication Recovery Status Using CNC Console
This section provides the procedure and examples to monitor georeplication recovery status using CNC Console.
- [Optional]: Log in to the CNC Console GUI of the cluster. Skip this step if you are already logged in.
- Expand cnDBTier under the NF menu and select Georeplication Recovery.
- Click Georeplication Recovery Status.
The system displays the current status of the georeplication recovery as shown in the following image:
Figure 7-13 Georeplication Recovery Status
Note:
If the georeplication recovery status remains in the FAILED status right after the CHECK_BACKUP_COPY status for a long time, then check the replication service logs for more information about the georeplication recovery status and to verify whether the backup transfer failed.
7.4.5.3 Georeplication Recovery Status
Table 7-3 Georeplication Recovery Status
Georeplication Recovery Status | Description |
---|---|
ACTIVE | Indicates that the cluster is in a healthy state, and the replication is UP and running in sync with its mate cluster. |
REINSTALLED | Indicates that the cluster is reinstalled to resolve a fatal error. |
STARTDRRESTORE | Indicates that the georeplication recovery has started. |
VALIDATERESOURCES | Indicates that the georeplication recovery resources (PVC size and CPU of the leader DB replication service) in the cluster where GRR is initiated are validated. |
INITIATEBACKUP | Indicates that the cluster is identifying the healthy cluster to initiate backup. |
CHECKBACKUP | Indicates that the backup is initiated and the cluster is monitoring the backup initiation process. If the backup initiation fails, the cluster re-initiates the backup. |
COPY_BACKUP | Indicates that the backup initiation is complete and the system has requested the backup transfer from the healthy cluster to the cluster to be recovered. |
CHECK_BACKUP_COPY | Indicates that the backup copy is in progress and the cluster is monitoring the backup transfer progress. If the backup transfer fails, the cluster re-initiates the backup transfer. |
BACKUPCOPIED | Indicates that the backup transfer is complete and the georeplication recovery cluster can start the backup extraction process. |
BACKUPEXTRACTED | Indicates that the backup extraction is complete at the georeplication recovery cluster and the system can start the backup restore. |
FAILED | Indicates that the cluster is in a failed state and needs to be recovered. This state can also indicate that the georeplication recovery has started and the database is restored using the healthy cluster backup. |
UNKNOWN | Indicates that all the databases are removed from the georeplication recovery cluster so that the cluster can be restored using the backup from the healthy cluster. |
RECONNECTSQLNODES | Indicates that the SQL nodes must be down during the backup restore so that no records get into the binlog of the georeplication recovery cluster. |
BACKUPRESTORE | Indicates that the backup that is copied from the healthy cluster is being used to restore the georeplication recovery cluster. |
RESTORED | Indicates that the cluster is restored using the backup and the system can start reestablishing the replication channels. |
BINLOGINITIALIZED | Indicates that the binlogs are being reinitialized to start the restore of replication channels. |
RECONFIGURE | Indicates that the binlogs are reinitialized and the system is reestablishing the replication channels with respect to its mate clusters. |
7.4.6 Uninstalling cnDBTier Cluster
- Run the following command to uninstall a cnDBTier
cluster:
$ helm uninstall mysql-cluster --namespace <namespace>
where,
<namespace>
is the namespace of the cnDBTier cluster. For example,
Run the following command to uninstall the cnDBTier cluster1:$ helm uninstall mysql-cluster --namespace cluster1
Sample output:release "mysql-cluster" uninstalled
Run the following command to uninstall the cnDBTier cluster2:$ helm uninstall mysql-cluster --namespace cluster2
Sample output:release "mysql-cluster" uninstalled
Run the following command to uninstall the cnDBTier cluster3:$ helm uninstall mysql-cluster --namespace cluster3
Sample output:release "mysql-cluster" uninstalled
Run the following command to uninstall the cnDBTier cluster4:$ helm uninstall mysql-cluster --namespace cluster4
Sample output:release "mysql-cluster" uninstalled
- Run the following command to delete PVC of cnDBTier
cluster:
$ kubectl -n <namespace of cnDBTier cluster> get pvc $ kubectl -n <namespace of cnDBTier cluster> get pvc | egrep 'ndbmgmd|ndbmtd|ndbmysqld|ndbappmysqld|replication-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n <namespace of cnDBTier cluster> delete pvc $ kubectl -n <namespace of cnDBTier cluster> get pvc
For example,
Run the following command to delete the PVC of cnDBTier Cluster1:$ kubectl -n cluster1 get pvc $ kubectl -n cluster1 get pvc | egrep 'ndbmgmd|ndbmtd|ndbmysqld|ndbappmysqld|replication-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster1 delete pvc $ kubectl -n cluster1 get pvc
Run the following command to delete the PVC of cnDBTier Cluster2:$ kubectl -n cluster2 get pvc $ kubectl -n cluster2 get pvc | egrep 'ndbmgmd|ndbmtd|ndbmysqld|ndbappmysqld|replication-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster2 delete pvc $ kubectl -n cluster2 get pvc
Run the following command to delete the PVC of cnDBTier Cluster3:$ kubectl -n cluster3 get pvc $ kubectl -n cluster3 get pvc | egrep 'ndbmgmd|ndbmtd|ndbmysqld|ndbappmysqld|replication-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster3 delete pvc $ kubectl -n cluster3 get pvc
Run the following command to delete the PVC of cnDBTier Cluster4:$ kubectl -n cluster4 get pvc $ kubectl -n cluster4 get pvc | egrep 'ndbmgmd|ndbmtd|ndbmysqld|ndbappmysqld|replication-svc' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster4 delete pvc $ kubectl -n cluster4 get pvc
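Before reinstalling the cluster, it can be useful to confirm that both the Helm release and the PVCs are gone. The following is a minimal sketch, assuming the cluster1 namespace; repeat it for the other namespaces as needed:
# Confirm that the Helm release is uninstalled and no cnDBTier PVCs remain.
$ helm list --namespace cluster1
$ kubectl -n cluster1 get pvc | egrep 'ndbmgmd|ndbmtd|ndbmysqld|ndbappmysqld|replication-svc' || echo "no cnDBTier PVCs remaining"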
7.4.7 Reinstalling cnDBTier Cluster
This section provides the procedure to reinstall a cnDBTier cluster.
- Follow the Uninstalling cnDBTier Cluster procedure to uninstall the cnDBTier cluster.
- Install the cnDBTier cluster by configuring the remote site IP address of the replication service of the remote cnDBTier cluster for restoring the database. For more information on how to install the cnDBTier cluster, see Installing cnDBTier:
- Update the
remotesiteip
configuration in the custom_values.yaml
file with the remote site IP address or FQDN of the replication services from the remote cnDBTier cluster.- If the cnDBTier cluster1 is undergoing a
georeplication recovery:
- Retrieve the replication service IP address
from cnDBTier cluster2 if the first mate site (cnDBTier
cluster2) is already configured or not marked as
FAILED:
$ kubectl -n cluster2 get svc | grep cluster2-cluster1-replication-svc | awk '{ print $4 }' 10.75.02.01
- Retrieve the replication service IP address
from cnDBTier cluster3 if the second mate site (cnDBTier
cluster3) is already configured or not marked as
FAILED:
$ kubectl -n cluster3 get svc | grep cluster3-cluster1-replication-svc | awk '{ print $4 }' 10.75.03.01
- Retrieve the replication service IP address
from cnDBTier cluster4 if the third mate site (cnDBTier
cluster4) is already configured or not marked as
FAILED:
$ kubectl -n cluster4 get svc | grep cluster4-cluster1-replication-svc | awk '{ print $4 }' 10.75.04.01
- Perform the following steps to update the
custom_values.yaml
file for cnDBTier cluster1. If any remote cnDBTier cluster is marked as FAILED or not installed, configure the corresponding remotesiteip
as an empty string "":- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster2 mate site
details if the first mate site is
configured:
replication: matesitename: "cluster2" remotesiteip: "10.75.02.01" remotesiteport: "80"
- Configure the cnDBTier cluster3 mate site
details if the second mate site is
configured:
replication: matesitename: "cluster3" remotesiteip: "10.75.03.01" remotesiteport: "80"
- Configure the cnDBTier cluster4 mate site
details if the third mate site is
configured:
replication: matesitename: "cluster4" remotesiteip: "10.75.04.01" remotesiteport: "80"
- If the cnDBTier cluster2 is undergoing a
georeplication recovery:
- Retrieve the replication service IP address
from cnDBTier cluster1 if the first mate site (cnDBTier
cluster1) is already configured or not marked as
FAILED:
$ kubectl -n cluster1 get svc | grep cluster1-cluster2-replication-svc | awk '{ print $4 }' 10.75.01.02
- Retrieve the replication service IP address
from cnDBTier cluster3 if the second mate site (cnDBTier
cluster3) is already configured or not marked as
FAILED:
$ kubectl -n cluster3 get svc | grep cluster3-cluster2-replication-svc | awk '{ print $4 }' 10.75.03.02
- Retrieve the replication service IP address
from cnDBTier cluster4 if the third mate site (cnDBTier
cluster4) is already configured or not marked as
FAILED:
$ kubectl -n cluster4 get svc | grep cluster4-cluster2-replication-svc | awk '{ print $4 }' 10.75.04.02
- Perform the following steps to update the
custom_values.yaml
file for cnDBTier cluster2. If any remote cnDBTier cluster is marked as FAILED or not installed, configure the corresponding remotesiteip
as an empty string "":- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster1 mate site
details if the first mate site is
configured:
replication: matesitename: "cluster1" remotesiteip: "10.75.01.02" remotesiteport: "80"
- Configure the cnDBTier cluster3 mate site
details if the second mate site is
configured:
replication: matesitename: "cluster3" remotesiteip: "10.75.03.02" remotesiteport: "80"
- Configure the cnDBTier cluster4 mate site
details if the third mate site is
configured:
replication: matesitename: "cluster4" remotesiteip: "10.75.04.02" remotesiteport: "80"
- If the cnDBTier cluster3 is undergoing a
georeplication recovery:
- Retrieve the replication service IP address
from cnDBTier cluster1 if the first mate site (cnDBTier
cluster1) is already configured or not marked as
FAILED:
$ kubectl -n cluster1 get svc | grep cluster1-cluster3-replication-svc | awk '{ print $4 }' 10.75.01.03
- Retrieve the replication service IP address
from cnDBTier cluster2 if the second mate site (cnDBTier
cluster2) is already configured or not marked as
FAILED:
$ kubectl -n cluster2 get svc | grep cluster2-cluster3-replication-svc | awk '{ print $4 }' 10.75.02.03
- Retrieve the replication service IP address
from cnDBTier cluster4 if the third mate site (cnDBTier
cluster4) is already configured or not marked as
FAILED:
$ kubectl -n cluster4 get svc | grep cluster4-cluster3-replication-svc | awk '{ print $4 }' 10.75.04.03
- Perform the following steps to update the
custom_values.yaml
file for cnDBTier cluster3. If any remote cnDBTier cluster is marked as FAILED or not installed, configure the corresponding remotesiteip
as an empty string "":- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster1 mate site
details if the first mate site is
configured:
replication: matesitename: "cluster1" remotesiteip: "10.75.01.03" remotesiteport: "80"
- Configure the cnDBTier cluster2 mate site
details if the second mate site is
configured:
replication: matesitename: "cluster2" remotesiteip: "10.75.02.03" remotesiteport: "80"
- Configure the cnDBTier cluster4 mate site
details if the third mate site is
configured:
replication: matesitename: "cluster4" remotesiteip: "10.75.04.03" remotesiteport: "80"
- If the cnDBTier cluster4 is undergoing a
georeplication recovery:
- Retrieve the replication service IP address
from cnDBTier cluster1 if the first mate site (cnDBTier
cluster1) is already configured or not marked as
FAILED:
$ kubectl -n cluster1 get svc | grep cluster1-cluster4-replication-svc | awk '{ print $4 }' 10.75.01.04
- Retrieve the replication service IP address
from cnDBTier cluster2 if the second mate site (cnDBTier
cluster2) is already configured or not marked as
FAILED:
$ kubectl -n cluster2 get svc | grep cluster2-cluster4-replication-svc | awk '{ print $4 }' 10.75.02.04
- Retrieve the replication service IP address
from cnDBTier cluster3 if the third mate site (cnDBTier
cluster3) is already configured or not marked as
FAILED:
$ kubectl -n cluster3 get svc | grep cluster3-cluster4-replication-svc | awk '{ print $4 }' 10.75.03.04
- Perform the following steps to update the
custom_values.yaml
file for cnDBTier cluster4. If any remote cnDBTier cluster is marked as FAILED or not installed, configure the corresponding remotesiteip
as an empty string "":- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster1 mate site
details if the first mate site is
configured:
replication: matesitename: "cluster1" remotesiteip: "10.75.01.04" remotesiteport: "80"
- Configure the cnDBTier cluster2 mate site
details if the second mate site is
configured:
replication: matesitename: "cluster2" remotesiteip: "10.75.02.04" remotesiteport: "80"
- Configure the cnDBTier cluster3 mate site
details if the third mate site is
configured:
replication: matesitename: "cluster3" remotesiteip: "10.75.03.04" remotesiteport: "80"
- After updating the cnDBTier
custom_values.yaml
file for the cnDBTier cluster, see Installing cnDBTier for installing the failed cnDBTier cluster. - Log in to Bastion Host of cnDBTier cluster and scale down
the db replication service deployments in the reinstalled cnDBTier
cluster:
Note:
For cnDBTier with multichannel replication group, the replication service deployment name includes "repl" instead of "replication". Check the deployment name and perform "egrep" on "repl" or the name that is configured.$ kubectl -n <namespace of cnDBTier cluster> get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n <namespace of cnDBTier cluster> scale deployment --replicas=0
For example:- Run the following command to scale down the db replication
service deployments if the reinstalled cnDBTier cluster is
cnDBTier
cluster1:
$ kubectl -n cluster1 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster1 scale deployment --replicas=0
- Run the following command to scale down the db replication
service deployments if the reinstalled cnDBTier cluster is
cnDBTier
cluster2:
$ kubectl -n cluster2 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster2 scale deployment --replicas=0
- Run the following command to scale down the db replication
service deployments if the reinstalled cnDBTier cluster is
cnDBTier
cluster3:
$ kubectl -n cluster3 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster3 scale deployment --replicas=0
- Run the following command to scale down the db replication
service deployments if the reinstalled cnDBTier cluster is
cnDBTier
cluster4:
$ kubectl -n cluster4 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster4 scale deployment --replicas=0
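After scaling down, the deployments can be checked to confirm that the replication services report zero replicas. The following is a minimal sketch, assuming the cluster1 namespace; the 'repl' pattern also matches the multichannel replication group naming mentioned in the note above:
# Confirm that every replication service deployment reports 0/0 in the READY column.
$ kubectl -n cluster1 get deployments | egrep 'repl'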
- Update the replication service IP address of the cnDBTier cluster
that is reinstalled in the remote cnDBTier clusters if FQDN is not used for
"
remotesiteip
" config in the custom_values.yaml
file. Upgrade the DB replication service deployments in the remote cnDBTier clusters.- If the cnDBTier cluster1 is reinstalled, then update the
custom_values.yaml
file in the remote cnDBTier clusters and upgrade DB replication service deployments:- Run the following commands to update the IP address
of the replication services in the remote cnDBTier
clusters:
$ kubectl -n cluster1 get svc | grep cluster1-cluster2-replication-svc | awk '{ print $4 }' 10.75.01.02 $ kubectl -n cluster1 get svc | grep cluster1-cluster3-replication-svc | awk '{ print $4 }' 10.75.01.03 $ kubectl -n cluster1 get svc | grep cluster1-cluster4-replication-svc | awk '{ print $4 }' 10.75.01.04
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster2 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster1 site IP address in
cnDBTier
cluster2:
replication: matesitename: "cluster1" remotesiteip: "10.75.01.02" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster2:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster2 occndbtier -f occndbtier/custom_values.yaml
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster3 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster1 site IP details in
cnDBTier
cluster3:
replication: matesitename: "cluster1" remotesiteip: "10.75.01.03" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster3:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster3 occndbtier -f occndbtier/custom_values.yaml
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster4 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster1 site IP details in
cnDBTier
cluster4:
replication: matesitename: "cluster1" remotesiteip: "10.75.01.04" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster4:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster4 occndbtier -f occndbtier/custom_values.yaml
- If the cnDBTier cluster2 is reinstalled, then update the
custom_values.yaml
file in the remote cnDBTier clusters and upgrade DB replication service deployments:- Update the IP address of the replication services
in the remote cnDBTier
clusters:
$ kubectl -n cluster2 get svc | grep cluster2-cluster1-replication-svc | awk '{ print $4 }' 10.75.02.01 $ kubectl -n cluster2 get svc | grep cluster2-cluster3-replication-svc | awk '{ print $4 }' 10.75.02.03 $ kubectl -n cluster2 get svc | grep cluster2-cluster4-replication-svc | awk '{ print $4 }' 10.75.02.04
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster1 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster2 site IP address in
cnDBTier
cluster1:
replication: matesitename: "cluster2" remotesiteip: "10.75.02.01" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster1:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster1 occndbtier -f occndbtier/custom_values.yaml
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster3 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster2 site IP address in
cnDBTier
cluster3:
replication: matesitename: "cluster2" remotesiteip: "10.75.02.03" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster3:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster3 occndbtier -f occndbtier/custom_values.yaml
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster4 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster2 site IP address in
cnDBTier
cluster4:
replication: matesitename: "cluster2" remotesiteip: "10.75.02.04" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster4:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster4 occndbtier -f occndbtier/custom_values.yaml
- If cnDBTier cluster3 is reinstalled, then update the
custom_values.yaml
file in the remote cnDBTier clusters and upgrade DB replication service deployments:- Update the IP address of the replication services
in the remote cnDBTier
clusters:
$ kubectl -n cluster3 get svc | grep cluster3-cluster1-replication-svc | awk '{ print $4 }' 10.75.03.01 $ kubectl -n cluster3 get svc | grep cluster3-cluster2-replication-svc | awk '{ print $4 }' 10.75.03.02 $ kubectl -n cluster3 get svc | grep cluster3-cluster4-replication-svc | awk '{ print $4 }' 10.75.03.04
- Perform the following steps to update the
custom_values.yaml
file for remote cnDBTier cluster1 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster3 site IP address in
cnDBTier
cluster1:
replication: matesitename: "cluster3" remotesiteip: "10.75.03.01" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster1:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster1 occndbtier -f occndbtier/custom_values.yaml
- Perform the following steps to update the
custom_values.yaml
file for remote cnDBTier cluster2 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster3 site IP details in
cnDBTier
cluster2:
replication: matesitename: "cluster3" remotesiteip: "10.75.03.02" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster2:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster2 occndbtier -f occndbtier/custom_values.yaml
- Perform the following steps to update the
custom_values.yaml
file for remote cnDBTier cluster4 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster3 site IP details in
cnDBTier
cluster4:
replication: matesitename: "cluster3" remotesiteip: "10.75.03.04" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster4:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster4 occndbtier -f occndbtier/custom_values.yaml
- If the cnDBTier cluster4 is reinstalled, then update the
custom_values.yaml
file in remote cnDBTier clusters and upgrade DB replication service deployments:- Update the IP address of the replication services
in the remote cnDBTier
clusters:
$ kubectl -n cluster4 get svc | grep cluster4-cluster1-replication-svc | awk '{ print $4 }' 10.75.04.01 $ kubectl -n cluster4 get svc | grep cluster4-cluster2-replication-svc | awk '{ print $4 }' 10.75.04.02 $ kubectl -n cluster4 get svc | grep cluster4-cluster3-replication-svc | awk '{ print $4 }' 10.75.04.03
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster1 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster4 site IP address in
cnDBTier
cluster1:
replication: matesitename: "cluster4" remotesiteip: "10.75.04.01" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster1:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster1 occndbtier -f occndbtier/custom_values.yaml
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster2 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster4 site IP address in
cnDBTier
cluster2:
replication: matesitename: "cluster4" remotesiteip: "10.75.04.02" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster2:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster2 occndbtier -f occndbtier/custom_values.yaml
- Perform the following steps to update the
custom_values.yaml
file of cnDBTier cluster3 and upgrade:- Run the following command to edit the
custom_values.yaml
file:$ vi occndbtier/custom_values.yaml
- Configure the cnDBTier cluster4 site IP address in
cnDBTier
cluster3:
replication: matesitename: "cluster4" remotesiteip: "10.75.04.03" remotesiteport: "80"
- Upgrade DB replication service deployments in cnDBTier
cluster3:
$ helm upgrade --no-hooks mysql-cluster --namespace cluster3 occndbtier -f occndbtier/custom_values.yaml
- Create the required NF-specific user accounts and grants to match NF users and
grants of the good site in the reinstalled cnDBTier if the user accounts do not
exist. For a sample procedure, see Creating NF Users.
Note:
For more details about creating NF-specific user account and grants, refer to the NF-specific fault recovery guide. - If cnDBTier Cluster is reinstalled and fault recovery is performed
- If a cnDBTier cluster is reinstalled and fault recovery is performed in a four-site setup, then update the REPLICATION_WAITSECS_AFTER_NDBRESTORE environment variable in the DB replication service deployments on the reinstalled cnDBTier clusters (a verification sketch follows this procedure).
  Note:
  For cnDBTier with a multichannel replication group, the replication service deployment name includes "repl" instead of "replication". Check the deployment name and perform "egrep" on "repl" or on the name that is configured.
  $ for replsvcdeploy in $(kubectl -n <namespace of cnDBTier cluster> get deploy | grep -i replication | awk '{print $1}'); do echo "updating replsvcdeploy: ${replsvcdeploy}"; kubectl -n <namespace of cnDBTier cluster> set env deployment.apps/${replsvcdeploy} REPLICATION_WAITSECS_AFTER_NDBRESTORE='180000'; done
  For example:
  - If cnDBTier cluster1 is reinstalled and restored, then
update the REPLICATION_WAITSECS_AFTER_NDBRESTORE environment
variable in the DB replication service deployments of cnDBTier
cluster1:
$ for replsvcdeploy in $(kubectl -n cluster1 get deploy | grep -i replication | awk '{print $1}'); do echo "updating replsvcdeploy: ${replsvcdeploy}"; kubectl -n cluster1 set env deployment.apps/${replsvcdeploy} REPLICATION_WAITSECS_AFTER_NDBRESTORE='180000'; done
- If cnDBTier cluster2 is reinstalled and restored, then
update the REPLICATION_WAITSECS_AFTER_NDBRESTORE environment
variable in the DB replication service deployments of cnDBTier
cluster2:
$ for replsvcdeploy in $(kubectl -n cluster2 get deploy | grep -i replication | awk '{print $1}'); do echo "updating replsvcdeploy: ${replsvcdeploy}"; kubectl -n cluster2 set env deployment.apps/${replsvcdeploy} REPLICATION_WAITSECS_AFTER_NDBRESTORE='180000'; done
- If cnDBTier cluster3 is reinstalled and restored, then
update the REPLICATION_WAITSECS_AFTER_NDBRESTORE environment
variable in the DB replication service deployments of cnDBTier
cluster3:
$ for replsvcdeploy in $(kubectl -n cluster3 get deploy | grep -i replication | awk '{print $1}'); do echo "updating replsvcdeploy: ${replsvcdeploy}"; kubectl -n cluster3 set env deployment.apps/${replsvcdeploy} REPLICATION_WAITSECS_AFTER_NDBRESTORE='180000'; done
- If cnDBTier cluster4 is reinstalled and restored, then
update the REPLICATION_WAITSECS_AFTER_NDBRESTORE environment
variable in the DB replication service deployments of cnDBTier
cluster4:
$ for replsvcdeploy in $(kubectl -n cluster4 get deploy | grep -i replication | awk '{print $1}'); do echo "updating replsvcdeploy: ${replsvcdeploy}"; kubectl -n cluster4 set env deployment.apps/${replsvcdeploy} REPLICATION_WAITSECS_AFTER_NDBRESTORE='180000'; done
- Scale up the DB replication service deployments in the reinstalled cnDBTier cluster (a verification sketch follows this procedure).
  Note:
  For cnDBTier with a multichannel replication group, the replication service deployment name includes "repl" instead of "replication". Check the deployment name and perform "egrep" on "repl" or on the name that is configured.
  - Log in to the Bastion Host of the cnDBTier cluster and scale up the DB replication service deployments:
    $ kubectl -n <namespace of cnDBTier cluster> get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n <namespace of cnDBTier cluster> scale deployment --replicas=1
  For example:
  - Run the following command if the reinstalled cnDBTier
cluster is cnDBTier
cluster1:
$ kubectl -n cluster1 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster1 scale deployment --replicas=1
- Run the following command if the reinstalled cnDBTier
cluster is cnDBTier
cluster2:
$ kubectl -n cluster2 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster2 scale deployment --replicas=1
- Run the following command if the reinstalled cnDBTier
cluster is cnDBTier
cluster3:
$ kubectl -n cluster3 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster3 scale deployment --replicas=1
- Run the following command if the reinstalled cnDBTier
cluster is cnDBTier
cluster4:
$ kubectl -n cluster4 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster4 scale deployment --replicas=1
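As referenced in the NF user creation step earlier in this procedure, the following is a minimal sketch of creating an NF-specific user account and grants on the reinstalled cnDBTier cluster. The pod name ndbmysqld-0, the container name mysqlndbcluster, the user nfuser, the schema nfdb, and the grant list are illustrative assumptions only; use the exact user names, passwords, and grants defined for your NF on the good site and in the NF-specific fault recovery guide.
$ kubectl -n cluster1 exec -ti ndbmysqld-0 -c mysqlndbcluster -- mysql -h 127.0.0.1 -uroot -p
mysql> -- nfuser, nfdb, and the grants below are illustrative placeholders
mysql> CREATE USER IF NOT EXISTS 'nfuser'@'%' IDENTIFIED BY '<nfuser password>';
mysql> GRANT SELECT, INSERT, UPDATE, DELETE ON nfdb.* TO 'nfuser'@'%';
mysql> FLUSH PRIVILEGES;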
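The following is a hedged verification sketch for the REPLICATION_WAITSECS_AFTER_NDBRESTORE update, assuming cnDBTier cluster1 was the reinstalled cluster; for a multichannel replication group, substitute "repl" in the grep pattern. It lists the environment variable on each replication service deployment:
$ for replsvcdeploy in $(kubectl -n cluster1 get deploy | grep -i replication | awk '{print $1}'); do echo "${replsvcdeploy}:"; kubectl -n cluster1 set env deployment.apps/${replsvcdeploy} --list | grep REPLICATION_WAITSECS_AFTER_NDBRESTORE; done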
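Similarly, a minimal sketch (again assuming cnDBTier cluster1 and the "replication" naming) to confirm that the scaled-up replication service deployments become ready before continuing:
$ kubectl -n cluster1 get deployments | egrep 'replication'
$ kubectl -n cluster1 get deployments | egrep 'replication' | awk '{print $1}' | xargs -L1 -r kubectl -n cluster1 rollout status deployment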
7.4.8 Fault Recovery APIs
This section provides information about the cnDBTier fault recovery APIs that are used in the various stages of fault recovery procedures.
Table 7-4 Fault Recovery APIs
Fault Recovery API | REST URL | HTTP Method | Request Payload | Response Code | Response Payload |
---|---|---|---|---|---|
Mark cluster as failed in unhealthy cluster | http://base-uri/db-tier/gr-recovery/site/{siteName}/failed | POST | NA | | |
Mark cluster as failed in healthy cluster | http://base-uri/db-tier/gr-recovery/remotesite/{siteName}/failed | POST | NA | | |
Start recovering database from healthy cluster | http://base-uri/db-tier/gr-recovery/site/{siteName}/start | POST | NA | | |
Start recovering database by selecting healthy cluster | http://base-uri/db-tier/gr-recovery/site/{siteName}/grbackupsite/{backupSiteName}/start | POST | NA | | |
Monitor the fault recovery state | http://base-uri/db-tier/gr-recovery/site/{siteName}/status | GET | NA | | |
Note:
The value of <base-uri> in the REST URL is <db-replication-svc LoadBalancer IP>:<db-replication-svc LoadBalancer PORT>.
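For reference, the following is a minimal sketch of invoking the fault recovery APIs from Table 7-4 over HTTP, assuming cluster1 as {siteName}, cluster2 as {backupSiteName}, and $IP and $PORT exported as shown in the steps below. Each API must be called on the replication service of the cluster indicated by the fault recovery procedure (for example, the remotesite variant is called on the healthy cluster), and the response codes and payloads depend on the cnDBTier release and deployment.
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/failed
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/remotesite/cluster1/failed
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/start
$ curl -i -X POST http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/grbackupsite/cluster2/start
$ curl -i -X GET http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/status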
- When HTTP is enabled in a cnDBTier cluster:
- Run the following command to get the replication service
LoadBalancer IP for
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication
service LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Use the LoadBalancer IP and LoadBalancer Port obtained in the
previous steps to call the API services.
For example, you can invoke the fault recovery status API by replacing $IP and $PORT with the LoadBalancer IP and LoadBalancer Port obtained in the previous steps:
$ curl -i -X GET http://$IP:$PORT/db-tier/gr-recovery/site/cluster1/status
- When HTTPS is enabled in a cnDBTier cluster:
- Create the key PEM file (file.key.pem) and cert PEM file (file.crt.pem) using the p12 certificate (replicationcertificate.p12). Use the same p12 certificate that was used to enable HTTPS while installing cnDBTier (a hedged openssl sketch follows this list).
- Run the following command to get the replication service
LoadBalancer IP for
cluster1:
$ export IP=$(kubectl get svc -n cluster1 | grep repl | awk '{print $4}' | head -n 1 )
- Run the following command to get the replication
service LoadBalancer Port for
cluster1:
$ export PORT=$(kubectl get svc -n cluster1 | grep repl | awk '{print $5}' | cut -d '/' -f 1 | cut -d ':' -f 1 | head -n 1)
- Use the LoadBalancer IP and LoadBalancer Port obtained in the
previous steps to call the API services.
For example, you can invoke the fault recovery status API by replacing $IP and $PORT with the LoadBalancer IP and LoadBalancer Port obtained in the previous steps:
$ curl -k --cert file.crt.pem --cert-type PEM --key file.key.pem --key-type PEM --pass password https://$IP:$PORT/db-tier/gr-recovery/site/cluster1/status
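As a hedged sketch of the PEM extraction step referenced in this list, the key and certificate PEM files can be derived from the p12 certificate with openssl; the file names match the example above, and the PEM pass phrase you set for the key is the value supplied to --pass in the curl command:
$ openssl pkcs12 -in replicationcertificate.p12 -nocerts -out file.key.pem
$ openssl pkcs12 -in replicationcertificate.p12 -clcerts -nokeys -out file.crt.pem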