Performing a Failover
If quorum is maintained, you do not need to do anything; the store continues to operate normally.
If a zone fails and quorum is lost, your only option is to perform a failover.
For example, suppose a store consists of two zones, "Manhattan" and "JerseyCity", each deployed in its own physical data center.
Note:
This example uses a store with a replication factor of three. In this case, each zone is also configured with three admins.
Additionally, suppose that the "Manhattan" zone fails, resulting in the failure of all of its Storage Nodes and a loss of quorum. If the host hardware of "Manhattan" is irreparably damaged, or the problem will take too long to repair, you may choose to initiate a failover.
The following steps walk you through the process of verifying failures, isolating Storage Nodes, and reducing admin quorum to perform a failover operation. This process allows service to be continued in the event of a zone failure.
-
Connect to the store. To do this, connect to an admin running in the JerseyCity zone:
java -Xmx64m -Xms64m -jar KVHOME/lib/kvstore.jar \
    runadmin -host jersey1 -port 6000 \
    -security USER/security/admin.security
Note:
This assumes that you have followed the steps in Create users and configure security with remote access.
-
Use the verify configuration command to confirm the failures. The output confirms that the Storage Node Agents in the Manhattan zone are unavailable.
kv-> verify configuration
Connected to Admin in read-only mode
Verify: starting verification of store mystore based upon topology sequence #115
100 partitions and 6 storage nodes
Time: 2022-06-09 07:15:23 UTC   Version: 21.3.10
See jersey1:/kvroot/mystore/log/mystore_{0..N}.log for progress messages
Verify: Shard Status: healthy: 0 writable-degraded: 0 read-only: 1 offline: 0 total: 1
Verify: Admin Status: read-only
Verify: Zone [name=Manhattan id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]   RN Status: online: 0 read-only: 0 offline: 3
Verify: Zone [name=JerseyCity id=zn2 type=SECONDARY allowArbiters=false masterAffinity=false]   RN Status: online: 0 read-only: 3 offline: 0
Verify: == checking storage node sn1 ==
Verify: sn1: ping() failed for sn1 : Unable to connect to the storage node agent at host nyc1, port 5000, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused
Verify: Storage Node [sn1] on nyc1:5000   Zone: [name=Manhattan id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false] UNREACHABLE
Verify: admin1: ping() failed for admin1 : Unable to connect to the storage node agent at host nyc1, port 5000, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused
Verify: Admin [admin1]   Status: UNREACHABLE
Verify: rg1-rn1: ping() failed for rg1-rn1 : Unable to connect to the storage node agent at host nyc1, port 5000, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused
Verify: Rep Node [rg1-rn1]   Status: UNREACHABLE
Verify: == checking storage node sn2 ==
Verify: sn2: ping() failed for sn2 : Unable to connect to the storage node agent at host nyc1, port 5100, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused
Verify: Storage Node [sn2] on nyc1:5100   Zone: [name=Manhattan id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false] UNREACHABLE
Verify: admin2: ping() failed for admin2 : Unable to connect to the storage node agent at host nyc1, port 5100, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused
Verify: Admin [admin2]   Status: UNREACHABLE
Verify: rg1-rn2: ping() failed for rg1-rn2 : Unable to connect to the storage node agent at host nyc1, port 5100, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused
Verify: Rep Node [rg1-rn2]   Status: UNREACHABLE
Verify: == checking storage node sn3 ==
Verify: sn3: ping() failed for sn3 : Unable to connect to the storage node agent at host nyc1, port 5200, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused
Verify: Storage Node [sn3] on nyc1:5200   Zone: [name=Manhattan id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false] UNREACHABLE
Verify: admin3: ping() failed for admin3 : Unable to connect to the storage node agent at host nyc1, port 5200, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused
Verify: Admin [admin3]   Status: UNREACHABLE
Verify: rg1-rn3: ping() failed for rg1-rn3 : Unable to connect to the storage node agent at host nyc1, port 5200, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused
Verify: Rep Node [rg1-rn3]   Status: UNREACHABLE
Verify: == checking storage node sn4 ==
Verify: Storage Node [sn4] on jersey1:6000   Zone: [name=JerseyCity id=zn2 type=SECONDARY allowArbiters=false masterAffinity=false]   Ver: 21.3.10 2021-12-21 21:24:59 UTC  Build id: 78bbc4cb976b Edition: Enterprise   isMasterBalanced: true   serviceStartTime: 2022-06-09 07:05:44 UTC
Verify: Admin [admin4]   Status: RUNNING,MASTER (non-authoritative)
Verify: Rep Node [rg1-rn4]   Status: RUNNING,MASTER (non-authoritative) sequenceNumber:217 haPort:6003 available storage size:12 GB
Verify: == checking storage node sn5 ==
Verify: Storage Node [sn5] on jersey1:6100   Zone: [name=JerseyCity id=zn2 type=SECONDARY allowArbiters=false masterAffinity=false]   Ver: 21.3.10 2021-12-21 21:24:59 UTC  Build id: 78bbc4cb976b Edition: Enterprise   isMasterBalanced: true   serviceStartTime: 2022-06-09 07:05:44 UTC
Verify: Admin [admin5]   Status: RUNNING,MASTER (non-authoritative)
Verify: Rep Node [rg1-rn5]   Status: RUNNING,MASTER (non-authoritative) sequenceNumber:217 haPort:6003 available storage size:12 GB
Verify: == checking storage node sn6 ==
Verify: Storage Node [sn6] on jersey1:6200   Zone: [name=JerseyCity id=zn2 type=SECONDARY allowArbiters=false masterAffinity=false]   Ver: 21.3.10 2021-12-21 21:24:59 UTC  Build id: 78bbc4cb976b Edition: Enterprise   isMasterBalanced: true   serviceStartTime: 2022-06-09 07:05:44 UTC
Verify: Admin [admin6]   Status: RUNNING,MASTER (non-authoritative)
Verify: Rep Node [rg1-rn6]   Status: RUNNING,MASTER (non-authoritative) sequenceNumber:217 haPort:6003 available storage size:12 GB
Verification complete, 9 violations, 0 notes found.
Verification violation: [admin1] ping() failed for admin1 : Unable to connect to the storage node agent at host nyc1, port 5000, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused (Connection refused)
Verification violation: [admin2] ping() failed for admin2 : Unable to connect to the storage node agent at host nyc1, port 5100, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused (Connection refused)
Verification violation: [admin3] ping() failed for admin3 : Unable to connect to the storage node agent at host nyc1, port 5200, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused (Connection refused)
Verification violation: [rg1-rn1] ping() failed for rg1-rn1 : Unable to connect to the storage node agent at host nyc1, port 5000, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused (Connection refused)
Verification violation: [rg1-rn2] ping() failed for rg1-rn2 : Unable to connect to the storage node agent at host nyc1, port 5100, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused (Connection refused)
Verification violation: [rg1-rn3] ping() failed for rg1-rn3 : Unable to connect to the storage node agent at host nyc1, port 5200, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused (Connection refused)
Verification violation: [sn1] ping() failed for sn1 : Unable to connect to the storage node agent at host nyc1, port 5000, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused (Connection refused)
Verification violation: [sn2] ping() failed for sn2 : Unable to connect to the storage node agent at host nyc1, port 5100, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused (Connection refused)
Verification violation: [sn3] ping() failed for sn3 : Unable to connect to the storage node agent at host nyc1, port 5200, which may not be running; nested exception is: java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is: java.net.ConnectException: Connection refused (Connection refused)
In this case, the Storage Node Agents at host nyc1 are confirmed unavailable.
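Before proceeding, you can optionally cross-check reachability at the OS level, independently of the kv CLI. The following is a minimal sketch, assuming the bash shell is available on the machine you run it from; check_port is a hypothetical helper, not a KVStore tool, and the host and ports (nyc1, 5000/5100/5200) are this example's values.

```shell
# check_port HOST PORT: succeed (exit 0) only if a TCP connection to
# HOST:PORT can be opened; relies on bash's /dev/tcp pseudo-device.
check_port() {
  bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# In this example you would probe each Manhattan SNA registry port, e.g.:
#   check_port nyc1 5000 || echo "sn1 SNA unreachable"
```

A refused or timed-out connection here corresponds to the Connection refused errors reported by verify configuration.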
-
To prevent a hard rollback and data loss, isolate failed nodes (Manhattan) from the rest of the system. Make sure all failed nodes are prevented from rejoining the store until their configurations have been updated.
To do this, you can:
-
Disconnect the network physically or use a firewall.
-
Modify the start-up sequence on failed nodes to prevent SNAs from starting.
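As a sketch of the firewall option, the following builds (but does not apply) iptables rules that would drop all traffic arriving from the failed host. The host name nyc1 is this example's value; applying the rules on the surviving hosts requires root, so this sketch deliberately prints the rules as a dry run.

```shell
# Generate one DROP rule per failed host; print the rules rather than
# applying them, so this can be reviewed before running as root.
failed_hosts="nyc1"
rules=""
for host in $failed_hosts; do
  rules="${rules}iptables -A INPUT -s ${host} -j DROP
"
done
printf '%s' "$rules"
```

To apply, run each printed rule as root on every surviving node, and keep the rules in place until the failed nodes' configurations have been updated.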
-
To make changes to the store, you first need to reduce admin quorum. To do this, use the repair-admin-quorum command, specifying the available zone:
kv-> repair-admin-quorum -znname JerseyCity
Connected to admin in read-only mode
Repaired admin quorum using admins: [admin4, admin5, admin6]
Now you can perform administrative procedures using the remaining admin service with the temporarily reduced quorum.
-
Use the plan failover command to update the configuration of the store with the available zones:
kv-> plan failover -znname JerseyCity -type primary \
-znname Manhattan -type offline-secondary -wait
Executing plan 8, waiting for completion...
Plan 8 ended successfully
The plan failover command fails if it is executed while other plans are still running. You must cancel or interrupt those plans before executing the failover. For example, suppose a topology redistribute plan is in progress. If you run the plan failover command, it will fail; for it to succeed, first cancel or interrupt the topology redistribute plan. To do this, use the show plans command to find the plan ID of the topology redistribute plan (in this case, 9), and then cancel it with the plan cancel command:
kv-> plan cancel -id 9
After performing the failover, use the ping command to confirm that the zone type of Manhattan has been changed to secondary.
kv-> ping
Pinging components of store mystore based upon topology sequence #208
100 partitions and 6 storage nodes
Time: 2022-06-09 07:33:51 UTC   Version: 21.3.10
Shard Status: healthy:0 writable-degraded:1 read-only:0 offline:0
Admin Status: writable-degraded
Zone [name=Manhattan id=zn1 type=SECONDARY allowArbiters=false masterAffinity=false]   RN Status: online:0 offline:3
Zone [name=JerseyCity id=zn2 type=PRIMARY allowArbiters=false masterAffinity=false]   RN Status: online:3 offline:0
Storage Node [sn1] on nyc1:5000   Zone: [name=Manhattan id=zn1 type=SECONDARY allowArbiters=false masterAffinity=false] UNREACHABLE
   Admin [admin1]   Status: UNREACHABLE
   Rep Node [rg1-rn1]   Status: UNREACHABLE
Storage Node [sn2] on nyc1:5100   Zone: [name=Manhattan id=zn1 type=SECONDARY allowArbiters=false masterAffinity=false] UNREACHABLE
   Admin [admin2]   Status: UNREACHABLE
   Rep Node [rg1-rn2]   Status: UNREACHABLE
Storage Node [sn3] on nyc1:5200   Zone: [name=Manhattan id=zn1 type=SECONDARY allowArbiters=false masterAffinity=false] UNREACHABLE
   Admin [admin3]   Status: UNREACHABLE
   Rep Node [rg1-rn3]   Status: UNREACHABLE
Storage Node [sn4] on jersey1:6000   Zone: [name=JerseyCity id=zn2 type=PRIMARY allowArbiters=false masterAffinity=false]   Status: RUNNING   Ver: 21.3.10 2021-12-21 21:24:59 UTC  Build id: 78bbc4cb976b
   Admin [admin4]   Status: RUNNING,REPLICA
   Rep Node [rg1-rn4]   Status: RUNNING,REPLICA sequenceNumber:427 haPort:6011 available storage
Storage Node [sn5] on jersey1:6100   Zone: [name=JerseyCity id=zn2 type=PRIMARY allowArbiters=false masterAffinity=false]   Status: RUNNING   Ver: 21.3.10 2021-12-21 21:24:59 UTC  Build id: 78bbc4cb976b
   Admin [admin5]   Status: RUNNING,REPLICA
   Rep Node [rg1-rn5]   Status: RUNNING,REPLICA sequenceNumber:427 haPort:6011 available storage
Storage Node [sn6] on jersey1:6200   Zone: [name=JerseyCity id=zn2 type=PRIMARY allowArbiters=false masterAffinity=false]   Status: RUNNING   Ver: 21.3.10 2021-12-21 21:24:59 UTC  Build id: 78bbc4cb976b
   Admin [admin6]   Status: RUNNING,MASTER
   Rep Node [rg1-rn6]   Status: RUNNING,MASTER sequenceNumber:427 haPort:6011 available storage
The failover operation is now complete. Write availability in the store is re-established using the JerseyCity zone (zn2) as the only available primary zone. The Manhattan zone (zn1) is offline. Any data that was not propagated from Manhattan prior to the failure will be lost.
Note:
In this case, the store has only a single working copy of its data, so a single node failure in the surviving zone will prevent read and write access and, if the failure is permanent, may produce permanent data loss.
If the problems that led to the failover have been corrected and the original data from the previously failed nodes (Manhattan) is still available, you can return the old nodes to service by performing a switchover. To do this, see the next section.