Performing a Failover

If a zone fails but quorum is maintained, you do not need to do anything: the store continues to operate normally.

If a zone fails and quorum is lost, your only option is to perform a failover.

For example, suppose a store consists of two zones, "Manhattan" and "JerseyCity", each deployed in its own physical data center.

Note:

This example uses a store with a replication factor of three. In this case, each zone is also configured with three admins.

Additionally, suppose that the "Manhattan" zone fails, resulting in the failure of all of its associated Storage Nodes and a loss of quorum. Because Manhattan is the only primary zone in this example, write quorum requires a majority (two of three) of its replication nodes, so with all three offline, quorum cannot be met. If the Manhattan host hardware was irreparably damaged, or the problem would take too long to repair, you may choose to initiate a failover.

The following steps walk you through the process of verifying failures, isolating Storage Nodes, and reducing admin quorum to perform a failover operation. This process allows service to be continued in the event of a zone failure.

  1. Connect to the store. To do this, connect to an admin running in the JerseyCity zone:

    java -Xmx64m -Xms64m -jar KVHOME/lib/kvstore.jar \
    runadmin -host jersey1 -port 6000 \
    -security USER/security/admin.security

    Note:

    This assumes that you have followed the steps described in Create users and configure security with remote access.
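
    For reference, the file named by the -security flag is a login properties file. The following is a minimal sketch of what it might contain, assuming a user named root and password-file authentication; the file names and values are illustrative and must match your own security configuration:

    # Illustrative contents of USER/security/admin.security (example values only)
    oracle.kv.auth.username=root
    oracle.kv.auth.pwdfile.file=USER/security/login.passwd
    oracle.kv.transport=ssl
    oracle.kv.ssl.trustStore=USER/security/client.trust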

  2. Use the verify configuration command to confirm the failures. The output confirms the Storage Node Agents in the Manhattan zone are unavailable.

    kv-> verify configuration
    Connected to Admin in read-only mode
    Verify: starting verification of store mystore based upon topology sequence #115
    100 partitions and 6 storage nodes
    Time: 2022-06-09 07:15:23 UTC   Version: 21.3.10
    See jersey1:/kvroot/mystore/log/mystore_{0..N}.log for progress messages
    Verify: Shard Status: healthy: 0 writable-degraded: 0 read-only: 1 offline: 0 total: 1
    Verify: Admin Status: read-only
    Verify: Zone [name=Manhattan id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false] RN Status: online: 0 read-only: 0 offline: 3
    Verify: Zone [name=JerseyCity id=zn2 type=SECONDARY allowArbiters=false masterAffinity=false]   RN Status: online: 0 read-only: 3 offline: 0
    Verify: == checking storage node sn1 ==
    Verify:  sn1: ping() failed for sn1 :
    Unable to connect to the storage node agent at host nyc1,
    port 5000, which may not be running; nested exception is:
        java.rmi.ConnectException: Connection refused to host:
        nyc1; nested exception is:
        java.net.ConnectException: Connection refused
    Verify: Storage Node [sn1] on nyc1:5000
    Zone: [name=Manhattan id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false] 
      UNREACHABLE
    Verify:  admin1: ping() failed for admin1 :
    Unable to connect to the storage node agent at host nyc1,
    port 5000, which may not be running; nested exception is:
        java.rmi.ConnectException: Connection refused to host:
        nyc1; nested exception is:
        java.net.ConnectException: Connection refused
    Verify:  Admin [admin1]      Status: UNREACHABLE
    Verify:  rg1-rn1: ping() failed for rg1-rn1 :
    Unable to connect to the storage node agent at host nyc1,
    port 5000, which may not be running; nested exception is:
        java.rmi.ConnectException: Connection refused to host:
        nyc1; nested exception is:
        java.net.ConnectException: Connection refused
    Verify:  Rep Node [rg1-rn1]  Status: UNREACHABLE
    Verify: == checking storage node sn2 ==
    Verify:  sn2: ping() failed for sn2:
    Unable to connect to the storage node agent at host nyc1,
    port 5100, which may not be running; nested exception is:
        java.rmi.ConnectException: Connection refused to host:
        nyc1; nested exception is:
        java.net.ConnectException: Connection refused
    Verify: Storage Node [sn2] on nyc1:5100
    Zone: [name=Manhattan id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false] 
      UNREACHABLE
    Verify:  admin2: ping() failed for admin2:
    Unable to connect to the storage node agent at host nyc1,
    port 5100, which may not be running; nested exception is:
        java.rmi.ConnectException: Connection refused to host:
        nyc1; nested exception is:
        java.net.ConnectException: Connection refused
    Verify:   Admin [admin2]      Status: UNREACHABLE
    Verify:   rg1-rn2: ping() failed for rg1-rn2 :
    Unable to connect to the storage node agent at host nyc1,
    port 5100, which may not be running; nested exception is:
        java.rmi.ConnectException: Connection refused to host:
        nyc1; nested exception is:
        java.net.ConnectException: Connection refused
    Verify: Rep Node [rg1-rn2]  Status: UNREACHABLE
    Verify: == checking storage node sn3 ==
    Verify:  sn3: ping() failed for sn3:
    Unable to connect to the storage node agent at host nyc1,
    port 5200, which may not be running; nested exception is:
        java.rmi.ConnectException: Connection refused to host:
        nyc1; nested exception is:
        java.net.ConnectException: Connection refused
    Verify: Storage Node [sn3] on nyc1:5200
    Zone: [name=Manhattan id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false] 
      UNREACHABLE
    Verify:  admin3: ping() failed for admin3:
    Unable to connect to the storage node agent at host nyc1,
    port 5200, which may not be running; nested exception is:
        java.rmi.ConnectException: Connection refused to host:
        nyc1; nested exception is:
        java.net.ConnectException: Connection refused
    Verify:  Admin [admin3]      Status: UNREACHABLE
    Verify: rg1-rn3: ping() failed for rg1-rn3 :
    Unable to connect to the storage node agent at host nyc1,
    port 5200, which may not be running; nested exception is:
        java.rmi.ConnectException: Connection refused to host:
        nyc1; nested exception is:
        java.net.ConnectException: Connection refused
    Verify: Rep Node [rg1-rn3]  Status: UNREACHABLE
    Verify: == checking storage node sn4 ==
    Verify: Storage Node [sn4] on jersey1:6000
    Zone: [name=JerseyCity id=zn2 type=SECONDARY allowArbiters=false masterAffinity=false]
    Ver: 21.3.10 2021-12-21 21:24:59 UTC  Build id: 78bbc4cb976b Edition: Enterprise isMasterBalanced: true   serviceStartTime: 2022-06-09 07:05:44 UTC 
    Verify:  Admin [admin4]
    Status: RUNNING,MASTER (non-authoritative)
    Verify:  Rep Node [rg1-rn4]
    Status: RUNNING,MASTER (non-authoritative) sequenceNumber:217 haPort:6003 available storage size:12 GB
    Verify: == checking storage node sn5 ==
    Verify: Storage Node [sn5] on jersey1:6100
    Zone: [name=JerseyCity id=zn2 type=SECONDARY allowArbiters=false masterAffinity=false]
    Ver: 21.3.10 2021-12-21 21:24:59 UTC  Build id: 78bbc4cb976b Edition: Enterprise isMasterBalanced: true   serviceStartTime: 2022-06-09 07:05:44 UTC 
    Verify:  Admin [admin5]
    Status: RUNNING,MASTER (non-authoritative)
    Verify:  Rep Node [rg1-rn5]
    Status: RUNNING,MASTER (non-authoritative) sequenceNumber:217 haPort:6003 available storage size:12 GB
    Verify: == checking storage node sn6 ==
    Verify: Storage Node [sn6] on jersey1:6200
    Zone: [name=JerseyCity id=zn2 type=SECONDARY allowArbiters=false masterAffinity=false]
    Ver: 21.3.10 2021-12-21 21:24:59 UTC  Build id: 78bbc4cb976b Edition: Enterprise isMasterBalanced: true   serviceStartTime: 2022-06-09 07:05:44 UTC 
    Verify:  Admin [admin6]
    Status: RUNNING,MASTER (non-authoritative)
    Verify:  Rep Node [rg1-rn6]
    Status: RUNNING,MASTER (non-authoritative) sequenceNumber:217 haPort:6003 available storage size:12 GB
    Verification complete, 9 violations, 0 notes found.
    Verification violation: [admin1]        ping() failed for admin1 : Unable to connect to the storage node agent at host nyc1, port 5000 , which may not be running; nested exception is:
            java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is:
            java.net.ConnectException: Connection refused (Connection refused)
    Verification violation: [admin2]        ping() failed for admin2 : Unable to connect to the storage node agent  at host nyc1, port 5100, which may not be running; nested exception is:
            java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is:
            java.net.ConnectException: Connection refused (Connection refused)
    Verification violation: [admin3]        ping() failed for admin3 : Unable to connect to the storage node  at host nyc1, port 5200, which may not be running; nested exception is:
            java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is:
            java.net.ConnectException: Connection refused (Connection refused)
    Verification violation: [rg1-rn1]       ping() failed for rg1-rn1 : Unable to connect to the storage node agent at host nyc1, port 5000, which may not be running; nested exception is:
            java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is:
            java.net.ConnectException: Connection refused (Connection refused)
    Verification violation: [rg1-rn2]       ping() failed for rg1-rn2 : Unable to connect to the storage node agent at host nyc1, port 5100, which may not be running; nested exception is:
            java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is:
            java.net.ConnectException: Connection refused (Connection refused)
    Verification violation: [rg1-rn3]       ping() failed for rg1-rn3 : Unable to connect to the storage node agent at host nyc1, port 5200, which may not be running; nested exception is:
            java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is:
            java.net.ConnectException: Connection refused (Connection refused)
    Verification violation: [sn1]   ping() failed for sn1 : Unable to connect to the storage node agent at host nyc1, port 5000, which may not be running; nested exception is:
            java.rmi.ConnectException: Connection refused to host:nyc1; nested exception is:
            java.net.ConnectException: Connection refused (Connection refused)
    Verification violation: [sn2]   ping() failed for sn2 : Unable to connect to the storage node agent at host nyc1, port 5100, which may not be running; nested exception is:
            java.rmi.ConnectException: Connection refused to host: nyc1; nested exception is:
            java.net.ConnectException: Connection refused (Connection refused)
    Verification violation: [sn3]   ping() failed for sn3 : Unable to connect to the storage node agent at host nyc1, port 5200, which may not be running; nested exception is:
            java.rmi.ConnectException: Connection refused to host:nyc1; nested exception is:
            java.net.ConnectException: Connection refused (Connection refused)

    In this case, the Storage Node Agents on host nyc1 (sn1, sn2, and sn3) are confirmed to be unavailable.

  3. To prevent a hard rollback and data loss, isolate the failed nodes (the nodes in Manhattan) from the rest of the system. Make sure all failed nodes are prevented from rejoining the store until their configurations have been updated.

    To do this, you can do either of the following (a sketch follows this list):

    • Disconnect the network physically or use a firewall.

    • Modify the start-up sequence on failed nodes to prevent SNAs from starting.
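
    For example, one way to keep the failed nodes out of the store is to block their traffic at the surviving hosts. The following is a minimal sketch, assuming the surviving JerseyCity hosts run Linux with iptables and that the failed Manhattan Storage Nodes are all on host nyc1; adapt it to your own network and firewall tooling. The systemd unit name in the second option is a hypothetical placeholder for whatever service starts your Storage Node Agents.

    # Option 1: on each surviving host (for example, jersey1), drop all
    # traffic from the failed host until its configuration is updated.
    sudo iptables -A INPUT -s nyc1 -j DROP

    # Remove the rule once the failed nodes have been reconfigured.
    sudo iptables -D INPUT -s nyc1 -j DROP

    # Option 2: on a failed node that is still accessible, keep the SNA from
    # starting at boot. "kvstore-sna.service" is a hypothetical unit name.
    sudo systemctl disable --now kvstore-sna.service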

  4. To make changes to the store, you first need to reduce the admin quorum. To do this, use the repair-admin-quorum command, specifying the zone that contains the available admins:

    kv-> repair-admin-quorum -znname JerseyCity
    Connected to admin in read-only mode
    Repaired admin quorum using admins: [admin4, admin5, admin6] 

    Now you can perform administrative procedures using the remaining admin service with the temporarily reduced quorum.
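
    If you want to confirm which admins the store is now relying on, you can use the show admins command, which lists the configured admin services and the Storage Nodes that host them; in this example, only admin4, admin5, and admin6 on jersey1 should respond (output omitted here):

    kv-> show admins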

  5. Use the plan failover command to update the configuration of the store with the available zones.

    kv-> plan failover -znname JerseyCity -type primary \
    -znname Manhattan -type offline-secondary -wait
    Executing plan 8, waiting for completion...
    Plan 8 ended successfully

    The plan failover command fails if it is executed while other plans are still running. You must cancel or interrupt those plans before executing the failover plan.

    For example, suppose a topology redistribute plan is in progress. If you run the plan failover command, it fails; for it to succeed, you must first cancel or interrupt the topology redistribute plan.

    To do this, first use the show plans command to find the plan ID of the topology redistribute plan (in this case, 9), and then cancel it using the plan cancel command:

    kv-> plan cancel -id 9 
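
    The show plans command lists each plan's ID, name, and state, which is how you identify the running plan to stop (its output is omitted here). If you would rather interrupt the running plan than cancel it, the plan interrupt command takes the same -id argument; depending on the release, a running plan may first need to be interrupted before it can be cancelled.

    kv-> show plans
    kv-> plan interrupt -id 9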

    After performing the failover, use the ping command to confirm that the zone type of Manhattan has changed to secondary.

    kv-> ping
    Pinging components of store mystore based upon topology sequence #208
    100 partitions and 6 storage nodes
    Time: 2022-06-09 07:33:51 UTC   Version: 21.3.10
    Shard Status: healthy:0 writable-degraded:1 read-only:0 offline:0
    Admin Status: writable-degraded
    Zone [name=Manhattan id=zn1 type=SECONDARY allowArbiters=false masterAffinity=false]
    RN Status: online:0 offline:3
    Zone [name=JerseyCity id=zn2 type=PRIMARY allowArbiters=false masterAffinity=false]
    RN Status: online:3 offline:0
    Storage Node [sn1] on nyc1:5000
    Zone: [name=Manhattan id=zn1 type=SECONDARY allowArbiters=false masterAffinity=false] 
    UNREACHABLE
          Admin [admin1]          Status: UNREACHABLE
          Rep Node [rg1-rn1]      Status: UNREACHABLE
    Storage Node [sn2] on nyc1:5100
    Zone: [name=Manhattan id=zn1 type=SECONDARY allowArbiters=false masterAffinity=false] 
    UNREACHABLE
          Admin [admin2]          Status: UNREACHABLE
          Rep Node [rg1-rn2]      Status: UNREACHABLE
    Storage Node [sn3] on nyc1:5200
    Zone: [name=Manhattan id=zn1 type=SECONDARY allowArbiters=false masterAffinity=false] 
    UNREACHABLE
          Admin [admin3]          Status: UNREACHABLE
          Rep Node [rg1-rn3]      Status: UNREACHABLE
    Storage Node [sn4] on jersey1:6000
    Zone: [name=JerseyCity id=zn2 type=PRIMARY allowArbiters=false masterAffinity=false]
    Status: RUNNING
    Ver: 21.3.10 2021-12-21 21:24:59 UTC  Build id: 78bbc4cb976b
          Admin [admin4]          Status: RUNNING,REPLICA
          Rep Node [rg1-rn4]
    Status: RUNNING,REPLICA sequenceNumber:427 haPort:6011 available storage 
    Storage Node [sn5] on jersey1:6100
    Zone: [name=JerseyCity id=zn2 type=PRIMARY allowArbiters=false masterAffinity=false]
    Status: RUNNING
    Ver: 21.3.10 2021-12-21 21:24:59 UTC  Build id: 78bbc4cb976b
          Admin [admin5]          Status: RUNNING,REPLICA
          Rep Node [rg1-rn5]
    Status: RUNNING,REPLICA sequenceNumber:427 haPort:6011 available storage 
    Storage Node [sn6] on jersey1:6200
    Zone: [name=JerseyCity id=zn2 type=PRIMARY allowArbiters=false masterAffinity=false]
    Status: RUNNING
    Ver: 21.3.10 2021-12-21 21:24:59 UTC  Build id: 78bbc4cb976b
         Admin [admin6]          Status: RUNNING,MASTER
         Rep Node [rg1-rn6]
    Status: RUNNING,MASTER sequenceNumber:427 haPort:6011 available storage 

    The failover operation is now complete. Write availability in the store is reestablished using the JerseyCity zone (zn2) as the only available primary zone. The Manhattan zone (zn1) is offline. Any data that was not propagated from Manhattan prior to the failure will be lost.

    Note:

    In this case, all of the store's working copies of its data reside in the single surviving zone (and, in this example, on a single host), so further failures in that zone can prevent read and write access and, if permanent, can cause permanent data loss.

If the problems that led to the failover have been corrected and the original data from the previously failed nodes (Manhattan) is still available, you can return the old nodes to service by performing a switchover. To do this, see the next section.