Performing a Failover

If a zone fails but quorum is maintained, you do not need to do anything; the store continues to operate normally.

If a zone fails and quorum is lost, your only option is to perform a failover.

For example, suppose a store consists of two primary zones, "Manhattan" and "JerseyCity", each deployed in its own physical data center.

Note:

For simplicity, this example uses a store with a replication factor of two. In general, a Primary Replication Factor of 3 is adequate for most applications and is a good starting point, because 3 replicas allow write availability if a single primary zone fails.

Now suppose that the "Manhattan" zone fails, resulting in the failure of all of its Storage Nodes and a loss of quorum. If the Manhattan host hardware is irreparably damaged, or the problem will take too long to repair, you may choose to initiate a failover.

The following steps walk you through the process of verifying the failure, isolating the failed Storage Nodes, and reducing admin quorum so that you can perform a failover. This process allows service to continue after a zone failure.

  1. Connect to the store through an Admin running in the JerseyCity zone:

    java -Xmx64m -Xms64m -jar KVHOME/lib/kvstore.jar \
    runadmin -host jersey1 -port 6000 \
    -security USER/security/admin.security

    Note:

    This command assumes that you have followed the steps described in Configuring Security with Remote Access.

  2. Use the verify configuration command to confirm the failures:

    kv-> verify configuration
    Connected to Admin in read-only mode
    Verify: starting verification of store mystore based upon
    topology sequence #207
    200 partitions and 2 storage nodes.
    Time: 2018-09-28 06:57:10 UTC   Version: 18.3.2
    See jersey1:/kvroot/mystore/log/mystore_{0..N}.log
                                                   for progress messages
    Verify: Shard Status: healthy:0 writable-degraded:0
                                                    read-only:1 offline:0
    Verify: Admin Status: read-only
    Verify: Zone [name=Manhattan id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
    RN Status: online:0 offline:1
    Verify: Zone [name=JerseyCity id=zn2 type=PRIMARY allowArbiters=false masterAffinity=false]
    RN Status: online:1 offline:0
    Verify: == checking storage node sn1 ==
    Verify:         sn1: ping() failed for sn1 :
    Unable to connect to the storage node agent at host nyc1,
    port 5000, which may not be running; nested exception is:
        java.rmi.ConnectException: Connection refused to host:
        nyc1; nested exception is:
        java.net.ConnectException: Connection refused
    Verify: Storage Node [sn1] on nyc1:5000
    Zone: [name=Manhattan id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false] 
      UNREACHABLE
    Verify:         admin1: ping() failed for admin1 :
    Unable to connect to the storage node agent at host nyc1,
    port 5000, which may not be running; nested exception is:
        java.rmi.ConnectException: Connection refused to host:
        nyc1; nested exception is:
        java.net.ConnectException: Connection refused
    Verify:     Admin [admin1]      Status: UNREACHABLE
    Verify:     rg1-rn1: ping() failed for rg1-rn1 :
    Unable to connect to the storage node agent at host nyc1,
    port 5000, which may not be running; nested exception is:
        java.rmi.ConnectException: Connection refused to host:
        nyc1; nested exception is:
        java.net.ConnectException: Connection refused
    Verify:     Rep Node [rg1-rn1]  Status: UNREACHABLE
    Verify: == checking storage node sn2 ==
    Verify: Storage Node [sn2] on jersey1:6000
    Zone: [name=JerseyCity id=zn2 type=PRIMARY allowArbiters=false masterAffinity=false]
    Status: RUNNING
    Ver: 18.3.2 2018-09-17 09:33:45 UTC  Build id: a72484b8b33c
    Verify:     Admin [admin2]
    Status: RUNNING,MASTER (non-authoritative)
    Verify:     Rep Node [rg1-rn2]
    Status: RUNNING,MASTER (non-authoritative)
                                       sequenceNumber:217 haPort:6003 available storage size:12 GB
    Verification complete, 4 violations, 0 notes found.
    Verification violation: [admin1]        ping() failed for admin1 :
    Unable to connect to the storage node agent at host nyc1,
    port 5000,which may not be running; nested exception is:
        java.rmi.ConnectException: Connection refused to host:
        nyc1; nested exception is:
        java.net.ConnectException: Connection refused
    Verification violation: [rg1-rn1]       ping() failed for rg1-rn1 :
    Unable to connect to the storage node agent at host nyc1,
    port 5000, which may not be running; nested exception is:
        java.rmi.ConnectException: Connection refused to host:
        nyc1; nested exception is:
        java.net.ConnectException: Connection refused
    Verification violation: [sn1]   ping() failed for sn1 :
    Unable to connect to the storage node agent at host nyc1,
    port 5000, which may not be running; nested exception is:
        java.rmi.ConnectException: Connection refused to host:
        nyc1; nested exception is:
        java.net.ConnectException: Connection refused

    In this case, the Storage Node Agent at host nyc1 is confirmed unavailable.
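
    If you want additional confirmation from outside the admin CLI, you can also run the standalone ping utility against the failed Storage Node directly. The following is a sketch that assumes the same KVHOME and security file used in step 1:

    java -Xmx64m -Xms64m -jar KVHOME/lib/kvstore.jar \
    ping -host nyc1 -port 5000 \
    -security USER/security/admin.security

    Because the Storage Node Agent on nyc1 is down, this command should report that it cannot contact the services at nyc1:5000.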

  3. To prevent a hard rollback and data loss, isolate the failed nodes (in Manhattan) from the rest of the system. Make sure that none of the failed nodes can rejoin the store until their configurations have been updated.

    To do this, you can do either of the following (a minimal sketch follows this list):

    • Disconnect the network physically or use a firewall.

    • Modify the start-up sequence on failed nodes to prevent SNAs from starting.
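
    For example, here is a minimal sketch of the firewall approach, assuming Linux hosts with iptables and that all of the failed Manhattan services run on host nyc1:

    # On each surviving host, drop traffic arriving from the failed zone's host:
    sudo iptables -A INPUT -s nyc1 -j DROP

    # Optionally, also block outbound connections to the failed host:
    sudo iptables -A OUTPUT -d nyc1 -j DROP

    For the second option, disable whatever boot-time mechanism starts the SNAs on the failed hosts (for example, an init script or a systemd unit that you have configured for the Storage Node Agent), so that the SNAs cannot restart and rejoin the store before their configurations have been updated.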

  4. To make changes to the store, you first need to reduce admin quorum. To do this, use the repair-admin-quorum command, specifying the available primary zone:

    kv-> repair-admin-quorum -znname JerseyCity
    Connected to admin in read-only mode
    Repaired admin quorum using admins: [admin2] 

    Now you can perform administrative procedures using the remaining admin service with the temporarily reduced quorum.

  5. Use the plan failover command to update the configuration of the store with the available zones:

    kv-> plan failover -znname JerseyCity -type primary \
    -znname Manhattan -type offline-secondary -wait
    Executing plan 8, waiting for completion...
    Plan 8 ended successfully

    The plan failover command fails if it is executed while other plans are still running. Cancel or interrupt those plans before executing this one.

    For example, suppose a topology redistribute plan is in progress. If you run the plan failover command, it fails. For it to succeed, you must first cancel or interrupt the topology redistribute plan.

    To do this, first use the show plans command to learn the plan ID of the running topology redistribute plan; in this case, it is 9.
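
    As an illustration only (the exact show plans output format and the plan name vary by release), the listing might look something like this:

    kv-> show plans
    9 Redistribute Topology      RUNNING

    Then cancel the topology redistribute plan using the plan cancel command: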

    kv-> plan cancel -id 9 

    After performing the failover, use the ping command to confirm that the zone type of Manhattan has been changed to SECONDARY:

    kv-> ping
    Pinging components of store mystore based upon topology sequence #208
    200 partitions and 2 storage nodes
    Time: 2018-10-18 07:39:03 UTC   Version: 18.3.2
    Shard Status: healthy:0 writable-degraded:1 read-only:0 offline:0
    Admin Status: writable-degraded
    Zone [name=Manhattan id=zn1 type=SECONDARY allowArbiters=false masterAffinity=false]
    RN Status: online:0 offline:1
    Zone [name=JerseyCity id=zn2 type=PRIMARY allowArbiters=false masterAffinity=false]
    RN Status: online:1 offline:0
    Storage Node [sn1] on nyc1:5000
    Zone: [name=Manhattan id=zn1 type=SECONDARY allowArbiters=false masterAffinity=false] 
      UNREACHABLE
            Admin [admin1]          Status: UNREACHABLE
            Rep Node [rg1-rn1]      Status: UNREACHABLE
    Storage Node [sn2] on jersey1:6000
    Zone: [name=JerseyCity id=zn2 type=PRIMARY allowArbiters=false masterAffinity=false]
      Status: RUNNING
    Ver: 18.3.2 2018-09-17 09:33:45 UTC  Build id: a72484b8b33c
            Admin [admin2]          Status: RUNNING,MASTER
            Rep Node [rg1-rn2]
            Status: RUNNING,MASTER sequenceNumber:427 haPort:6011 available storage size:12 GB
  6. The failover operation is now complete. Write availability in the store is reestablished using the JerseyCity zone (zn2) as the only available primary zone. The Manhattan zone (zn1) is offline. Any data that was not propagated from Manhattan prior to the failure is lost.

    Note:

    In this case, the store has only a single working copy of its data, so single node failures in the surviving zone will prevent read and write access, and, if the failure is a permanent one, may produce permanent data loss.

If the problems that led to the failover have been corrected and the original data from the previously failed nodes (Manhattan) is still available, you can return the old nodes to service by performing a switchover. To do this, see the next section.