Performing a Switchover

Continuing the example from the previous section: after performing the failover, you can return the old nodes to service by performing the following switchover procedure:

  1. After the failed zones are repaired, restart all the Storage Nodes in the failed zones without starting any services. This avoids a hard rollback:

    java -Xmx64m -Xms64m \
    -jar KVHOME/lib/kvstore.jar restart -disable-services \
    -root nyc1/KVROOT &

    Note:

    When performing planned maintenance, there is no need to isolate nodes or disable services prior to bringing nodes back online.

  2. Reestablish network connectivity or reenable the standard startup sequence of the previously failed zones.

  3. Repair the topology so that the newly restarted Storage Nodes are updated with the topology changes made by the failover:

    java -Xmx64m -Xms64m -jar KVHOME/lib/kvstore.jar runadmin \
    -host jersey1 -port 5000 \
    -security USER/security/admin.security
    
    kv-> plan repair-topology -wait
    Executed plan 10, waiting for completion...
    Plan 10 ended successfully 

    Note:

    This assumes that you have followed the steps in Create users and configure security with remote access.

    Note:

    This command will also restart services on the previously failed nodes.

    Use the verify configuration command to confirm that there are no configuration problems.

  4. Run the ping command. Use the maxCatchupTimeSecs values it reports to choose the value of the -timeout flag for the await-consistent command.

    The -timeout value is your estimate of how long the switchover will take. For example, if the nodes have been offline for a long time, it might take many hours for them to catch up before they can be converted back to primary nodes.

    kv-> ping
    Pinging components of store mystore based upon topology sequence #117
    100 partitions and 6 storage nodes
    Time: 2022-06-09 07:39:18 UTC   Version: 21.3.10
    Shard Status: healthy: 1 writable-degraded: 0 read-only: 0 offline: 0 total: 1
    Admin Status: healthy
    Zone [name=Manhattan id=zn1 type=SECONDARY allowArbiters=false masterAffinity=false]   
    RN Status: online: 3 read-only: 0 offline: 0 maxDelayMillis: 3 maxCatchupTimeSecs: 0
    Zone [name=JerseyCity id=zn2 type=PRIMARY allowArbiters=false masterAffinity=false]   
    RN Status: online: 3 read-only: 0 offline: 0 maxDelayMillis: 4 maxCatchupTimeSecs: 0
    Storage Node [sn1] on nyc1: 5000    Zone: [name=Manhattan id=zn1 type=SECONDARY
    allowArbiters=false masterAffinity=false]    
    Status: RUNNING   Ver: 21.3.10 2021-12-21 21:24:59 UTC  
    Build id: 78bbc4cb976b Edition: Enterprise isMasterBalanced: true   
    serviceStartTime: 2022-06-09 07:36:01 UTC
    Admin [admin1]  Status: RUNNING,REPLICA serviceStartTime: 2022-06-09 07:38:14 UTC       
    stateChangeTime: 2022-06-09 07:38:14 UTC  availableStorageSize: 2 GB
    Rep Node [rg1-rn1] Status: RUNNING,REPLICA sequenceNumber: 2,672 haPort: 5111 
    availableStorageSize: 273 GB storageType: HD serviceStartTime: 2022-06-09 07:37:14 UTC    
    stateChangeTime: 2022-06-09 07:37:20 UTC delayMillis: 0 catchupTimeSecs: 0
    Storage Node [sn2] on nyc1: 5100    Zone: [name=Manhattan id=zn1 type=SECONDARY 
    allowArbiters=false masterAffinity=false]    
    Status: RUNNING   Ver: 21.3.10 2021-12-21 21:24:59 UTC  
    Build id: 78bbc4cb976b Edition: Enterprise  isMasterBalanced: true   
    serviceStartTime: 2022-06-09 07:36:25 UTC
    Admin [admin2]  Status: RUNNING,REPLICA serviceStartTime: 2022-06-09 07:38:34 UTC       
    stateChangeTime: 2022-06-09 07:38:33 UTC availableStorageSize: 2 GB
    Rep Node [rg1-rn2]  Status: RUNNING,REPLICA sequenceNumber: 2,672 haPort: 5211 
    availableStorageSize: 273 GB storageType: HD serviceStartTime: 2022-06-09 07:37:28 UTC    
    stateChangeTime: 2022-06-09 07:37:33 UTC delayMillis: 0 catchupTimeSecs: 0
    Storage Node [sn3] on nyc1: 5200    Zone: [name=Manhattan id=zn1 type=SECONDARY 
    allowArbiters=false masterAffinity=false]    
    Status: RUNNING   Ver: 21.3.10 2021-12-21 21:24:59 UTC  
    Build id: 78bbc4cb976b Edition: Enterprise isMasterBalanced: true   
    serviceStartTime: 2022-06-09 07:36:35 UTC
    Admin [admin3]  Status: RUNNING,REPLICA serviceStartTime: 2022-06-09 07:38:56 UTC       
    stateChangeTime: 2022-06-09 07:38:56 UTC  availableStorageSize: 2 GB
    Rep Node [rg1-rn3] Status: RUNNING,REPLICA sequenceNumber: 2,672 haPort: 5311 
    availableStorageSize: 273 GB storageType: HD serviceStartTime: 2022-06-09 07:37:43 UTC    
    stateChangeTime: 2022-06-09 07:37:49 UTC delayMillis: 3 catchupTimeSecs: 0
    Storage Node [sn4] on jersey1: 6000  Zone: [name=JerseyCity id=zn2 type=PRIMARY 
    allowArbiters=false masterAffinity=false]    
    Status: RUNNING   Ver: 21.3.10 2021-12-21 21:24:59 UTC  
    Build id: 78bbc4cb976b Edition: Enterprise isMasterBalanced: true     
    serviceStartTime: 2022-06-09 07:05:44 UTC
    Admin [admin4]  Status: RUNNING,REPLICA serviceStartTime: 2022-06-09 07:36:49 UTC       
    stateChangeTime: 2022-06-09 07:36:47 UTC availableStorageSize: 2 GB
    Rep Node [rg1-rn4]  Status: RUNNING,REPLICA sequenceNumber: 2,672 haPort: 5411 
    availableStorageSize: 273 GB storageType: HD serviceStartTime: 2022-06-09 07:36:36 UTC    
    stateChangeTime: 2022-06-09 07:36:59 UTC delayMillis: 4 catchupTimeSecs: 0
    Storage Node [sn5] on jersey1: 6100    Zone: [name=JerseyCity id=zn2 type=PRIMARY 
    allowArbiters=false masterAffinity=false]    
    Status: RUNNING   Ver: 21.3.10 2021-12-21 21:24:59 UTC  
    Build id: 78bbc4cb976b Edition: Enterprise  isMasterBalanced: true     
    serviceStartTime: 2022-06-09 07:05:54 UTC
    Admin [admin5] Status: RUNNING,REPLICA serviceStartTime: 2022-06-09 07:36:49 UTC
    stateChangeTime: 2022-06-09 07:36:48 UTC  availableStorageSize: 2 GB
    Rep Node [rg1-rn5]  Status: RUNNING,REPLICA sequenceNumber: 2,672 haPort: 5511 
    availableStorageSize: 273 GB storageType: HD serviceStartTime: 2022-06-09 07:36:36 UTC    
    stateChangeTime: 2022-06-09 07:36:59 UTC delayMillis: 0 catchupTimeSecs: 0
    Storage Node [sn6] on jersey1: 6200    Zone: [name=JerseyCity id=zn2 type=PRIMARY 
    allowArbiters=false masterAffinity=false]    
    Status: RUNNING   Ver: 21.3.10 2021-12-21 21:24:59 UTC  
    Build id: 78bbc4cb976b Edition: Enterprise  isMasterBalanced: true     
    serviceStartTime: 2022-06-09 07:06:03 UTC
    Admin [admin6]  Status: RUNNING,MASTER  serviceStartTime: 2022-06-09 07:36:55 UTC       
    stateChangeTime: 2022-06-09 07:36:46 UTC  availableStorageSize: 2 GB
    Rep Node [rg1-rn6] Status: RUNNING,MASTER sequenceNumber: 2,672 haPort: 5611 
    availableStorageSize: 273 GB storageType: HD  serviceStartTime: 2022-06-09 07:36:36 UTC    
    stateChangeTime: 2022-06-09 07:36:57 UTC

    In this example, the timeout used is 1800 seconds (30 minutes), which leaves ample margin over the maxCatchupTimeSecs values of 0 reported above.

  5. Use the await-consistent command to wait, for up to the chosen timeout (1800 seconds), until the nodes in the secondary zone have caught up with their masters.

    When changing a zone's type, the system waits only five minutes for the nodes to catch up. If the nodes have not caught up in that time, the plan fails.

    If the nodes will take more than five minutes to catch up, run the await-consistent command first and specify a longer wait time with the -timeout flag. In this example, the wait time is 1800 seconds:

    kv-> await-consistent -timeout 1800 -znname Manhattan
    The specified zone is consistent 

    By default, nodes need to have a delay of no more than 1 second to be considered caught up. You can change this value by specifying the -replica-delay-threshold flag. You should do this if network delays prevent the nodes from catching up within 1 second of their masters.

    Note:

    If you do not want the switchover to wait for the nodes to catch up, you can use the -no-replica-delay-threshold flag. In that case, the nodes are converted to primary nodes even if they are behind their masters. Evaluate whether this risk is worth taking.

  6. Perform the switchover: convert the previously failed zone (Manhattan) back to a primary zone, and return the formerly secondary zone (JerseyCity) to its earlier secondary type.

    kv-> topology clone -current -name newTopo
    kv-> topology change-zone-type -name newTopo \
    -znname Manhattan -type primary
    Changed zone type of zn1 to PRIMARY in newTopo
    
    kv-> topology change-zone-type -name newTopo \
    -znname JerseyCity -type secondary
    Changed zone type of zn2 to SECONDARY in newTopo
    
    kv-> plan deploy-topology -name newTopo -wait
    Executed plan 11, waiting for completion...
    Plan 11 ended successfully 

    Confirm the zone type change of the Manhattan zone to PRIMARY by running the ping command.

    kv-> ping
    Pinging components of store mystore based upon topology sequence #117
    100 partitions and 6 storage nodes
    Time: 2022-06-09 07:39:18 UTC   Version: 21.3.10
    Shard Status: healthy: 1 writable-degraded: 0 read-only: 0 offline: 0 total: 1
    Admin Status: healthy
    Zone [name=Manhattan id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]   
    RN Status: online: 3 read-only: 0 offline: 0 maxDelayMillis: 3 maxCatchupTimeSecs: 0
    Zone [name=JerseyCity id=zn2 type=SECONDARY allowArbiters=false masterAffinity=false]   
    RN Status: online: 3 read-only: 0 offline: 0 maxDelayMillis: 4 maxCatchupTimeSecs: 0
    Storage Node [sn1] on nyc1: 5000    Zone: [name=Manhattan id=zn1 type=PRIMARY
    allowArbiters=false masterAffinity=false]    
    Status: RUNNING   Ver: 21.3.10 2021-12-21 21:24:59 UTC  
    Build id: 78bbc4cb976b Edition: Enterprise isMasterBalanced: true   
    serviceStartTime: 2022-06-09 07:36:01 UTC
    Admin [admin1]  Status: RUNNING,MASTER serviceStartTime: 2022-06-09 07:38:14 UTC
    stateChangeTime: 2022-06-09 07:38:14 UTC  availableStorageSize: 2 GB
    Rep Node [rg1-rn1]  Status: RUNNING,MASTER sequenceNumber: 2,672 haPort: 5111 
    availableStorageSize: 273 GB storageType: HD serviceStartTime: 2022-06-09 07:37:14 UTC    
    stateChangeTime: 2022-06-09 07:37:20 UTC delayMillis: 0 catchupTimeSecs: 0
    Storage Node [sn2] on nyc1: 5100  Zone: [name=Manhattan id=zn1 type=PRIMARY 
    allowArbiters=false masterAffinity=false]    
    Status: RUNNING  Ver: 21.3.10 2021-12-21 21:24:59 UTC  
    Build id: 78bbc4cb976b Edition: Enterprise  isMasterBalanced: true   
    serviceStartTime: 2022-06-09 07:36:25 UTC
    Admin [admin2]   Status: RUNNING,REPLICA serviceStartTime: 2022-06-09 07:38:34 UTC       
    stateChangeTime: 2022-06-09 07:38:33 UTC availableStorageSize: 2 GB
    Rep Node [rg1-rn2]  Status: RUNNING,REPLICA sequenceNumber: 2,672 haPort: 5211 
    availableStorageSize: 273 GB storageType: HD serviceStartTime: 2022-06-09 07:37:28 UTC    
    stateChangeTime: 2022-06-09 07:37:33 UTC delayMillis: 0 catchupTimeSecs: 0
    Storage Node [sn3] on nyc1: 5200    Zone: [name=Manhattan id=zn1 type=PRIMARY 
    allowArbiters=false masterAffinity=false]    
    Status: RUNNING   Ver: 21.3.10 2021-12-21 21:24:59 UTC  
    Build id: 78bbc4cb976b Edition: Enterprise    isMasterBalanced: true  
    serviceStartTime: 2022-06-09 07:36:35 UTC
    Admin [admin3]        Status: RUNNING,REPLICA serviceStartTime: 2022-06-09 07:38:56 UTC       
    stateChangeTime: 2022-06-09 07:38:56 UTC  availableStorageSize: 2 GB
    Rep Node [rg1-rn3]  Status: RUNNING,REPLICA sequenceNumber: 2,672 haPort: 5311
    availableStorageSize: 273 GB storageType: HD serviceStartTime: 2022-06-09 07:37:43 UTC    
    stateChangeTime: 2022-06-09 07:37:49 UTC delayMillis: 3 catchupTimeSecs: 0
    Storage Node [sn4] on jersey1: 6000    Zone: [name=JerseyCity id=zn2 type=SECONDARY 
    allowArbiters=false masterAffinity=false]    
    Status: RUNNING   Ver: 21.3.10 2021-12-21 21:24:59 UTC  
    Build id: 78bbc4cb976b Edition: Enterprise isMasterBalanced: true     
    serviceStartTime: 2022-06-09 07:05:44 UTC
    Admin [admin4]  Status: RUNNING,REPLICA serviceStartTime: 2022-06-09 07:36:49 UTC       
    stateChangeTime: 2022-06-09 07:36:47 UTC availableStorageSize: 2 GB
    Rep Node [rg1-rn4]  Status: RUNNING,REPLICA sequenceNumber: 2,672 haPort: 5411 
    availableStorageSize: 273 GB storageType: HD serviceStartTime: 2022-06-09 07:36:36 UTC    
    stateChangeTime: 2022-06-09 07:36:59 UTC delayMillis: 4 catchupTimeSecs: 0
    Storage Node [sn5] on jersey1: 6100 Zone: [name=JerseyCity id=zn2 type=SECONDARY 
    allowArbiters=false masterAffinity=false]    
    Status: RUNNING   Ver: 21.3.10 2021-12-21 21:24:59 UTC  
    Build id: 78bbc4cb976b Edition: Enterprise isMasterBalanced: true     
    serviceStartTime: 2022-06-09 07:05:54 UTC
    Admin [admin5]    Status: RUNNING,REPLICA serviceStartTime: 2022-06-09 07:36:49 UTC       
    stateChangeTime: 2022-06-09 07:36:48 UTC  availableStorageSize: 2 GB
    Rep Node [rg1-rn5]   Status: RUNNING,REPLICA sequenceNumber: 2,672 haPort: 5511 
    availableStorageSize: 273 GB storageType: HD serviceStartTime: 2022-06-09 07:36:36 UTC    
    stateChangeTime: 2022-06-09 07:36:59 UTC delayMillis: 0 catchupTimeSecs: 0
    Storage Node [sn6] on jersey1: 6200 Zone: [name=JerseyCity id=zn2 type=SECONDARY 
    allowArbiters=false masterAffinity=false]    
    Status: RUNNING   Ver: 21.3.10 2021-12-21 21:24:59 UTC  
    Build id: 78bbc4cb976b Edition: Enterprise  isMasterBalanced: true     
    serviceStartTime: 2022-06-09 07:06:03 UTC
    Admin [admin6]   Status: RUNNING,REPLICA  serviceStartTime: 2022-06-09 07:36:55 UTC       
    stateChangeTime: 2022-06-09 07:36:46 UTC  availableStorageSize: 2 GB
    Rep Node [rg1-rn6]  Status: RUNNING,REPLICA sequenceNumber: 2,672 haPort: 5611 
    availableStorageSize: 273 GB storageType: HD  serviceStartTime: 2022-06-09 07:36:36 UTC    
    stateChangeTime: 2022-06-09 07:36:57 UTC
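Step 1 restarts every Storage Node in the failed zones with services disabled. When a zone has several Storage Nodes, a small script can issue the same command once per KVROOT. The following Python sketch assumes that `java` is on the PATH and that you supply the KVHOME directory and one KVROOT path per Storage Node. The helper names `restart_argv` and `launch_restarts` are hypothetical; only `KVHOME/lib/kvstore.jar`, the flags, and `nyc1/KVROOT` come from the example above.

```python
import subprocess
from typing import List


def restart_argv(kvhome: str, kvroot: str) -> List[str]:
    """Build the step 1 command that restarts one Storage Node with services disabled."""
    return [
        "java", "-Xmx64m", "-Xms64m",
        "-jar", f"{kvhome}/lib/kvstore.jar",
        "restart", "-disable-services",
        "-root", kvroot,
    ]


def launch_restarts(kvhome: str, kvroots: List[str]) -> List[subprocess.Popen]:
    """Start one background restart per KVROOT, like the trailing '&' in the shell example.

    Hypothetical helper: it neither waits for the processes nor checks their output.
    """
    return [subprocess.Popen(restart_argv(kvhome, root)) for root in kvroots]


if __name__ == "__main__":
    # Print the command for the example Storage Node instead of running it.
    print(" ".join(restart_argv("KVHOME", "nyc1/KVROOT")))
```

Building an argument list rather than a single shell string avoids quoting problems when paths contain spaces.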
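Steps 4 and 5 derive the await-consistent timeout from the maxCatchupTimeSecs values that ping reports, and compare that estimate with the five-minute wait built into a zone-type change. The following Python sketch illustrates that reasoning under stated assumptions: it parses ping output in the text form shown above, and the helper names, the safety factor of 2, and the 1800-second minimum (chosen to match this example) are hypothetical, not product defaults.

```python
import re

# Matches the per-zone summary that ping prints, for example:
#   RN Status: online: 3 read-only: 0 offline: 0 maxDelayMillis: 3 maxCatchupTimeSecs: 0
CATCHUP_RE = re.compile(r"maxCatchupTimeSecs:\s*(\d+)")

# Per the text, a zone-type change waits only about five minutes for catch-up.
ZONE_CHANGE_WAIT_SECS = 5 * 60

# Excerpt of the zone summary lines from the step 4 ping output.
SAMPLE_PING = """\
Zone [name=Manhattan id=zn1 type=SECONDARY allowArbiters=false masterAffinity=false]
RN Status: online: 3 read-only: 0 offline: 0 maxDelayMillis: 3 maxCatchupTimeSecs: 0
Zone [name=JerseyCity id=zn2 type=PRIMARY allowArbiters=false masterAffinity=false]
RN Status: online: 3 read-only: 0 offline: 0 maxDelayMillis: 4 maxCatchupTimeSecs: 0
"""


def max_catchup_secs(ping_output: str) -> int:
    """Return the largest maxCatchupTimeSecs value found in the ping output."""
    values = [int(v) for v in CATCHUP_RE.findall(ping_output)]
    if not values:
        raise ValueError("no maxCatchupTimeSecs values found in ping output")
    return max(values)


def needs_explicit_wait(catchup_secs: int) -> bool:
    """True if catch-up may outlast the wait built into the zone-type change."""
    return catchup_secs > ZONE_CHANGE_WAIT_SECS


def await_consistent_command(zone_name: str, catchup_secs: int,
                             safety_factor: int = 2,
                             min_timeout_secs: int = 1800) -> str:
    """Build an await-consistent command with margin over the catch-up estimate.

    safety_factor and min_timeout_secs are illustrative choices, not product defaults.
    """
    timeout = max(catchup_secs * safety_factor, min_timeout_secs)
    return f"await-consistent -timeout {timeout} -znname {zone_name}"


if __name__ == "__main__":
    estimate = max_catchup_secs(SAMPLE_PING)
    print("explicit wait needed:", needs_explicit_wait(estimate))
    print("kv-> " + await_consistent_command("Manhattan", estimate))
```

With the sample above, the estimate is 0 seconds, so the minimum of 1800 seconds applies and the generated command matches the one used in step 5.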
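Step 5 notes that, by default, nodes are considered caught up when their delay is no more than 1 second, and that -replica-delay-threshold changes that limit. The following Python sketch approximates that check from ping output by comparing each zone's reported maxDelayMillis with a threshold. It is an assumption that the per-zone maximum is a fair proxy for what await-consistent evaluates; the helper names are hypothetical, and the parser expects the text layout shown above.

```python
import re
from typing import Dict, Optional

# Zone summary line, e.g. "Zone [name=Manhattan id=zn1 type=SECONDARY ...]".
ZONE_RE = re.compile(r"^Zone \[name=(\S+) ")
# RN summary line that follows it, e.g. "RN Status: ... maxDelayMillis: 3 ...".
DELAY_RE = re.compile(r"^RN Status:.*\bmaxDelayMillis:\s*(\d+)")

# Default catch-up threshold described in the text: 1 second.
DEFAULT_THRESHOLD_MS = 1000

SAMPLE_PING = """\
Zone [name=Manhattan id=zn1 type=SECONDARY allowArbiters=false masterAffinity=false]
RN Status: online: 3 read-only: 0 offline: 0 maxDelayMillis: 3 maxCatchupTimeSecs: 0
Zone [name=JerseyCity id=zn2 type=PRIMARY allowArbiters=false masterAffinity=false]
RN Status: online: 3 read-only: 0 offline: 0 maxDelayMillis: 4 maxCatchupTimeSecs: 0
"""


def zone_max_delays(ping_output: str) -> Dict[str, int]:
    """Map each zone name to the maxDelayMillis on the RN Status line after it."""
    delays: Dict[str, int] = {}
    current: Optional[str] = None
    for raw in ping_output.splitlines():
        line = raw.strip()
        zone = ZONE_RE.match(line)
        if zone:
            current = zone.group(1)
            continue
        delay = DELAY_RE.match(line)
        if delay and current is not None:
            delays[current] = int(delay.group(1))
            current = None  # each zone has a single RN Status summary line
    return delays


def zone_caught_up(ping_output: str, zone_name: str,
                   threshold_ms: int = DEFAULT_THRESHOLD_MS) -> bool:
    """True if the zone's reported maximum delay is within the threshold."""
    delays = zone_max_delays(ping_output)
    if zone_name not in delays:
        raise KeyError(f"zone {zone_name!r} not found in ping output")
    return delays[zone_name] <= threshold_ms


if __name__ == "__main__":
    print(zone_max_delays(SAMPLE_PING))
    print("Manhattan caught up:", zone_caught_up(SAMPLE_PING, "Manhattan"))
```

Lowering threshold_ms mimics a stricter -replica-delay-threshold; raising it mimics allowing for network delay.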
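Step 6 follows a fixed order: clone the current topology into a candidate, change the zone types in the candidate, then deploy it. The following Python sketch captures that order for arbitrary zone names. The helper `switchover_commands` is hypothetical; it only builds Admin CLI command strings identical to those in step 6 (written on one line each) and does not connect to the store.

```python
from typing import List


def switchover_commands(topo_name: str, new_primary: str, new_secondary: str) -> List[str]:
    """Return the step 6 Admin CLI commands, in order, for swapping two zone types."""
    if new_primary == new_secondary:
        raise ValueError("the primary and secondary zones must differ")
    return [
        # Edit a copy of the current topology; the deployed one is unchanged until deploy.
        f"topology clone -current -name {topo_name}",
        f"topology change-zone-type -name {topo_name} -znname {new_primary} -type primary",
        f"topology change-zone-type -name {topo_name} -znname {new_secondary} -type secondary",
        # Deploying the edited candidate is what actually changes the zone types.
        f"plan deploy-topology -name {topo_name} -wait",
    ]


if __name__ == "__main__":
    for cmd in switchover_commands("newTopo", "Manhattan", "JerseyCity"):
        print("kv-> " + cmd)
```

Keeping both zone-type changes in one candidate topology means a single deploy-topology plan applies the whole switchover.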
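The final ping confirms the switchover by showing Manhattan as PRIMARY, JerseyCity as SECONDARY, and the master Rep Node (rg1-rn1) running in Manhattan. The following Python sketch automates that reading under the assumption that ping output keeps the text layout shown above. The helper names are hypothetical, and the patterns tolerate a missing bracket in the Zone and Rep Node fields.

```python
import re
from typing import Dict, Optional

ZONE_SUMMARY_RE = re.compile(r"^Zone \[name=(\S+) id=\S+ type=(\w+)")
SN_ZONE_RE = re.compile(r"^Storage Node \[\S+\].*Zone: \[?name=(\S+)")
MASTER_RN_RE = re.compile(r"^Rep Node \[(\S+?)\]?\s+Status: RUNNING,MASTER")

# Excerpt of the post-switchover ping output.
SAMPLE_PING = """\
Zone [name=Manhattan id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
RN Status: online: 3 read-only: 0 offline: 0 maxDelayMillis: 3 maxCatchupTimeSecs: 0
Zone [name=JerseyCity id=zn2 type=SECONDARY allowArbiters=false masterAffinity=false]
RN Status: online: 3 read-only: 0 offline: 0 maxDelayMillis: 4 maxCatchupTimeSecs: 0
Storage Node [sn1] on nyc1: 5000    Zone: [name=Manhattan id=zn1 type=PRIMARY
Rep Node [rg1-rn1]  Status: RUNNING,MASTER sequenceNumber: 2,672 haPort: 5111
Storage Node [sn4] on jersey1: 6000    Zone: [name=JerseyCity id=zn2 type=SECONDARY
Rep Node [rg1-rn4]  Status: RUNNING,REPLICA sequenceNumber: 2,672 haPort: 5411
"""


def zone_types(ping_output: str) -> Dict[str, str]:
    """Map each zone name to its type from the zone summary lines."""
    types: Dict[str, str] = {}
    for raw in ping_output.splitlines():
        m = ZONE_SUMMARY_RE.match(raw.strip())
        if m:
            types[m.group(1)] = m.group(2)
    return types


def master_zones(ping_output: str) -> Dict[str, str]:
    """Map each RUNNING,MASTER Rep Node to the zone of the Storage Node listed before it."""
    masters: Dict[str, str] = {}
    current_zone: Optional[str] = None
    for raw in ping_output.splitlines():
        line = raw.strip()
        sn = SN_ZONE_RE.match(line)
        if sn:
            current_zone = sn.group(1)
            continue
        rn = MASTER_RN_RE.match(line)
        if rn and current_zone is not None:
            masters[rn.group(1)] = current_zone
    return masters


def switchover_complete(ping_output: str, primary: str, secondary: str) -> bool:
    """True if zone types are swapped and every master Rep Node is in the primary zone."""
    types = zone_types(ping_output)
    if types.get(primary) != "PRIMARY" or types.get(secondary) != "SECONDARY":
        return False
    masters = master_zones(ping_output)
    return bool(masters) and all(zone == primary for zone in masters.values())


if __name__ == "__main__":
    print(switchover_complete(SAMPLE_PING, "Manhattan", "JerseyCity"))
```

Checking where the masters run, not only the zone types, catches the case where the types changed but mastership has not yet moved.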