Simulate Replica Set Failure with Restart

Let's simulate a replica set failure. In this example, the .spec.ttspec.replicaSetRecovery datum of the TimesTenScaleout object is not specified, so the default of Restart applies: the TimesTen Operator forcibly unloads and reloads the database when a total replica set failure occurs.
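
Before running the simulation, it can be useful to confirm which recovery setting is in effect. The following is a minimal sketch, not part of the Operator: recovery_policy is a hypothetical helper name, and it assumes the field path named above and the tts short name used in the kubectl commands in this example. It relies on kubectl's jsonpath output, which prints nothing when a field is absent, and falls back to the default of Restart.

```shell
# Hypothetical helper: print the replica set recovery setting of a
# TimesTenScaleout object, falling back to the default described above.
recovery_policy() {
  # jsonpath output is empty when the field is not set in the object's spec.
  policy=$(kubectl get tts "$1" -o jsonpath='{.spec.ttspec.replicaSetRecovery}')
  echo "${policy:-Restart}"
}
```

For the object in this example, the call would be recovery_policy samplescaleout.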

Let's observe how a TimesTenScaleout object transitions through various state changes.

Note:

This example is for demonstration purposes only. Do not attempt this example in a production environment.

The example starts with a deployed TimesTenScaleout object that is functioning properly.
kubectl get tts samplescaleout
NAME             OVERALL   MGMT     CREATE    LOAD              OPEN   AGE
samplescaleout   Normal    Normal   created   loaded-complete   open   99m

Note that the High Level state is Normal, the management state is Normal, and the database state is created,loaded-complete,open.

In this example, there are three replica sets.
kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
samplescaleout-data-1-0              2/2     Running   0          11m
samplescaleout-data-1-1              2/2     Running   0          11m
samplescaleout-data-1-2              2/2     Running   0          11m
samplescaleout-data-2-0              2/2     Running   0          11m
samplescaleout-data-2-1              2/2     Running   0          11m
samplescaleout-data-2-2              2/2     Running   0          11m
samplescaleout-mgmt-0                2/2     Running   0          11m
samplescaleout-zk-0                  1/1     Running   0          11m
samplescaleout-zk-1                  1/1     Running   0          10m
samplescaleout-zk-2                  1/1     Running   0          9m35s
timesten-operator-7677964df9-sp2zp   1/1     Running   0          7d3h
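
The Pods that make up a single replica set can be picked out of a listing like the one above by name. The sketch below is an assumption-laden illustration: replica_set_pods is a hypothetical helper, and it assumes that data Pods are named <object>-data-<n>-<ordinal> and that the trailing ordinal identifies the replica set. That pattern is inferred only from the Pod names in this example, so verify it against your own deployment before relying on it.

```shell
# Hypothetical helper: list the data Pods that share a trailing ordinal.
# Assumes names of the form <object>-data-<n>-<ordinal>, as inferred
# from the listing in this example.
replica_set_pods() {
  object=$1
  ordinal=$2
  # Print bare Pod names, one per line, then keep only the matching data Pods.
  kubectl get pods --no-headers -o custom-columns=NAME:.metadata.name |
    grep -E "^${object}-data-[0-9]+-${ordinal}\$"
}
```

With the Pods shown above, replica_set_pods samplescaleout 0 would list samplescaleout-data-1-0 and samplescaleout-data-2-0.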

Let's delete the samplescaleout-data-1-0 and samplescaleout-data-2-0 Pods, which together make up one of the replica sets. Deleting both causes a total failure of that replica set.

  1. Delete the Pods.
    kubectl delete pod samplescaleout-data-1-0;kubectl delete pod samplescaleout-data-2-0
    pod "samplescaleout-data-1-0" deleted
    pod "samplescaleout-data-2-0" deleted
    
  2. Use the kubectl get command to observe state transitions.
    kubectl get tts samplescaleout
    NAME             OVERALL           MGMT     CREATE    LOAD                OPEN   AGE
    samplescaleout   DatabasePartial   Normal   created   loaded-incomplete   open   111m

    The High Level state is DatabasePartial, indicating that the database is up but some data is not available because one or more replica sets have failed completely. The database loaded state is loaded-incomplete, indicating that at least one replica set has no elements that finished loading successfully.

    kubectl get tts samplescaleout
    NAME             OVERALL                   MGMT     CREATE    LOAD                 OPEN   AGE
    samplescaleout   DatabaseRestartRequired   Normal   created   loading-incomplete   open   112m

    The object transitions to the DatabaseRestartRequired High Level state. This state indicates that although the database is up (at least partially), it must be stopped and restarted (unloaded and reloaded) to restore functionality. This can occur when all elements in a replica set fail simultaneously and those elements cannot be loaded again because of a "waiting for seed" condition. Committed transactions may be lost when the database is unloaded and reloaded.

    kubectl get tts samplescaleout
    NAME             OVERALL              MGMT     CREATE    LOAD                 OPEN   AGE
    samplescaleout   DatabaseRestarting   Normal   created   loading-incomplete   open   114m

    The object transitions to the DatabaseRestarting High Level state. This state indicates that the database is being forcibly unloaded and reloaded after a DatabaseRestartRequired condition.

    kubectl get tts samplescaleout
    NAME             OVERALL   MGMT     CREATE    LOAD              OPEN     AGE
    samplescaleout   Normal    Normal   created   loaded-complete   closed   114m

    The object transitions to the Normal High Level state, indicating that the grid and database are functioning normally. The database loaded state changed to loaded-complete, indicating every element was loaded successfully. The database open state is closed, indicating the database is closed for connections.

    kubectl get tts samplescaleout
    NAME             OVERALL   MGMT     CREATE    LOAD              OPEN   AGE
    samplescaleout   Normal    Normal   created   loaded-complete   open   114m

    The object remains in the Normal High Level state. The database open state changed to open, indicating the database is now open for connections.
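
Rather than rerunning kubectl get by hand, the transitions in step 2 can be followed with a small polling loop. This is a minimal sketch under stated assumptions: tts_state and watch_recovery are hypothetical helper names, and the parsing relies on the column order shown in the output above (NAME OVERALL MGMT CREATE LOAD OPEN AGE), so it would need adjusting if the columns differ in your release.

```shell
# Hypothetical helper: print the OVERALL and OPEN columns for an object,
# relying on the column order shown in this example's output.
tts_state() {
  kubectl get tts "$1" --no-headers | awk '{ print $2, $6 }'
}

# Poll until the object reports Normal and open, printing each new state once.
watch_recovery() {
  name=$1
  interval=${2:-5}   # seconds between polls
  last=""
  while :; do
    state=$(tts_state "$name")
    if [ "$state" != "$last" ]; then
      echo "$state"
      last=$state
    fi
    # Stop once the grid is healthy and the database accepts connections.
    [ "$state" = "Normal open" ] && return 0
    sleep "$interval"
  done
}
```

Start watch_recovery samplescaleout only after the object has left the Normal state. If it is called before the failure is detected, the first poll already reads Normal and open, and the function returns immediately.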

Even though there was a total replica set failure, the TimesTen Operator forcibly unloaded and reloaded the database and returned the grid and database to a Normal state, indicating both are functioning normally. No manual intervention was required.