Managing Failover for the Management Instances

You conduct all management activity from a single management instance, called the active management instance. However, it is highly recommended that you configure two management instances, where the standby management instance is available in case the active management instance goes down or fails.

  • If you only have a single management instance and it goes down, the databases remain operational. However, most management operations are unavailable until the management instance is restored.

  • If you configure both the active and standby management instances in your grid and only the active management instance is up, then you can still configure and manage the entire grid from this one management instance.

If both management instances are down, then:

  • You can still access all databases in the grid. However, since all management actions are requested through the active management instance, you cannot manage your grid until the active management instance is restored.

  • If data instances or their elements in the grid go down or fail, they cannot recover, restart or rejoin the grid until the active management instance is restored.

Note:

You cannot add a third management instance.

As shown in Figure 13-6, all management information used by the active management instance is automatically replicated to the standby management instance. Thus, if the active management instance goes down or fails, you can promote the standby management instance to become the new active management instance through which you continue to manage the grid.

Figure 13-6 Active Standby Configuration for Management Instances


The following sections describe how you can manage the management instances:

Status for Management Instances

Use the ttGridAdmin mgmtExamine command to check the status of the management instances and to see whether any issues need to be resolved. If there are open issues, the command recommends corrective actions that you can run to fix them.

The following example shows both management instances working:

% ttGridAdmin mgmtExamine
Both active and standby management instances are up. No action required.
 
Host  Instance  Reachable RepRole(Self) Role(Self) Seq RepAgent RepActive 
------------------------------------------------------------------------
host1 instance1 Yes       Active        Active     598 Up       Yes
host2 instance1 Yes       Standby       Standby    598 Up       No

If one of the management instances goes down or fails, the output shows its role as Unknown, its replication agent as Down, and a message stating that the management database is not available. The output also provides recommended commands to restart the management instance.

% ttGridAdmin mgmtExamine
Active management instance is up, but standby is down
 
Host  Instance  Reachable RepRole(Self) Role(Self) Seq RepAgent RepActive  Message
----- --------- --------- ------------- ---------- --- -------- --------- --------
host1 instance1 Yes       Active        Active     600 Up       No        
host2 instance1 No        Unknown       Unknown        Down     No        Management database is not available

Recommended commands:
ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x host2.example.com
 /timesten/host2/instance1/bin/ttenv ttGridAdmin mgmtStandbyStart

For each management instance displayed:

  • Host and Instance show the name of the host and the name of the management instance located on that host.

  • Reachable indicates whether the command was successful in reaching the management instance to determine its state.

  • RepRole(Self) indicates the recorded role, if any, known by the replication agents for replicating data between management instances, while Role(Self) indicates the recorded role known within the database for the management instances. Both should show the same role. If the roles differ, the ttGridAdmin mgmtExamine command tries to determine the commands that would rectify the error.

  • Seq is the sequence number of the most recent change on the management instance. If the Seq values are the same, then the two management instances are synchronized; otherwise, the one with the larger Seq value has the more recent data.

  • RepAgent indicates whether a replication agent is running on each management instance.

  • RepActive indicates whether the ttGridAdmin mgmtStatus command, which the ttGridAdmin mgmtExamine command invokes internally, was able to make changes to management data on the management instance.

  • Message provides any further information about the management instance.

See Examine Management Instances (mgmtExamine) in Oracle TimesTen In-Memory Database Reference.
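The comparison of RepRole(Self) and Role(Self) described in the list above can also be scripted. The following is a minimal sketch, assuming the output keeps the column layout shown in the examples in this section (Host, Instance, Reachable, RepRole(Self) and Role(Self) as the first five whitespace-separated fields). The function name role_mismatch_hosts is hypothetical and is not part of TimesTen.

```shell
#!/bin/sh
# Hypothetical helper, not part of TimesTen: reads ttGridAdmin mgmtExamine
# output on stdin and prints each host whose RepRole(Self) value differs
# from its Role(Self) value.
role_mismatch_hosts() {
    # Table rows are recognized by a Yes/No value in the Reachable column
    # (field 3); headers, separators and messages are skipped.
    awk '($3 == "Yes" || $3 == "No") && NF >= 7 && $4 != $5 { print $1 }'
}

# Usage (assumes the environment was already set with ttenv):
#   ttGridAdmin mgmtExamine | role_mismatch_hosts
```

An empty result means that no row reported differing roles. The recommendations printed by ttGridAdmin mgmtExamine itself remain the authoritative guide to any corrective action.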

Starting, Stopping and Switching Management Instances

You run most ttGridAdmin commands on the active management instance. However, when you manage recovery for an active management instance, you may be required to run ttGridAdmin commands on the standby management instance.

When starting, stopping, or promoting a standby management instance:

  • You can run the ttGridAdmin mgmtStandbyStop command on either management instance. The grid knows where the standby management instance is and stops it.

  • You must run the ttGridAdmin mgmtStandbyStart command on the management instance that you wish to become the standby management instance. The ttGridAdmin mgmtStandbyStart command assumes that you want the current instance to become the standby management instance.

  • If the active management instance is down, you must run the ttGridAdmin mgmtActiveSwitch command on the standby management instance to promote it to be the active management instance.

For commands that you must run on the standby management instance, remember to set the environment with the ttenv script (as described in Creating the Initial Management Instance) after you log in to the host and before you run the ttGridAdmin utility.
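As an alternative to logging in interactively, the recommended commands printed by ttGridAdmin mgmtExamine in this chapter run a utility on the other host through ssh and that instance's ttenv script. The following sketch only builds and prints such a command line so that you can review it first. The function name remote_ttenv_cmd is hypothetical, and the ssh options are copied from the example output in this chapter rather than from any required configuration.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the ssh command line for running
# a TimesTen utility on another management instance through its ttenv script.
# $1 = host address, $2 = instance home directory, remaining args = command.
remote_ttenv_cmd() {
    host=$1
    instance_home=$2
    shift 2
    printf 'ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x %s %s/bin/ttenv %s\n' \
        "$host" "$instance_home" "$*"
}

# Example with the host name and path used in this chapter's examples:
remote_ttenv_cmd host2.example.com /timesten/host2/instance1 ttGridAdmin mgmtStandbyStart
```

Printing the command instead of executing it keeps the decision to run it with the operator, which matters because each of these commands changes the role of a management instance.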

Single Management Instance Failure

While it is not recommended, you can manage the grid with a single active management instance and no standby management instance. If the single active management instance fails and then recovers, re-activate it as follows:

  1. Use the ttGridAdmin mgmtExamine command to verify that there is only one management instance acting as the active management instance and that it has failed:
    % ttGridAdmin mgmtExamine
    The only defined management instance is down. Start it.
    Recommendation: define a second management instance
     
    Host Instance Reachable RepRole(Self) Role(Self) Seq RepAgent RepActive
    -------------------------------------------------------------------------
    host1 instance1 No      Unknown       Unknown    Down     No 
     
    Recommended commands:
    ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x 
    host1.example.com /timesten/host1/instance1/bin/ttenv ttDaemonAdmin -start
  2. After determining the reason for the failure and resolving that issue, run the ttGridAdmin mgmtActiveStart command to re-activate the active management instance.
    % ttGridAdmin mgmtActiveStart
    This management instance is now the active
  3. Re-run the ttGridAdmin mgmtExamine command to verify that the active management instance is up. Follow any commands it displays if the management instance is not up.
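When you follow the commands that ttGridAdmin mgmtExamine displays, it can help to separate them from the rest of the output. The following is a minimal sketch, assuming the commands follow a line that starts with "Recommended commands:" and end at the next blank line, as in the examples in this chapter. The function name recommended_commands is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads ttGridAdmin mgmtExamine output on stdin and prints
# the lines that follow "Recommended commands:", with leading blanks removed,
# so that they can be reviewed before anything is run.
recommended_commands() {
    awk '
        found && NF == 0 { exit }          # a blank line ends the block
        found { sub(/^[ \t]+/, ""); print }
        /^[ \t]*Recommended commands:/ { found = 1 }
    '
}

# Usage (assumes the environment was already set with ttenv):
#   ttGridAdmin mgmtExamine | recommended_commands
```

The sketch deliberately prints the commands rather than running them. A long command may be wrapped across lines in the output, so review and rejoin the lines before use.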

Active Management Instance Failure

If the active management instance fails, then you can no longer run ttGridAdmin commands on it. Instead, you must:

  • Promote the standby management instance on the host2 host to be the new active management instance.

  • Create a new standby management instance by either:

    • Recovering the failed management instance on host1 and bringing it up as the new standby management instance. This causes the new active management instance to replicate all management information to the new standby management instance.

    • Deleting the failed active management instance, if it has failed permanently, and then creating a new standby management instance.

Figure 13-7 Switch from a Failed Active


For example, suppose your environment has two management instances, where the active management instance is on host1 and the standby management instance is on host2. If the active management instance on host1 fails, you can no longer run ttGridAdmin commands on it. As shown in Figure 13-7, you must promote the standby management instance on host2 to become the new active management instance.

  1. Log in to the host2 host, on which the standby management instance exists, and set the environment with the ttenv script (as described in Creating the Initial Management Instance).
  2. Run the ttGridAdmin mgmtActiveSwitch command on the standby management instance. TimesTen promotes the standby management instance into the new active management instance. You can now continue to manage your grid with the new active management instance.
    % ttGridAdmin mgmtActiveSwitch
    This is now the active management instance
  3. Use the ttGridAdmin mgmtExamine command to verify that the old standby management instance is now the new active management instance:
    % ttGridAdmin mgmtExamine
    Active management instance is up, but standby is down
    
    Host Instance Reachable RepRole(Self) Role(Self) Seq RepAgent RepActive
    -------------------------------------------------------------------------
    host2 instance1 Yes     Active        Active     622 Up       Yes
    host1 instance1 No      Unknown       Unknown        Down     No
    Management database is not available
     
    Recommended commands:
    ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x
    host1.example.com /timesten/host1/instance1/bin/ttenv ttGridAdmin mgmtStandbyStart

Once the new active management instance is processing requests, ensure that a new standby management instance is created by one of the following methods:

Failed Management Instance Can Be Recovered

If the failed active management instance can be recovered, you need to perform the following tasks:

Figure 13-8 The Failed Management Instance Can Be Recovered

  1. If you can recover the failed management instance, as shown in Figure 13-8, then bring the failed host, on which the old active management instance existed, back up. Then, run the ttGridAdmin mgmtStandbyStart command on this host, which re-initiates the management instance as the new standby management instance. It also re-creates the active standby configuration between the new active and standby management instances and replicates all management information on the active management instance to the standby management instance.
    % ttGridAdmin mgmtStandbyStart
    Standby management instance started
  2. Verify that the active and standby management instances are as expected in their new roles with the ttGridAdmin mgmtExamine command:
    % ttGridAdmin mgmtExamine
    Both active and standby management instances are up. No action required.
     
    Host Instance Reachable RepRole(Self) Role(Self) Seq RepAgent RepActive
    -------------------------------------------------------------------------
    host2 instance1 Yes     Active        Active     603 Up       Yes
    host1 instance1 Yes     Standby       Standby    603 Up       No
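The role verification in the last step can be scripted against the same output. The following is a minimal sketch, assuming Role(Self) is the fifth whitespace-separated field of each table row, as in the examples in this chapter. The function names role_of and roles_as_expected are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the Role(Self) value that mgmtExamine output
# (read on stdin) reports for the host named in $1.
role_of() {
    awk -v host="$1" '($3 == "Yes" || $3 == "No") && $1 == host { print $5 }'
}

# Hypothetical check: succeeds when the host named in $1 is reported as
# Active and the host named in $2 is reported as Standby.
roles_as_expected() {
    out=$(cat)
    [ "$(printf '%s\n' "$out" | role_of "$1")" = "Active" ] &&
    [ "$(printf '%s\n' "$out" | role_of "$2")" = "Standby" ]
}

# Usage for the example in this section (environment set with ttenv):
#   ttGridAdmin mgmtExamine | roles_as_expected host2 host1 && echo "roles OK"
```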

Failed Management Instance Encounters a Permanent Failure

If the failed active management instance has failed permanently, you need to perform the following tasks:

Figure 13-9 The Active Management Instance Fails Permanently

  1. Remove the permanently failed active management instance from the model with the ttGridAdmin instanceDelete command.
    % ttGridAdmin instanceDelete host1.instance1
    Instance instance1 on Host host1 deleted from Model

    Note:

    If there are no other instances on the host where the failed active management instance existed, you may want to delete the host and the installation.

  2. Add a new standby management instance with its supporting host and installation to the model.
    % ttGridAdmin hostCreate host9 -address host9.example.com 
    Host host9 created in Model
    % ttGridAdmin installationCreate -host host9 -location /timesten/host9/installation1
    Installation installation1 on Host host9 created in Model
    % ttGridAdmin instanceCreate -host host9 -location /timesten/host9 -type management
    Instance instance1 on Host host9 created in Model
  3. Apply the configuration changes that remove the failed active management instance and add a new standby management instance to the grid by running the ttGridAdmin modelApply command.
    % ttGridAdmin modelApply
    Copying Model.........................................................OK
    Exporting Model Version 2.............................................OK
    Unconfiguring standby management instance.............................OK
    Marking objects 'Pending Deletion'....................................OK
    Stop any Instances that are 'Pending Deletion'........................OK
    Deleting any Instances that are 'Pending Deletion'....................OK
    Deleting any Hosts that are no longer in use..........................OK
    Verifying Installations...............................................OK
    Creating any missing Installations....................................OK
    Creating any missing Instances........................................OK
    Adding new Objects to Grid State......................................OK
    Configuring grid authentication.......................................OK
    Pushing new configuration files to each Instance......................OK
    Making Model Version 2 current........................................OK
    Making Model Version 3 writable.......................................OK
    Checking ssh connectivity of new Instances............................OK
    Starting new management instance......................................OK
    Configuring standby management instance...............................OK
    Starting new data instances...........................................OK
    ttGridAdmin modelApply complete

    The ttGridAdmin modelApply command initiates the active standby configuration between the active and standby management instances and replicates the management information on the active management instance to the standby management instance.

  4. Verify that the active and standby management instances are as expected in their new roles with the ttGridAdmin mgmtExamine command:
    % ttGridAdmin mgmtExamine
    Both active and standby management instances are up. No action required.
     
    Host Instance Reachable RepRole(Self) Role(Self) Seq RepAgent RepActive 
    -------------------------------------------------------------------------
    host2 instance1 Yes     Active        Active     603 Up       Yes
    host9 instance1 Yes     Standby       Standby    603 Up       No
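The model changes in the steps above follow a fixed order: delete the failed instance, create the host, installation and instance for its replacement, then apply the model. The following sketch prints that sequence for review, using only the ttGridAdmin commands and options shown in this section. The function name replacement_commands and its parameters are hypothetical, and the installation1 directory name simply mirrors the example.

```shell
#!/bin/sh
# Hypothetical helper: prints, for review, the ttGridAdmin commands used in the
# steps above to replace a permanently failed management instance.
# $1 = failed host, $2 = failed instance, $3 = new host,
# $4 = new host address, $5 = base directory on the new host.
replacement_commands() {
    printf 'ttGridAdmin instanceDelete %s.%s\n' "$1" "$2"
    printf 'ttGridAdmin hostCreate %s -address %s\n' "$3" "$4"
    printf 'ttGridAdmin installationCreate -host %s -location %s/installation1\n' "$3" "$5"
    printf 'ttGridAdmin instanceCreate -host %s -location %s -type management\n' "$3" "$5"
    printf 'ttGridAdmin modelApply\n'
}

# Example with the names used in this section:
replacement_commands host1 instance1 host9 host9.example.com /timesten/host9
```

Run the printed commands on the active management instance only after checking each one against your own host names and directory layout.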

Standby Management Instance Failure

How you re-activate the standby management instance depends on the type of failure as described in the following sections:

Standby Management Instance Recovers

If the standby management instance recovers, then:

  1. Check the status with the ttGridAdmin mgmtExamine command:
    % ttGridAdmin mgmtExamine
    Active management instance is up, but standby is down
     
    Host Instance Reachable RepRole(Self) Role(Self) Seq RepAgent RepActive Message
    -----------------------------------------------------------------------------
    host1 instance1 Yes     Active        Active     605 Up       No 
    host2 instance1 No      Unknown       Unknown        Down     No 
    Management database is not available
    
    Recommended commands:
    ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x 
    host2.example.com /timesten/host2/instance1/bin/ttenv ttGridAdmin mgmtStandbyStart
  2. Log in to the host with the standby management instance. If you have not done so already, set the environment with the ttenv script (as described in Creating the Initial Management Instance).
  3. Once you bring the failed management instance back up, run the ttGridAdmin mgmtStandbyStart command on the host with the standby management instance.
    % ttGridAdmin mgmtStandbyStart
    Standby management instance started

    This command re-integrates the standby management instance in your grid, initiates the active standby configuration between the active and standby management instances and replicates all management information on the active management instance to the standby management instance.
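After the standby management instance restarts, one way to confirm that replication has caught up is to compare the Seq values reported by ttGridAdmin mgmtExamine, since equal values indicate that the two management instances are synchronized. The following is a minimal sketch, assuming Seq is the sixth whitespace-separated field of each reachable row, as in the examples in this chapter. The function name seq_in_sync is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when mgmtExamine output read on
# stdin lists exactly two reachable management instances with equal Seq values.
seq_in_sync() {
    awk '
        $3 == "Yes" && $6 ~ /^[0-9]+$/ { seq[n++] = $6 }
        END { exit !(n == 2 && seq[0] == seq[1]) }
    '
}

# Usage (assumes the environment was already set with ttenv):
#   ttGridAdmin mgmtExamine | seq_in_sync && echo "management instances are synchronized"
```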

Standby Management Instance Experiences Permanent Failure

If the standby management instance has permanently failed, perform the following tasks:

  • Delete the failed standby management instance on the host2 host.

  • Create a new standby management instance on the host9 host to take over the duties of the failed standby management instance. Then, the active management instance replicates the management information to the new standby management instance.

Figure 13-10 The Standby Management Instance Fails Permanently

  1. Remove the permanently failed standby management instance from the model with the ttGridAdmin instanceDelete command.
    % ttGridAdmin instanceDelete host2.instance1
    Instance instance1 on Host host2 deleted from Model

    Note:

    If there are no other instances on the host where the failed management instance existed, you may want to delete the host and the installation.

  2. Add a new standby management instance with its supporting host and installation to the model.
    % ttGridAdmin hostCreate host9 -address host9.example.com 
    Host host9 created in Model
    % ttGridAdmin installationCreate -host host9 -location /timesten/host9/installation1
    Installation installation1 on Host host9 created in Model
    % ttGridAdmin instanceCreate -host host9 -location /timesten/host9 -type management
    Instance instance1 on Host host9 created in Model
  3. Apply the configuration changes that remove the failed standby management instance and add a new standby management instance to the grid by running the ttGridAdmin modelApply command, as shown in Applying the Changes Made to the Model.
    % ttGridAdmin modelApply
    Copying Model.........................................................OK
    Exporting Model Version 9.............................................OK
    Unconfiguring standby management instance.............................OK
    Marking objects 'Pending Deletion'....................................OK
    Stop any Instances that are 'Pending Deletion'........................OK
    Deleting any Instances that are 'Pending Deletion'....................OK
    Deleting any Hosts that are no longer in use..........................OK
    Verifying Installations...............................................OK
    Creating any missing Instances........................................OK
    Adding new Objects to Grid State......................................OK
    Configuring grid authentication.......................................OK
    Pushing new configuration files to each Instance......................OK
    Making Model Version 9 current........................................OK
    Making Model Version 10 writable......................................OK
    Checking ssh connectivity of new Instances............................OK
    Starting new management instance......................................OK
    Configuring standby management instance...............................OK
    Starting new data instances...........................................OK
    ttGridAdmin modelApply complete

    The ttGridAdmin modelApply command initiates the active standby configuration between the active and standby management instances and replicates the management information on the active management instance to the standby management instance.

Both Management Instances Fail

You must restart the management instances to return the grid to its full functionality and to be able to manage the grid through the active management instance.

If both of the management instances are down, you need to discover which management instance has the latest changes on it to decide which management instance is to become the new active management instance.
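Once both management instances are reachable again, the comparison of sequence numbers can be read directly from the ttGridAdmin mgmtExamine table. The following is a minimal sketch, assuming Seq is the sixth whitespace-separated field of each row that reports one, as in the examples in this chapter. The function name highest_seq_host is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads mgmtExamine output on stdin and prints the host
# that reports the highest Seq value. Rows without a numeric Seq are ignored.
highest_seq_host() {
    awk '
        ($3 == "Yes" || $3 == "No") && $6 ~ /^[0-9]+$/ {
            if (best == "" || $6 + 0 > max) { max = $6 + 0; best = $1 }
        }
        END { if (best != "") print best }
    '
}

# Usage (assumes the environment was already set with ttenv):
#   ttGridAdmin mgmtExamine | highest_seq_host
```

If neither row reports a Seq value, the function prints nothing, which indicates that the daemons must be started before the comparison can be made. When both values are equal, the first host listed is printed.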

Note:

If both management instances fail permanently, call Oracle Support.

The following sections describe the procedures to follow when both management instances are down:

Bring Back Both Management Instances

If you can bring back both management instances:

Note:

If you have not done so already, set the environment with the ttenv script (as described in Creating the Initial Management Instance).

  1. Run the ttGridAdmin mgmtExamine command on one of the management instances to discover which is the appropriate one to become the active management instance. The ttGridAdmin mgmtExamine command evaluates both management instances and reports the sequence number of each. The management instance with the highest sequence number has the most recent management data, and it is this management instance that should be re-activated as the active management instance.

    % ttGridAdmin mgmtExamine
    One or more management instance is down.
    Start them and run mgmtExamine again.
     
    Host Instance Reachable RepRole(Self) Role(Self) Seq RepAgent RepActive Message
    ------------------------------------------------------------------------------
    host1 instance1 No      Unknown       Unknown        Down     No 
    Management database is not available
    host2 instance1 No      Unknown       Unknown        Down     No 
    Management database is not available
    
    Recommended commands:
    ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x
    host1.example.com /timesten/host1/instance1/bin/ttenv ttDaemonAdmin -start -force
    ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x
    host2.example.com /timesten/host2/instance1/bin/ttenv ttDaemonAdmin -start -force
    sleep 30
    /timesten/host1/instance1/bin/ttenv ttGridAdmin mgmtExamine
  2. Run the recommended commands listed by the ttGridAdmin mgmtExamine command. The commands for this example result in restarting the daemons for each management instance:

    % ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x
    host1.example.com /timesten/host1/instance1/bin/ttenv ttDaemonAdmin -start -force
     
    TimesTen Daemon (PID: 3858, port: 11000) startup OK.
    % ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x
    host2.example.com /timesten/host2/instance1/bin/ttenv ttDaemonAdmin -start -force
    
    TimesTen Daemon (PID: 4052, port: 12000) startup OK.
  3. Re-run the ttGridAdmin mgmtExamine command to verify that both management instances are up. If either of the management instances is not up, then the ttGridAdmin mgmtExamine command may suggest another set of commands to run.

    In this example, the second invocation of the ttGridAdmin mgmtExamine command shows that the replication agents of both management instances are still down. The command therefore recommends that you:

    1. Stop the main daemon of both management instances.

    2. Run the ttGridAdmin mgmtActiveStart command on the management instance with the higher sequence number provided by the ttGridAdmin mgmtExamine command. This re-activates the active management instance.

    3. Run the ttGridAdmin mgmtStandbyStart command on the management instance that you want to act as the standby management instance. This command assigns the other management instance as the standby management instance in TimesTen Scaleout, initiates the active standby configuration between the active and standby management instances and synchronizes the management information on the active management instance to the standby management instance.

    % ttGridAdmin mgmtExamine                                                  
    Host Instance Reachable RepRole(Self) Role(Self) Seq RepAgent RepActive Message
    ------------------------------------------------------------------------
    host1 instance1 Yes     Active        Active     581 Down     No
    host2 instance1 Yes     Standby       Standby    567 Down     No
    
    Recommended commands:
    ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x
    host1.example.com /timesten/host1/instance1/bin/ttenv ttDaemonAdmin -stop
    ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x
    host2.example.com /timesten/host2/instance1/bin/ttenv ttDaemonAdmin -stop
    sleep 30
    ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x 
    host1.example.com /timesten/host1/instance1/bin/ttenv ttGridAdmin mgmtActiveStart
    sleep 30
    ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x 
    host2.example.com /timesten/host2/instance1/bin/ttenv ttGridAdmin mgmtStandbyStart

    Executing these commands restarts both the active and standby management instances:

    % ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x 
    host1.example.com /timesten/host1/instance1/bin/ttenv ttDaemonAdmin -stop
    TimesTen Daemon (PID: 3858, port: 11000) stopped.
     
    % ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x 
    host2.example.com /timesten/host2/instance1/bin/ttenv ttDaemonAdmin -stop
    TimesTen Daemon (PID: 3859, port: 12000) stopped.
    
    % ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x 
    host1.example.com /timesten/host1/instance1/bin/ttenv ttGridAdmin mgmtActiveStart
    This management instance is now the active
     
    % ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x 
    host2.example.com /timesten/host2/instance1/bin/ttenv ttGridAdmin mgmtStandbyStart
    Standby management instance started

    Continue to re-run the ttGridAdmin mgmtExamine command until you receive the message that both management instances are up.

    % ttGridAdmin mgmtExamine
    Both active and standby management instances are up. No action required.
     
    Host Instance Reachable RepRole(Self) Role(Self) Seq RepAgent RepActive Message
    ----------------------------------------------------------------------
    host1 instance1 Yes     Active        Active     567 Up       Yes
    host2 instance1 Yes     Standby       Standby    567 Up       No
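Re-running the status check until both management instances are up can be automated with a bounded retry loop. The following is a minimal sketch, assuming the summary line reads "Both active and standby management instances are up" as in the examples in this chapter. The function name wait_until_both_up is hypothetical, and the status command is passed in as arguments so that the loop itself makes no assumptions about your environment.

```shell
#!/bin/sh
# Hypothetical helper: re-runs a status command until its output contains the
# "Both active and standby management instances are up" message, or gives up.
# $1 = maximum attempts, $2 = seconds between attempts, remaining args = command.
wait_until_both_up() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" | grep -q 'Both active and standby management instances are up'; then
            return 0
        fi
        i=$((i + 1))
        # Pause only when another attempt remains.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage (assumes the environment was already set with ttenv):
#   wait_until_both_up 10 30 ttGridAdmin mgmtExamine
```

The 30-second delay in the usage line mirrors the sleep 30 steps in the recommended commands above. A non-zero exit status means the limit was reached, at which point the latest ttGridAdmin mgmtExamine output should be read for further recommended commands.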

Bring Back One of the Management Instances

When you notice that your standby management instance is down, recreate it as soon as possible. Otherwise, if your active management instance also goes down, your grid topology may end up dramatically different from what it was before. That is, if the active management instance goes down or fails in such a way that the best option is to bring back up a standby management instance that has been down for a while, then this may result in an incorrect grid topology as follows:

  • If you had recently added instances to your grid, they may be gone.

  • If you had recently deleted instances from your grid, they may be back.

  • If you had recently created databases, they may have been deleted.

  • If you had recently destroyed databases, they might be recreated.

If you can bring back only one of the management instances, re-activate this instance as the active management instance. The following example assumes that the management instance on the host2 host is down and that the management instance on the host1 host has been brought back up.

  1. Run the ttGridAdmin mgmtActiveStart command on the management instance on host1. This re-activates it as the active management instance.
    % ttGridAdmin mgmtActiveStart
    This management instance is now the active
  2. Remove the permanently failed standby management instance from the model with the ttGridAdmin instanceDelete command.
    % ttGridAdmin instanceDelete host2.instance1
    Instance instance1 on Host host2 deleted from Model

    Note:

    If there are no other instances on the host where the down management instance existed, you may want to delete the host and the installation.

  3. Add a new standby management instance with its supporting host and installation to the model.
    % ttGridAdmin hostCreate host9 -address host9.example.com 
    Host host9 created in Model
    % ttGridAdmin installationCreate -host host9 -location /timesten/host9/installation1
    Installation installation1 on Host host9 created in Model
    % ttGridAdmin instanceCreate -host host9 -location /timesten/host9 -type management
    Instance instance1 on Host host9 created in Model
    
  4. Apply the configuration changes that remove the failed standby management instance and add a new standby management instance to the grid by running the ttGridAdmin modelApply command.
    % ttGridAdmin modelApply
    Copying Model.........................................................OK
    Exporting Model Version 9.............................................OK
    Unconfiguring standby management instance.............................OK
    Marking objects 'Pending Deletion'....................................OK
    Stop any Instances that are 'Pending Deletion'........................OK
    Deleting any Instances that are 'Pending Deletion'....................OK
    Deleting any Hosts that are no longer in use..........................OK
    Verifying Installations...............................................OK
    Creating any missing Instances........................................OK
    Adding new Objects to Grid State......................................OK
    Configuring grid authentication.......................................OK
    Pushing new configuration files to each Instance......................OK
    Making Model Version 9 current........................................OK
    Making Model Version 10 writable......................................OK
    Checking ssh connectivity of new Instances............................OK
    Starting new management instance......................................OK
    Configuring standby management instance...............................OK
    Starting new data instances...........................................OK
    ttGridAdmin modelApply complete

    The ttGridAdmin modelApply command initiates the active standby configuration between the active and standby management instances and replicates the management information on the active management instance to the standby management instance.