Managing Failover for the Management Instances
You conduct all management activity from a single management instance, called the active management instance. However, it is highly recommended that you configure two management instances, where the standby management instance is available in case the active management instance goes down or fails.
- If you have only a single management instance and it goes down, the databases remain operational. However, most management operations are unavailable until the management instance is restored.
- If you configure both active and standby management instances in your grid and only the active management instance is up, you can still configure and manage the entire grid from this one management instance.
If both management instances are down, then:
- You can still access all databases in the grid. However, since all management actions are requested through the active management instance, you cannot manage your grid until the active management instance is restored.
- If data instances or their elements in the grid go down or fail, they cannot recover, restart, or rejoin the grid until the active management instance is restored.
Note:
You cannot add a third management instance.
As shown in Figure 13-6, all management information used by the active management instance is automatically replicated to the standby management instance. Thus, if the active management instance goes down or fails, you can promote the standby management instance to become the new active management instance through which you continue to manage the grid.
Figure 13-6 Active Standby Configuration for Management Instances
The following sections describe how to manage the management instances:
Status for Management Instances
Use the ttGridAdmin mgmtExamine command to check the status of the management instances and to see whether there are any issues that need to be resolved. If necessary, this command recommends corrective actions that you can run to fix any open issues.
The following example shows both management instances working:
% ttGridAdmin mgmtExamine
Both active and standby management instances are up. No action required.
Host Instance Reachable RepRole(Self) Role(Self) Seq RepAgent RepActive
------------------------------------------------------------------------
host1 instance1 Yes Active Active 598 Up Yes
host2 instance1 Yes Standby Standby 598 Up No
If one of the management instances goes down or fails, the output shows Unknown as the role of that management instance and Down for its replication agent. It also shows a message that describes the problem. The output also provides recommended commands to restart the management instance.
% ttGridAdmin mgmtExamine
Active management instance is up, but standby is down
Host Instance Reachable RepRole(Self) Role(Self) Seq RepAgent RepActive Message
----- --------- --------- ------------- ---------- --- -------- --------- --------
host1 instance1 Yes Active Active 600 Up No
host2 instance1 No Unknown Unknown Down No Management database is not available
Recommended commands:
ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x host2.example.com /timesten/host2/instance1/bin/ttenv ttGridAdmin mgmtStandbyStart
For each management instance displayed:
- Host and Instance show the name of the host and the name of the management instance on that host.
- Reachable indicates whether the command was successful in reaching the management instance to determine its state.
- RepRole(Self) indicates the recorded role, if any, known by the replication agents that replicate data between the management instances. Role(Self) indicates the recorded role known within the database for the management instances. Both columns should show the same role. If the roles differ, the ttGridAdmin mgmtExamine command tries to determine the commands that would rectify the error.
- Seq is the sequence number of the most recent change on the management instance. If the Seq values are the same, the two management instances are synchronized. Otherwise, the one with the larger Seq value has the more recent data.
- RepAgent indicates whether a replication agent is running on each management instance.
- RepActive indicates whether changes to management data on the management instance were successful. These changes are made by the ttGridAdmin mgmtStatus command, which the ttGridAdmin mgmtExamine command invokes internally.
- Message provides any further information about the management instance.
See Examine Management Instances (mgmtExamine) in Oracle TimesTen In-Memory Database Reference.
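The mgmtExamine output is intended to be read by a person. The checks described for each column can also be scripted, for example in a monitoring job. The following Python sketch assumes the whitespace-separated layout shown in the examples above. That layout is not presented here as a stable, machine-readable format. MgmtRow, parse_mgmt_examine, and diagnose are hypothetical names and are not part of TimesTen.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MgmtRow:
    """One instance row of ttGridAdmin mgmtExamine output (hypothetical type)."""
    host: str
    instance: str
    reachable: bool
    rep_role: str        # RepRole(Self): role known to the replication agents
    role: str            # Role(Self): role recorded in the database
    seq: Optional[int]   # Seq: None when the column is blank
    rep_agent_up: bool   # RepAgent: Up or Down
    rep_active: bool     # RepActive: Yes or No
    message: str         # Message: free text, possibly empty


def parse_mgmt_examine(text: str) -> List[MgmtRow]:
    """Keep instance rows; skip the summary, header, dashes, and commands."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Instance rows carry Yes or No in the third (Reachable) column.
        if len(tokens) < 7 or tokens[2] not in ("Yes", "No"):
            continue
        host, instance, reachable, rep_role, role = tokens[:5]
        rest = tokens[5:]
        # A blank Seq column shifts RepAgent into the sixth position.
        seq = int(rest.pop(0)) if rest[0].isdigit() else None
        if len(rest) < 2:
            continue
        rows.append(MgmtRow(host, instance, reachable == "Yes", rep_role, role,
                            seq, rest[0] == "Up", rest[1] == "Yes",
                            " ".join(rest[2:])))
    return rows


def diagnose(rows: List[MgmtRow]) -> List[str]:
    """Apply the per-column checks described above."""
    issues = []
    for r in rows:
        name = f"{r.host}/{r.instance}"
        if not r.reachable:
            issues.append(f"{name}: not reachable")
        if r.rep_role != r.role:
            issues.append(f"{name}: RepRole(Self)={r.rep_role}, Role(Self)={r.role}")
        if not r.rep_agent_up:
            issues.append(f"{name}: replication agent is down")
    seqs = sorted({r.seq for r in rows if r.seq is not None})
    if len(seqs) > 1:
        # The instance with the larger Seq holds the more recent data.
        issues.append(f"Seq values differ {seqs}: not synchronized")
    return issues


healthy = """\
Both active and standby management instances are up. No action required.
Host Instance Reachable RepRole(Self) Role(Self) Seq RepAgent RepActive
------------------------------------------------------------------------
host1 instance1 Yes Active Active 598 Up Yes
host2 instance1 Yes Standby Standby 598 Up No
"""
print(diagnose(parse_mgmt_examine(healthy)))  # []
```

The parser keys on the Yes or No value in the Reachable column, so it skips the summary line, the header, and any recommended commands. To correct a problem it reports, run the commands that mgmtExamine itself recommends.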
Starting, Stopping and Switching Management Instances
You run most ttGridAdmin commands on the active management instance. However, when you manage recovery for an active management instance, you may be required to run ttGridAdmin commands on the standby management instance.
When starting, stopping, or promoting a standby management instance:
- You can run the ttGridAdmin mgmtStandbyStop command on either management instance. The grid knows where the standby management instance is and stops it.
- You must run the ttGridAdmin mgmtStandbyStart command on the management instance that you wish to become the standby management instance. The ttGridAdmin mgmtStandbyStart command assumes that you want the current instance to become the standby management instance.
- If the active management instance is down, you must run the ttGridAdmin mgmtActiveSwitch command on the standby management instance to promote it to be the active management instance.
For commands that you must run on the standby management instance, remember to set the environment with the ttenv script (as described in Creating the Initial Management Instance) after you log on to the host and before you run the ttGridAdmin utility.
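The rules above about where each command runs can be captured in a small lookup. This Python sketch is illustrative only, and host_for and its parameters are hypothetical. It treats mgmtExamine as runnable on either instance, as in Bring Back Both Management Instances. The fallback for all other commands reflects only the general statement that you run most ttGridAdmin commands on the active management instance.

```python
from typing import Optional


def host_for(command: str, active: str, standby: str,
             active_up: bool = True, new_standby: Optional[str] = None) -> str:
    """Return the host on which to run `ttGridAdmin <command>` (hypothetical)."""
    if command in ("mgmtStandbyStop", "mgmtExamine"):
        # Either management instance works; mgmtStandbyStop locates the standby
        # itself. Prefer the active one while it is up.
        return active if active_up else standby
    if command == "mgmtStandbyStart":
        # Must run on the instance that is to become the standby, because the
        # command assumes the current instance is the one being made standby.
        if new_standby is None:
            raise ValueError("mgmtStandbyStart needs the intended standby host")
        return new_standby
    if command == "mgmtActiveSwitch":
        # Promotes the standby when the active is down, so it runs on the standby.
        return standby
    # Most other ttGridAdmin commands run on the active management instance,
    # which cannot accept commands while it is down.
    if not active_up:
        raise ValueError(f"{command}: the active management instance is down")
    return active
```

On whichever host the lookup returns, you still set the environment with the ttenv script before running ttGridAdmin.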
Single Management Instance Failure
While it is not recommended, you can manage the grid with a single active management instance with no standby management instance. If the single active management instance fails and recovers, re-activate the active management instance as follows:
Active Management Instance Failure
If the active management instance fails, you can no longer run ttGridAdmin commands on it. To restore management of the grid:
- Promote the standby management instance on the host2 host to be the new active management instance.
- Create a new standby management instance by either:
  - Recovering the failed management instance on host1 and bringing it back up as the new standby management instance. The new active management instance then replicates all management information to the new standby management instance.
  - Deleting the failed active management instance, if it has permanently failed, and then creating a new standby management instance.
For example, suppose your environment has two management instances: the active management instance is on host1 and the standby management instance is on host2. If the active management instance on host1 fails, you can no longer run ttGridAdmin commands on it. As shown in Figure 13-7, you must promote the standby management instance on host2 to become the new active management instance.
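The two recovery paths can be written out as an ordered plan, using the host names from this example. This Python sketch is hypothetical, and active_failure_plan is not part of TimesTen. It assumes that bringing the recovered instance back as the standby ends with running ttGridAdmin mgmtStandbyStart on that instance, as the rules in Starting, Stopping and Switching Management Instances imply. It deliberately names no commands for the delete and create steps.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (host the step applies to, action)


def active_failure_plan(failed: str, standby: str, recoverable: bool,
                        replacement: str = "") -> List[Step]:
    """Ordered steps after the active management instance fails (sketch)."""
    # Step 1: promote the standby. mgmtActiveSwitch runs on the standby itself.
    plan: List[Step] = [(standby, "ttGridAdmin mgmtActiveSwitch")]
    if recoverable:
        # Step 2a (assumption): once the failed instance is recovered, make it
        # the new standby by running mgmtStandbyStart on that instance.
        plan.append((failed, "ttGridAdmin mgmtStandbyStart"))
    else:
        # Step 2b: permanent failure. Delete the failed instance, then create a
        # new standby on another host; the exact commands are not shown here.
        if not replacement:
            raise ValueError("a replacement host is needed for the new standby")
        plan.append((failed, "delete the failed management instance"))
        plan.append((replacement, "create a new standby management instance"))
    return plan
```

In both paths the promotion comes first, because the delete and create steps are management operations that need a working active management instance.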
Once the new active management instance is processing requests, ensure that a new standby management instance is created by one of the following methods:
Failed Management Instance Can Be Recovered
If the failed active management instance can be recovered, you need to perform the following tasks:
Figure 13-8 The Failed Management Instance Can Be Recovered
Failed Management Instance Encounters a Permanent Failure
If the failed active management instance has failed permanently, you need to perform the following tasks:
Figure 13-9 The Active Management Instance Fails Permanently
Standby Management Instance Failure
How you re-activate the standby management instance depends on the type of failure as described in the following sections:
Standby Management Instance Experiences Permanent Failure
If the standby management instance has permanently failed, perform the following tasks:
- Delete the failed standby management instance on the host2 host.
- Create a new standby management instance on the host9 host to take over the duties of the failed standby management instance. The active management instance then replicates the management information to the new standby management instance.
Figure 13-10 The Standby Management Instance Fails Permanently
Both Management Instances Fail
You must restart the management instances to return the grid to its full functionality and to be able to manage the grid through the active management instance.
If both management instances are down, you need to determine which management instance has the latest changes, because that management instance should become the new active management instance.
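The decision rule is to pick the management instance with the larger Seq value, since Seq identifies the most recent change on each management instance. The following Python sketch assumes that you have already read the Seq values from ttGridAdmin mgmtExamine; pick_new_active is a hypothetical name. When a Seq value is blank because an instance is unreachable, the sketch refuses to guess. This mirrors the mgmtExamine advice to start the instances and run the command again.

```python
from typing import Dict, Optional


def pick_new_active(seq_by_host: Dict[str, Optional[int]]) -> str:
    """Return the host whose management instance has the highest Seq value.

    None stands for a blank Seq column (the instance was not reachable).
    """
    unknown = sorted(h for h, s in seq_by_host.items() if s is None)
    if unknown or not seq_by_host:
        raise ValueError(f"Seq unknown for {unknown}; start the management "
                         "instances and run mgmtExamine again")
    # Equal values mean the instances are synchronized; max keeps the first.
    return max(seq_by_host, key=lambda h: seq_by_host[h])
```

With the values from the example in Bring Back Both Management Instances (581 on host1, 567 on host2), the sketch selects host1.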
Note:
If both management instances fail permanently, call Oracle Support.
The following sections describe what to do when both management instances are down:
Bring Back Both Management Instances
If you can bring back both management instances:
Note:
If you have not done so already, set the environment with the ttenv script (as described in Creating the Initial Management Instance).
- Run the ttGridAdmin mgmtExamine command on one of the management instances to discover which one should become the active management instance. The ttGridAdmin mgmtExamine command evaluates both management instances and prints the sequence number (Seq) of each reachable management instance. The management instance with the highest sequence number has the most management data, and it is the one to re-activate as the active management instance. In this first invocation, neither management instance is reachable, so no sequence numbers are shown and the command recommends starting both of them.
% ttGridAdmin mgmtExamine
One or more management instance is down. Start them and run mgmtExamine again.
Host Instance Reachable RepRole(Self) Role(Self) Seq RepAgent RepActive Message
------------------------------------------------------------------------------
host1 instance1 No Unknown Unknown Down No Management database is not available
host2 instance1 No Unknown Unknown Down No Management database is not available
Recommended commands:
ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x host1.example.com /timesten/host1/instance1/bin/ttenv ttDaemonAdmin -start -force
ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x host2.example.com /timesten/host2/instance1/bin/ttenv ttDaemonAdmin -start -force
sleep 30
/timesten/host1/instance1/bin/ttenv ttGridAdmin mgmtExamine
- Run the recommended commands listed by the ttGridAdmin mgmtExamine command. The commands for this example restart the daemon for each management instance:
% ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x host1.example.com /timesten/host1/instance1/bin/ttenv ttDaemonAdmin -start -force
TimesTen Daemon (PID: 3858, port: 11000) startup OK.
% ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x host2.example.com /timesten/host2/instance1/bin/ttenv ttDaemonAdmin -start -force
TimesTen Daemon (PID: 4052, port: 12000) startup OK.
- Re-run the ttGridAdmin mgmtExamine command to verify that both management instances are up. If either management instance is not up, the ttGridAdmin mgmtExamine command may suggest another set of commands to run.
In this example, the second invocation of the ttGridAdmin mgmtExamine command shows that the management instances are still not fully up: both are reachable, but their replication agents are down. The command therefore recommends that you:
  - Stop the main daemon of both management instances.
  - Run the ttGridAdmin mgmtActiveStart command on the management instance with the higher sequence number reported by the ttGridAdmin mgmtExamine command (host1, with Seq 581, in this example). This re-activates the active management instance.
  - Run the ttGridAdmin mgmtStandbyStart command on the management instance that you want to act as the standby management instance. This command assigns the other management instance as the standby management instance in TimesTen Scaleout. It also initiates the active standby configuration between the active and standby management instances and synchronizes the management information on the active management instance to the standby management instance.
% ttGridAdmin mgmtExamine
Host Instance Reachable RepRole(Self) Role(Self) Seq RepAgent RepActive Message
------------------------------------------------------------------------
host1 instance1 Yes Active Active 581 Down No
host2 instance1 Yes Standby Standby 567 Down No
Recommended commands:
ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x host1.example.com /timesten/host1/instance1/bin/ttenv ttDaemonAdmin -stop
ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x host2.example.com /timesten/host2/instance1/bin/ttenv ttDaemonAdmin -stop
sleep 30
ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x host1.example.com /timesten/host1/instance1/bin/ttenv ttGridAdmin mgmtActiveStart
sleep 30
ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x host2.example.com /timesten/host2/instance1/bin/ttenv ttGridAdmin mgmtStandbyStart
Running these commands restarts both the active and standby management instances:
% ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x host1.example.com /timesten/host1/instance1/bin/ttenv ttDaemonAdmin -stop
TimesTen Daemon (PID: 3858, port: 11000) stopped.
% ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x host2.example.com /timesten/host2/instance1/bin/ttenv ttDaemonAdmin -stop
TimesTen Daemon (PID: 3859, port: 12000) stopped.
% ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x host1.example.com /timesten/host1/instance1/bin/ttenv ttGridAdmin mgmtActiveStart
This management instance is now the active
% ssh -o StrictHostKeyChecking=yes -o PasswordAuthentication=no -x host2.example.com /timesten/host2/instance1/bin/ttenv ttGridAdmin mgmtStandbyStart
Standby management instance started
Continue to re-run the ttGridAdmin mgmtExamine command until you receive the message that both management instances are up.
% ttGridAdmin mgmtExamine
Both active and standby management instances are up. No action required.
Host Instance Reachable RepRole(Self) Role(Self) Seq RepAgent RepActive Message
----------------------------------------------------------------------
host1 instance1 Yes Active Active 567 Up Yes
host2 instance1 Yes Standby Standby 567 Up No
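The recommended commands in this example follow a fixed order: stop both daemons, wait, start the active management instance on the instance with the higher Seq, wait, then start the standby. The Python sketch below rebuilds that list from two instances. It is illustrative only, and restart_plan and ssh_ttenv are hypothetical helpers. The ssh options, the 30-second pauses, and the /timesten/<host>/instance1 instance paths are copied from the example output, and your paths will differ. In practice, run the commands that mgmtExamine actually recommends for your grid.

```python
import shlex
from typing import List, Tuple

# ssh options as they appear in the recommended commands above.
SSH_OPTS = ["-o", "StrictHostKeyChecking=yes", "-o", "PasswordAuthentication=no", "-x"]

Instance = Tuple[str, str]  # (host FQDN, instance home directory)


def ssh_ttenv(inst: Instance, *cmd: str) -> str:
    """Build a command line that runs a TimesTen utility through ttenv over ssh."""
    fqdn, home = inst
    return shlex.join(["ssh", *SSH_OPTS, fqdn, f"{home}/bin/ttenv", *cmd])


def restart_plan(new_active: Instance, new_standby: Instance) -> List[str]:
    """Order used in the example: stop daemons, start active, then standby."""
    return [
        ssh_ttenv(new_active, "ttDaemonAdmin", "-stop"),
        ssh_ttenv(new_standby, "ttDaemonAdmin", "-stop"),
        "sleep 30",
        # The instance with the higher Seq becomes the active.
        ssh_ttenv(new_active, "ttGridAdmin", "mgmtActiveStart"),
        "sleep 30",
        # Run on the instance that is to become the standby.
        ssh_ttenv(new_standby, "ttGridAdmin", "mgmtStandbyStart"),
    ]


if __name__ == "__main__":
    host1 = ("host1.example.com", "/timesten/host1/instance1")
    host2 = ("host2.example.com", "/timesten/host2/instance1")
    for line in restart_plan(host1, host2):
        print(line)
```

The sketch only prints the commands and does not run them, so you can review the order before running anything against the management instances.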
Bring Back One of the Management Instances
If you notice that your standby management instance is down, it is important that you recreate it as soon as possible. Otherwise, if your active management instance also goes down, your grid topology may end up dramatically different from what it was before. That is, suppose the active management instance goes down or fails in such a way that the best option is to bring back up a standby management instance that has been down for a while. In that case, the resulting grid topology may be incorrect, as follows:
- If you had recently added instances to your grid, they may be gone.
- If you had recently deleted instances from your grid, they may be back.
- If you had recently created databases, they may have been deleted.
- If you had recently destroyed databases, they might be recreated.
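One way to reason about this list is that a management instance reactivated after being down for a while contains only the changes up to its own Seq value. Every grid change recorded after that point is missing. The Python sketch below models this with a hypothetical change log. Change, EFFECT, drift_after_reactivating, and the instance and database names are illustrative. This is a conceptual model, not a description of how TimesTen stores management data. It maps each lost change to the effect described above.

```python
from dataclasses import dataclass
from typing import List

# Effect of each kind of lost change, as listed above.
EFFECT = {
    "add instance": "may be gone",
    "delete instance": "may be back",
    "create database": "may have been deleted",
    "destroy database": "might be recreated",
}


@dataclass
class Change:
    seq: int      # sequence number at which the change was recorded
    kind: str     # one of the EFFECT keys
    target: str   # instance or database name (illustrative)


def drift_after_reactivating(log: List[Change], stale_seq: int) -> List[str]:
    """Changes recorded after the reactivated instance's Seq are missing from it."""
    return [f"{c.target} {EFFECT[c.kind]}" for c in log if c.seq > stale_seq]


log = [
    Change(560, "create database", "database1"),
    Change(575, "add instance", "host5.instance1"),
    Change(580, "destroy database", "database2"),
]
# A standby that went down at Seq 567 never saw the last two changes.
print(drift_after_reactivating(log, 567))
```

The longer the standby has been down, the larger the gap between its Seq and the latest change, and the more of the grid topology has to be checked after it is reactivated.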
If you can bring back only one of the management instances, re-activate that instance as the active management instance. The following example assumes that the management instance on the host2 host is down and that the management instance on the host1 host has been brought back.