29 Managing Switchover and Failover Operations

Outages can be planned as might be the case when performing upgrades or periodic maintenance, or unplanned as can happen in the event of hardware/software failure, or perhaps some environmental catastrophe. Regardless of the type of outage, you want to ensure that your IT infrastructure can be restored an running as soon as possible.

This chapter covers the following:

Switchover
Failover
Automatic Failover to the Standby Site in a Level 4 MAA Configuration

29.1 Switchover

Switchover is a planned activity where operations are transferred from the Primary site to a Standby site. This is usually done for testing and validation of Disaster Recovery (DR) scenarios and for planned maintenance activities on the primary infrastructure.

Switchover to the Passive OMS in a Level 2 Active/Passive Configuration

Switchover follows the same steps as Failover, See Section 'Failover to the Passive OMS in a Level 2 Active/Passive Configuration'

Switchover to the Standby Site in a Level 4 MAA Configuration

This section describes the steps to switchover to the standby site. The same procedure is applied to switchover in either direction.

Enterprise Manager Console cannot be used to perform switchover of the Management Repository database. Use the Data Guard Broker command line tool DGMGRL instead.

Prepare the Standby Database

Verify that recovery is up-to-date. Using the Enterprise Manager Console, you can view the value of the ApplyLag column for the standby database in the Standby Databases section of the Data Guard Overview Page.
Shut down the Primary Enterprise Manager Application Tier.

Shutdown all the Management Service instances in the primary site by running the following command on each Management Service:

emctl stop oms -all
Verify Software Library Availability

Ensure all files from the primary site are available on the standby site.
Switch over to the Standby Database

Use DGMGRL to perform a switchover to the standby database. The command can be run on the primary site or the standby site. The switchover command verifies the states of the primary database and the standby database, affects switchover of roles, restarts the old primary database, and sets it up as the new standby database.

SWITCHOVER TO <standby database name>;

Verify the post switchover states. To monitor a standby database completely, the user monitoring the database must have SYSDBA privileges. This privilege is required because the standby database is in a mounted-only state. A best practice is to ensure that the users monitoring the primary and standby databases have SYSDBA privileges for both databases.
```
SHOW CONFIGURATION;
SHOW DATABASE <primary database name>;
SHOW DATABASE <standby database name>;
```
Start the Admin Server if it is not already running.
```
emctl start oms -admin_only
```
Make the standby Management Services point to the Standby Database which is now the new Primary by running the following on each standby Management Service.

emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman
Sync up Users/Roles /Grants on the Stanby Management Service by running the following command:
```
emctl sync_opss_policy_store -sysman_pwd <sysman password>
```
Startup the Enterprise Manager Application Tier.

Startup all the Management Services on the standby site:

emctl start oms
Relocate Management Services and Management Repository target

The Management Services and Management Repository target is monitored by a Management Agent on one of the Management Services on the primary site. To ensure that the target is monitored after switchover/failover, relocate the target to a Management Agent on the standby site by running the following command on one of the Management Service standby sites.

emctl config emrep -agent <agent name> -conn_desc
Switchover to Standby SLB.

Make appropriate network changes to failover your primary SLB to standby SLB that is, all requests should now be served by the standby SLB without requiring any changes on the clients (browser and Management Agents).
Establish the old primary Management Services as the new standby Management Services to complete the switchover process.

Start the Administration Server on old primary site

emctl start oms -admin_only

Point the old primary Management Services to the new Primary Repository database by running the following command on each Management Service on the old primary site.

emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman

This completes the switchover operation. Access and test the application to ensure that the site is fully operational and functionally equivalent to the primary site. Repeat the same procedure to switchover in the other direction.

29.2 Failover

Failover to the Passive OMS in a Level 2 Active/Passive Configuration

Establish the IP address on failover host.
If the Database and Listener are also part of the same failover group:
1. Start the TNS listener using the command lsnrctl start.
2. Start the database using the command dbstart.
Start Cloud Control using the command emctl start oms.
Test the functionality.

Failover of the OMS in a Level 3 Active/Active Configuration

OMS failover is handled transparently in a Level 3, Active/Active OMS, configuration. The SLB monitors determine that the failed OMS is no longer available and route traffic to available OMS servers.

Manual Failover to the Standby Site in a Level 4 MAA Configuration

This section describes the steps to failover to a standby database, recover the Enterprise Manager application state by resynchronization the Management Repository database with all Management Agents, and enabling the original primary database as a standby using flashback database.

The word manual is used here to contrast this type of failover with a fast-start failover described later in Section 29.3, "Automatic Failover to the Standby Site in a Level 4 MAA Configuration".

Verify Software Library Availability

Ensure all files from the primary site are available on the standby site.
Failover to Standby Database.

Shutdown the database on the primary site. Use DGMGRL to connect to the standby database and execute the FAILOVER command:

FAILOVER TO <standby database name>;

Verify the post failover states:
```
SHOW CONFIGURATION;
SHOW DATABASE <primary database name>;
SHOW DATABASE <standby database name>;
```
Note that after the failover completes, the original primary database cannot be used as a standby database of the new primary database unless it is re-enabled.
Start the Admin Server if it is not already running.
```
emctl start oms -admin_only
```
Make the standby Management Services point to the Standby Database which is now the new Primary by running the following on each standby Management Service.
```
emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman
```
Sync up Users/Roles/Grants on the Standby Management Service by running the following command:
```
emctl sync_opss_policy_store -sysman_pwd <sysman password>
```
Resync the New Primary Database with Management Agents.

Skip this step if you are running in Data Guard Maximum Protection or Maximum Availability level as there is no data loss on failover. However, if there is data loss, synchronize the new primary database with all Management Agents.On any one Management Service on the standby site, run the following command:

emctl resync repos -full -name "<name for recovery action>"

This command submits a resync job that would be executed on each Management Agent when the Management Services on the standby site are brought up.

Repository resynchronization is a resource intensive operation. A well tuned Management Repository will help significantly to complete the operation as quickly as possible. Specifically if you are not routinely coalescing the IOTs/indexes associated with Advanced Queueing tables as described in My Oracle Support note 271855.1, running the procedure before resync will significantly help the resync operation to complete faster.
Start up the Enterprise Manager Application Tier

Start up all the Management Services on the standby site by running the following command on each Management Service.

emctl start oms
Relocate Management Services and Management Repository target.

The Management Services and Management Repository target is monitored by a Management Agent on one of the Management Services on the primary site. To ensure that target is monitored after switchover/failover, relocate the target to a Management Agent on the standby site by running the following command on one of the standby site Management Service.

emctl config emrep -agent <agent name> -conn_desc
Switchover to the Standby SLB.

Make appropriate network changes to failover your primary SLB to the standby SLB, that is, all requests should now be served by the standby SLB without requiring any changes on the clients (browser and Management Agents).
Establish Original Primary Database as Standby Database Using Flashback

Once access to the failed site is restored and if you had flashback database enabled, you can reinstate the original primary database as a physical standby of the new primary database.
1. Shut down all the Management Services in the original primary site.
  
  emctl stop oms -all
2. Restart the original primary database in mount state:
  
  shutdown immediate;
  
  startup mount;
3. Reinstate the Original Primary Database
  
  Use DGMGRL to connect to the old primary database and execute the REINSTATE command
  
  REINSTATE DATABASE <old primary database name>;
4. The newly reinstated standby database will begin serving as standby database to the new primary database.
5. Verify the post reinstate states.
  
  SHOW CONFIGURATION;
  
  SHOW DATABASE <primary database name>;
  
  SHOW DATABASE <standby database name>;
Establish Original Primary Management Service as the standby Management Service.

Start the Administration Server on old primary site

emctl start oms -admin_only

Point the old primary Management Service to the new Primary Repository database by running the following command on each Management Service on the old primary site.
```
emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman
```
Monitor and complete Repository Resynchronization

Navigate to the Management Services and Repository Overview page of Cloud Control Console. Under Related Links, click Repository Synchronization. This page shows the progress of the resynchronization operation on a per Management Agent basis. Monitor the progress.

Operations that fail should be resubmitted manually from this page after fixing the error mentioned. Typically, communication related errors are caused by Management Agents being down and can be fixed by resubmitting the operation from this page after restarting the Management Agent.

For Management Agents that cannot be started due to some reason, for example, old decommissioned Management Agents, the operation should be stopped manually from this page. Resynchronization is deemed complete when all the jobs have a completed or stopped status.

This completes the failover operation. Access and test the application to ensure that the site is fully operational and functionally equivalent to the primary site. Perform a switchover procedure if the site operations have to be moved back to the original primary site.

29.3 Automatic Failover to the Standby Site in a Level 4 MAA Configuration

This section details the steps to achieve complete automation of failure detection and failover procedure by utilizing Fast-Start Failover and Observer process. At a high level the process works as follows:

Fast-Start Failover (FSFO) determines that a failover is necessary and initiates a failover to the standby database automatically
When the database failover has completed the DB_ROLE_CHANGE database event is fired
The event causes a trigger to be fired which calls a script that configures and starts Enterprise Manager Application Tier

Perform the following steps:

Develop Enterprise Manager Application Tier Configuration and Startup Script

Develop a script that will automate the Enterprise Manager Application configuration and startup process. See the sample shipped with Cloud Control in the OH/sysman/ha directory. A sample script for the standby site is included here and should be customized as needed. Make sure ssh equivalence is setup so that remote shell scripts can be executed without password prompts. Place the script in a location accessible from the standby database host. Place a similar script on the primary site.

#!/bin/sh
# Script: /scratch/EMSBY_start.sh
# Primary Site Hosts
# Repos: earth, OMS: jupiter1, jupiter2
# Standby Site Hosts
# Repos: mars, # OMS: saturn1, saturn2
LOGFILE="/net/mars/em/failover/em_failover.log"
OMS_ORACLE_HOME="/scratch/OracleHomes/em/oms11"
CENTRAL_AGENT="saturn1.example.com:3872"
 
#log message
echo "###############################" >> $LOGFILE
date >> $LOGFILE
echo $OMS_ORACLE_HOME >> $LOGFILE
id >>  $LOGFILE 2>&1

#switch all OMS to point to new primary and startup all OMS
ssh orausr@saturn1 "$OMS_ORACLE_HOME/bin/emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman –repos_pwd <password>" >> $LOGFILE 2>&1
ssh orausr@saturn1 "$OMS_ORACLE_HOME/bin/emctl sync_opss_policy_store -sysman_pwd <sysman password> " >> $LOGFILE 2>&1
ssh orausr@saturn1 "$OMS_ORACLE_HOME/bin/emctl start oms" >>  $LOGFILE 2>&1

#Repeat the following two lines for each OMS in a multiple OMS setup. E.g.
ssh orausr@saturn2 "$OMS_ORACLE_HOME/bin/emctl config oms -store_repos_details -repos_conndesc <connect descriptor of new primary database> -repos_user sysman –repos_pwd <password>" >> $LOGFILE 2>&1
ssh orausr@saturn2 "$OMS_ORACLE_HOME/bin/emctl start oms" >>  $LOGFILE 2>&1

#relocate Management Services and Repository target
#to be done only once in a multiple OMS setup
#allow time for OMS to be fully initialized
ssh orausr@saturn1 "$OMS_ORACLE_HOME/bin/emctl config emrep -agent $CENTRAL_AGENT -conn_desc -sysman_pwd <password>" >> $LOGFILE 2>&1

#always return 0 so that dbms scheduler job completes successfully
exit 0

Automate Execution of Script by Trigger

Create a database event "DB_ROLE_CHANGE" trigger, which fires after the database role changes from standby to primary. See the sample shipped with Cloud Control in OH/sysman/ha directory.

--
--
-- Sample database role change trigger
--
--
CREATE OR REPLACE TRIGGER FAILOVER_EM
AFTER DB_ROLE_CHANGE ON DATABASE
DECLARE
    v_db_unique_name varchar2(30);
    v_db_role varchar2(30);
BEGIN
    select upper(VALUE) into v_db_unique_name
    from v$parameter where NAME='db_unique_name';
    select database_role into v_db_role
    from v$database;
 
    if v_db_role = 'PRIMARY' then
 
      -- Submit job to Resync agents with repository
      -- Needed if running in maximum performance mode
      -- and there are chances of data-loss on failover
      -- Uncomment block below if required
      -- begin
      --  SYSMAN.setemusercontext('SYSMAN', SYSMAN.MGMT_USER.OP_SET_IDENTIFIER);
      --  SYSMAN.emd_maintenance.full_repository_resync('AUTO-FAILOVER to '||v_db_unique_name||' - '||systimestamp, true);
      --  SYSMAN.setemusercontext('SYSMAN', SYSMAN.MGMT_USER.OP_CLEAR_IDENTIFIER);
      -- end;
 
      -- Start the EM mid-tier
      dbms_scheduler.create_job(
          job_name=>'START_EM',
          job_type=>'executable',
          job_action=> '<location>' || v_db_unique_name|| '_start_oms.sh',
          enabled=>TRUE
      );
    end if;
EXCEPTION
WHEN OTHERS
THEN
    SYSMAN.mgmt_log.log_error('LOGGING', SYSMAN.MGMT_GLOBAL.UNEXPECTED_ERR,
SYSMAN.MGMT_GLOBAL.UNEXPECTED_ERR_M || 'EM_FAILOVER: ' ||SQLERRM);
END;
/

Note:

Based on your deployment, you might require additional steps to synchronize and automate the failover of SLB and shared storage used for software library. These steps are vendor specific and beyond the scope of this document. One possibility is to invoke these steps from the Enterprise Manager Application Tier startup and configuration script.

Configure Fast-Start Failover and Observer.

Use the Fast-Start Failover configuration wizard in Enterprise Manager Console to enable FSFO and configure the Observer.

This completes the setup of automatic failover.