This chapter describes Oracle Site Guard terminology and the architecture of a site in an Enterprise Manager Cloud Control Console. It also provides an overview of the workflow of different operations that Oracle Site Guard performs.
The following terms are used in this document:
Targets are core Enterprise Manager entities that represent the infrastructure and business components in an enterprise. These components need to be monitored and managed for efficient functioning of the business. An example of a target is an Oracle Fusion Middleware farm or an Oracle Database Instance. Oracle Site Guard disaster-recovery operations are designed to protect one or more targets at a site.
A Site is a logical grouping of related entities in a data center. For example, software components in a Web tier, the Middleware tier, and Database tier, along with associated storage, may together comprise a Site. Oracle Site Guard performs disaster-recovery operations on a Site. A data center may have more than one Site defined by Oracle Site Guard, and each of them can be managed independently for disaster-recovery operations.
The Primary Site is the site currently hosting the active application (a set of targets) that Oracle Site Guard is configured to protect. The Primary Site is also referred to as the Production Site.
The Standby Site is the site that is intended to host the protected application (a set of targets) in the event of a disaster-recovery operation.
A site's role is its current designation. The role can be either Primary or Standby.
The process of reversing the roles of the production site and standby site is termed a switchover. Switchovers are planned operations done for periodic validation or to perform planned maintenance on the current production site. During a switchover, the current standby site becomes the new production site, and the current production site becomes the new standby site.
The process of making the current standby site the new production site after the production site becomes unexpectedly unavailable (for example, due to a disaster at the production site) is termed a failover.
An operation plan contains the flow of execution for a particular Oracle Site Guard operation. It defines the order in which the steps of a disaster-recovery operation should be executed, in addition to other attributes, such as whether steps run serially or in parallel.
Prechecks are a pre-ordered set of checks that determine whether an operation plan is compliant with the environment it is supposed to protect. Prechecks are used to assess disaster-recovery readiness, and are performed on demand.
Health checks are a pre-ordered set of checks that can be programmed to run periodically based on a user-defined schedule. Health checks are used to maintain an ongoing assessment of disaster-recovery readiness.
Custom Precheck Scripts
Custom Precheck scripts are user-defined scripts that are executed as part of the Precheck procedure for an Oracle Site Guard operation plan. The number of Precheck Scripts and the sequence of their execution can be defined as part of an operation plan.
Pre scripts are site-specific, user-defined scripts that are executed at a site at the beginning of an Oracle Site Guard operation. The number of Pre Scripts and the sequence of their execution can be defined as part of an operation plan.
Post scripts are site-specific, user-defined scripts that are executed at a site at the end of an Oracle Site Guard operation. The number of Post Scripts and the sequence of their execution can be defined as part of an operation plan.
Global Pre Scripts
Global Pre Scripts are operation-specific, user-defined scripts that are executed at the beginning of an Oracle Site Guard operation plan. The number of Global Pre Scripts and the sequence of their execution can be defined as part of an operation plan.
Global Post Scripts
Global Post Scripts are operation-specific, user-defined scripts that are executed at the end of an Oracle Site Guard operation plan. The number of Global Post Scripts and the sequence of their execution can be defined as part of an operation plan.
A super administrator is a privileged user who has access to all Enterprise Manager targets, and to all Oracle Site Guard configurations, operations, and activities.
A site is a logical grouping of software components and associated hardware that run one or more user applications. For example, a site could consist of a collection of servers (hosts) that are used to deploy Oracle Fusion Middleware instances, Oracle Fusion Application instances, Oracle databases, along with the associated storage for these software components. Oracle Site Guard uses the Enterprise Manager Cloud Control generic system target to represent a site. Every site, whether primary or standby, is represented as a Generic System, which is a collection of other target types, such as Oracle Database and Oracle Fusion Middleware Domain. Oracle Site Guard only supports Enterprise Manager deployments where both primary and standby sites are managed by the same Enterprise Manager Cloud Control deployment.
Figure 2-1 shows an overview of an Oracle Fusion Middleware Disaster Recovery topology managed by the same Enterprise Manager Cloud Control deployment.
Following are the key aspects of the Oracle Fusion Middleware Disaster Recovery topology:
A single Enterprise Manager Cloud Control instance monitors the primary site and the standby site.
Oracle Management Agent (EM Agent) is installed on local (non-replicated) storage on all hosts on the primary site and the standby site.
Web Tier managed system components (
Oracle Fusion Middleware Applications (
Oracle RAC Database (
RAC DBHOST1 and
Oracle Management Agent (EM Agent) is one of the core components of Enterprise Manager Cloud Control that enables you to convert an unmanaged host to a managed host in the Enterprise Manager system. The Management Agent works in conjunction with Enterprise Manager plug-ins to manage the targets running on that managed host.
This section describes the features of Oracle Site Guard.
Oracle Site Guard provides the ability to extend the built-in disaster-recovery functionality by allowing you to insert custom scripts at specific points in the operation workflow. This provides a mechanism for performing customized, site-specific, or operation-specific activities during a disaster-recovery operation.
Any number of scripts can be configured for extensibility. The time and manner in which these user-defined scripts are executed and the sequence in which they are executed can be configured by selecting the script type.
For customizing and extending Oracle Site Guard functionality, the following types of scripts are available:
These scripts are provided by the user. They are used to perform user-defined activities during the Precheck or Health Check phase that occurs before an operation plan executes. Custom Precheck Scripts are executed as part of a Precheck or Health Check.
These scripts are provided by the user. They are used to perform user-defined activities at the beginning of site-specific operations in an operation plan. Pre Scripts are executed before Oracle Site Guard performs any target-related operations at a site.
These scripts are provided by the user. They are used to perform user-defined activities at the end of site-specific operations in an operation plan. Post scripts are executed after Oracle Site Guard performs any target-related operation at a site.
These scripts are provided by the user. They are used to perform user-defined operation-specific activities at the beginning of an operation plan. Global Pre Scripts are executed before Oracle Site Guard begins any operation at the first site (usually the primary site).
These scripts are provided by the user. They are used to perform user-defined operation-specific activities at the end of an operation plan. Global Post Scripts are executed after Oracle Site Guard has completed performing operations on the last site (usually a standby site).
These scripts come bundled with Oracle Site Guard. Users can also define their own scripts. They are used to perform mount and unmount operations on file systems during an operation. Unmount scripts are executed after all services and applications have been stopped at the primary site. Mount scripts are executed before any services or applications are started at the standby site.
These scripts come bundled with Oracle Site Guard. Users can also define their own storage scripts. These scripts are used to perform storage role-reversal activities for Oracle Sun ZFS Appliance during a disaster-recovery operation. Storage Switchover scripts are executed during a switchover operation and they execute at the standby site before any Mount scripts are executed. Storage Failover scripts are executed during a failover operation and they execute at the standby site before any Mount scripts are executed.
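The relative timing of these script types can be easier to follow as an ordered list. The following is a minimal sketch for a switchover, assuming one primary and one standby site. The phase names and the `position` helper are hypothetical, and the list is only one ordering that satisfies the constraints described above; it is not the actual layout of an Oracle Site Guard operation plan.

```python
# Hypothetical phase list: one ordering consistent with the script-type
# descriptions above, not Oracle Site Guard's real plan structure.
SWITCHOVER_PHASES = [
    ("global", "Global Pre Scripts"),
    ("primary", "Pre Scripts"),
    ("primary", "Stop services and applications"),
    ("primary", "Unmount Scripts"),
    ("primary", "Post Scripts"),
    ("standby", "Pre Scripts"),
    ("standby", "Storage Switchover Scripts"),
    ("standby", "Mount Scripts"),
    ("standby", "Start services and applications"),
    ("standby", "Post Scripts"),
    ("global", "Global Post Scripts"),
]


def position(site: str, phase: str) -> int:
    """Return the index of a (site, phase) pair in the switchover ordering."""
    return SWITCHOVER_PHASES.index((site, phase))
```

For example, `position("standby", "Storage Switchover Scripts")` is smaller than `position("standby", "Mount Scripts")`, which reflects the rule that storage role reversal happens before any file system is mounted at the standby site.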
Table 2-1 provides an overview of the various types of scripts that are used while using Oracle Site Guard to set up sites.
|Types of Script||Provided by the User? (Custom Scripts)||Provided by Oracle Site Guard? (Bundled Scripts)|
|Custom Precheck Script||Yes||No|
|Pre Script, Post Script, Global Pre Script, Global Post Script||Yes||No|
|Mount and Unmount Scripts||Yes||Yes (must be configured by user)|
|Storage Switchover and Storage Failover Scripts||Yes||Yes (Only for Oracle Sun ZFS. To be configured by user.)|
Note: The optional scripts that are executed at the Primary site during a failover are the same as those executed at the Primary site during a switchover operation. The scripts at the Primary site are executed as part of the failover operation only if the user chooses to stop the Primary site during the failover.
Note: Custom Precheck scripts are scheduled to run on the Primary site for a Failover operation. However, because the Primary site might be inaccessible or non-operational, these scripts are set to run in Continue on Error mode.
Depending on the type of script and the desired runtime behavior, you must configure the path of the script using the appropriate format. Oracle Site Guard determines the location (path) of the script using the configuration path and type of script provided by the user. Table 2-2 shows examples of how to configure the various types of scripts, the corresponding script path that the user needs to specify, and the component that is extracted and used by Oracle Site Guard. Script path formats, other than those listed in Table 2-2, are not supported.
|Script Location||Script Type||User Configured Path||Script Path Extracted by Oracle Site Guard|
Enterprise Manager Software Library
User defined (Custom)
The success of a disaster-recovery plan depends on how accurately the plan represents the environment it is supposed to protect. Topology changes and configuration drift in the protected site can cause the disaster-recovery operation plan to lose synchronization with the environment, and can render the plan partially or completely ineffective. Frequently, this divergence, between the disaster-recovery plan and the environment being protected, is not discovered until an actual disaster-recovery attempt is in progress. It is also very important to ensure that the standby site is ready to perform the production role, before initiating any disaster recovery operation.
Oracle Site Guard provides a solution to this problem with the Precheck and Health Check features.
A Precheck provides a convenient and fully automated mechanism for assessing disaster-recovery readiness on demand. A Precheck can be executed by itself (stand-alone mode) to check if a selected operation plan will succeed. It can also be invoked before an operation plan is executed. In the latter case, if the Precheck fails, the operation plan is not executed.
Health Checks are a special category of Prechecks. They are Prechecks that can be scheduled to run periodically. Thus, health checks provide a mechanism to perform an ongoing assessment of disaster-recovery readiness.
A health check must be configured for a specified operation plan and must have a user-specified schedule associated with it.
For example, you might set up a health check associated with the Switchover to Standby Site plan to run every Wednesday and Saturday at 12:30 am to monitor the fidelity of that operation plan on an ongoing basis. You can also choose to be notified of health check results through e-mail.
Each configured operation plan can have an associated health check, and health checks for different plans execute independently of each other. You can stop health checks for an operation plan at any time.
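To make the scheduling example concrete, the following sketch computes the next run time for a health check that runs every Wednesday and Saturday at 12:30 am. It only illustrates the schedule arithmetic; the function name and constants are hypothetical, and Enterprise Manager performs the actual scheduling.

```python
from datetime import datetime, time, timedelta

# datetime.weekday() numbers Monday as 0, so Wednesday is 2 and Saturday is 5.
RUN_DAYS = {2, 5}
RUN_TIME = time(hour=0, minute=30)  # 12:30 am


def next_health_check(now: datetime) -> datetime:
    """Return the first scheduled run strictly after `now`."""
    for offset in range(8):
        day = (now + timedelta(days=offset)).date()
        candidate = datetime.combine(day, RUN_TIME)
        if day.weekday() in RUN_DAYS and candidate > now:
            return candidate
    raise RuntimeError("unreachable: a weekly schedule always has a next run")
```

Calling `next_health_check(datetime.now())` returns the upcoming Wednesday or Saturday slot, whichever comes first.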
Oracle Site Guard performs the following checks during Prechecks and Health Checks:
Checks whether all the hosts involved in the planned disaster-recovery operation are reachable. During this check, Oracle Site Guard logs into each host using the credentials configured for that host. This ensures that the host is reachable and can be accessed for executing directives and scripts.
Checks whether the primary and standby databases are configured correctly and Data Guard protection is functioning correctly. This check verifies the following:
The primary and standby database names are correct.
The database login credentials are correct.
Data Guard broker is ready to switchover the database.
Database Flashback status is set to ON.
Data Guard Redo and Transport Lags are within the limits specified by the user.
Checks whether the ZFS storage replication is functioning correctly. This check verifies the following:
The replication lags are within the limits specified by the user.
The source and destination ZFS appliances are reachable.
The login credentials are valid.
The replication action is configured correctly.
Checks whether user scripts are configured correctly by verifying whether each configured user script is found at the correct location.
Checks whether replicated file systems can be mounted during a switchover or failover. To confirm this, the check verifies that the file system mount points exist and can be accessed for mount operations.
Checks whether the Data Guard and ZFS replication lags are within the bounds specified by the user.
Note: An associated Precheck is automatically created for every operation plan that is created. However, a health check must be explicitly scheduled for every operation plan.
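The checks listed above can be pictured as a set of independent probes whose results are combined into a single readiness verdict. The sketch below illustrates that idea only. The `run_precheck` function and the shape of each check are hypothetical, and it assumes that every check runs even after an earlier one fails, which the text above does not state.

```python
from typing import Callable, Dict, List, Tuple

# Each check is a hypothetical callable that returns (passed, detail).
Check = Callable[[], Tuple[bool, str]]


def run_precheck(checks: Dict[str, Check]) -> Tuple[bool, List[str]]:
    """Run every check in order and collect one report line per check."""
    report: List[str] = []
    all_passed = True
    for name, check in checks.items():
        try:
            passed, detail = check()
        except Exception as exc:  # a check that crashes counts as a failure
            passed, detail = False, f"raised {exc!r}"
        all_passed = all_passed and passed
        report.append(f"{'PASS' if passed else 'FAIL'} {name}: {detail}")
    return all_passed, report
```

A caller would supply one entry per check, for example a host reachability probe and a replication lag probe, and treat the site as ready only when the first element of the result is `True`.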
The Precheck process can be customized by adding custom (user-defined) scripts that will execute as part of the Precheck, and also as part of any Health Checks that are then scheduled. This allows users to enhance the Precheck and Health Check capabilities of Oracle Site Guard by adding Prechecks for third-party components that need to be included in the disaster recovery workflow. Custom Precheck scripts function in the same way that built-in Prechecks function. If a user-defined Precheck script detects an anomaly and returns an error to Site Guard, that Precheck step is regarded as failed, and depending on how the Precheck script is configured (for example, if the script execution step is configured with the attribute Stop on Error), the disaster recovery operation may be aborted.
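As an illustration, a custom Precheck script for a third-party component might verify that the component's configuration file is present on the host. The sketch below assumes that a script reports an anomaly through a non-zero exit status. That convention, the file being checked, and the function names are assumptions made for this example and are not taken from the Oracle Site Guard documentation.

```python
import os
import sys


def check_config_readable(path: str) -> int:
    """Return 0 if the file exists and is readable, otherwise 1."""
    if os.path.isfile(path) and os.access(path, os.R_OK):
        print(f"OK: {path} is readable")
        return 0
    print(f"ERROR: {path} is missing or unreadable", file=sys.stderr)
    return 1


def main(argv: list) -> int:
    """Entry point: expects the path to check as the only argument."""
    if len(argv) != 2:
        print("usage: custom_precheck.py <config-file>", file=sys.stderr)
        return 2
    return check_config_readable(argv[1])
```

When saved as a standalone script, the file would end with `sys.exit(main(sys.argv))` so that the return value becomes the exit status.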
Disaster Recovery configurations typically include one or more storage appliances and data stores that are used for data storage by the application and database tiers. To make this data available at the standby site in the event of disaster recovery, these data stores are replicated from the primary to standby site, using either continuous or periodic replication. To perform a successful site switchover or failover, Oracle Site Guard must also perform storage role reversal as part of the disaster-recovery process.
The efficiency and timeliness of the data replication between the primary and standby sites is highly variable and depends on many factors, including network bandwidth, congestion, latency, storage appliance load, amount of replicated data, and so on. It is not uncommon for a certain amount of lag to be present between the source data at the primary site and the replicated data at the standby site. Oracle Site Guard provides a mechanism to configure the amount of replication lag that is permissible before a disaster-recovery operation can begin execution. During the Precheck phase of a disaster-recovery operation, Oracle Site Guard checks the current replication lag. If the lag exceeds the user-specified threshold, Oracle Site Guard does not execute the disaster-recovery operation.
You can configure the following lag-check parameters:
This parameter specifies the permissible lag for Redo Apply and Redo Transport which is managed by Oracle Data Guard.
This parameter specifies the permissible lag for application-tier storage replication which is managed by ZFS.
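The lag check reduces to comparing each measured lag against its configured threshold. The sketch below shows that comparison under stated assumptions: the class and function names are hypothetical, lags are expressed in seconds for illustration, and a lag exactly equal to the threshold is treated as acceptable because the text only rules out a lag that exceeds it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LagCheck:
    name: str                 # for example "Data Guard" or "ZFS"
    current_seconds: float    # measured lag (unit assumed for illustration)
    threshold_seconds: float  # user-specified permissible lag

    @property
    def within_limit(self) -> bool:
        # Only a lag that exceeds the threshold blocks the operation.
        return self.current_seconds <= self.threshold_seconds


def operation_may_proceed(checks: List[LagCheck]) -> bool:
    """Return True only when every lag is within its threshold."""
    return all(check.within_limit for check in checks)
```

With a Data Guard lag of 12 seconds against a 30 second limit and a ZFS lag of 400 seconds against a 300 second limit, `operation_may_proceed` returns `False`, so the operation would not start.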
Storage-management operations are an essential part of disaster-recovery operations. During disaster recovery, storage replication must be reversed and storage appliances must be reconfigured before applications can be migrated to a standby site. Oracle Site Guard offers storage integration options for various storage technologies.
The following topics describe the storage integration options that Oracle Site Guard provides:
Oracle Site Guard provides built-in integration capabilities for Oracle Sun ZFS storage. If you are deploying Oracle Sun ZFS storage appliances, you can use the bundled storage management scripts (zfs_storage_role_reversal.sh) provided by Oracle Site Guard to orchestrate Sun ZFS storage role reversal as part of Oracle Site Guard disaster-recovery operations.
Oracle Site Guard offers integration capabilities for other storage technologies by providing a script integration framework that allows you to incorporate your own custom storage management scripts into Oracle Site Guard operation plans. You can implement storage role reversal for third-party storage technologies by invoking your own custom storage management scripts during the storage script execution phase of the operation plan execution.
In addition to the capability for integrating storage management scripts, Oracle Site Guard also offers the capability for integrating user scripts for mounting and unmounting file systems. For example, during a switchover operation, file systems that are used by a multi-tier application are unmounted at the primary site after the application is stopped, and replicated versions of those file systems are then mounted at the standby site before the application is started. These unmount and mount operations for application servers at the primary and standby sites can be orchestrated using the built-in mechanism for integrating scripts. Oracle Site Guard provides a bundled script for file system mount and unmount operations called mount_umount.sh. Alternatively, you can define your own custom scripts that will be invoked at appropriate points in the operation plan.
When you execute an Oracle Site Guard operation plan, you can customize the plan before you execute it, monitor the execution of the plan, manage any errors you encounter during plan execution, and retry plan execution after making changes.
Oracle Site Guard operation plans can be customized according to the topology and environment. Each step in an operation plan can be customized by using the following parameters:
Specifying whether the step should be enabled or disabled for execution (disabled steps are skipped during execution)
Moving the step to another point in the execution sequence (for example, changing the order of managed servers to be brought up within a domain group)
Specifying how errors for a step need to be handled (that is, stopping or continuing the execution of an operation if an error is encountered)
Specifying whether the steps of a given group need to be executed serially or in parallel (for example, attempting to start up all the managed servers at the same time (in parallel), in a given domain group)
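The four customization parameters above can be modeled as attributes of a step and of the group that contains it. The following sketch uses hypothetical class names and defaults purely to illustrate the idea. Operation plans are customized through Enterprise Manager, not through an API like this one.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ErrorMode(Enum):
    STOP_ON_ERROR = "stop"
    CONTINUE_ON_ERROR = "continue"


class ExecutionMode(Enum):
    SERIAL = "serial"
    PARALLEL = "parallel"


@dataclass
class Step:
    name: str
    enabled: bool = True                             # disabled steps are skipped
    error_mode: ErrorMode = ErrorMode.STOP_ON_ERROR  # default is an assumption


@dataclass
class StepGroup:
    name: str
    execution_mode: ExecutionMode = ExecutionMode.SERIAL
    steps: List[Step] = field(default_factory=list)

    def move_step(self, name: str, new_index: int) -> None:
        """Move the named step to a new position in the execution sequence."""
        index = next(i for i, s in enumerate(self.steps) if s.name == name)
        self.steps.insert(new_index, self.steps.pop(index))

    def runnable_steps(self) -> List[Step]:
        """Return the steps that would run, in order, skipping disabled ones."""
        return [s for s in self.steps if s.enabled]
```

For a domain group with managed servers `ms1`, `ms2`, and `ms3`, calling `move_step("ms3", 0)` brings `ms3` up first, and setting `enabled = False` on a step removes it from `runnable_steps()`.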
Oracle Site Guard disaster-recovery operations are executed as Oracle Enterprise Manager procedures, and the results of each operation can be monitored on the Procedure Activity page in Oracle Enterprise Manager Cloud Control Console. The Procedure Activity page for an Oracle Site Guard operation displays each operation plan as a hierarchy of steps, with a graphical icon showing the result of each step as it is executed. A green check mark is displayed if the step succeeds, and a red cross is displayed if the step fails. A separate icon indicates that the step was skipped and not configured for execution. This mechanism provides a visual summary of the progress of the operation plan.
When viewed in the Operation Activity page, the execution details for each operation plan or precheck are organized as a hierarchy of top-level steps with consequent sub-steps. Initially, only the top-level steps are visible to the user. The consequent sub-steps are collapsed and hidden within each top-level step. However, each top-level step in the operation activity can be further inspected in detail by clicking on the step to expand it, and navigating down into the hierarchy to select a constituent sub-step. The execution log for each sub-step can also be examined for additional details. This hierarchical organization of operation activity allows you to examine the results of the operation plan at any desired level of detail.
Each step in an Oracle Site Guard operation plan has an associated error mode that is configurable. This error mode defines how Oracle Site Guard handles any error that is encountered during the execution of that step.
The following error modes are available:
The Stop on Error mode specifies that Oracle Site Guard should stop executing the operation plan if it encounters an error while executing the current step.
The Continue on Error mode specifies that Oracle Site Guard should continue with the execution of the next step if it encounters an error while executing the current step.
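The effect of the two error modes on plan execution can be sketched as a simple loop. The function and type names below are hypothetical, and the sketch models only the stop-or-continue decision, not how Oracle Site Guard actually runs procedures.

```python
from typing import Callable, List, Tuple

STOP_ON_ERROR = "stop_on_error"
CONTINUE_ON_ERROR = "continue_on_error"

# A step is (name, error_mode, action); the action returns True on success.
PlanStep = Tuple[str, str, Callable[[], bool]]


def execute_plan(steps: List[PlanStep]) -> List[Tuple[str, str]]:
    """Run steps in order and return (name, outcome) for each step reached."""
    results: List[Tuple[str, str]] = []
    for name, error_mode, action in steps:
        succeeded = action()
        results.append((name, "succeeded" if succeeded else "failed"))
        if not succeeded and error_mode == STOP_ON_ERROR:
            break  # remaining steps are never reached
    return results
```

A failed step configured with `CONTINUE_ON_ERROR` is recorded and execution moves on, whereas a failed step configured with `STOP_ON_ERROR` ends the run at that point.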
If Oracle Site Guard encounters an error during an operation and stops the operation, you can resolve the issue that caused the failure, and then retry the failed operation. Oracle Site Guard resumes execution of the failed operation at the step where the failure occurred. You can also ignore the failed step, by clicking remove, and retry the operation. In this case, Oracle Site Guard will ignore the failed step, and resume execution of the operation plan starting with the step immediately following the failed step.
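The two retry choices differ only in where execution resumes. The sketch below shows that difference with hypothetical helper names. It is an illustration of the behavior described above, not an Oracle Site Guard interface.

```python
from typing import List, Sequence


def resume_index(failed_index: int, ignore_failed_step: bool) -> int:
    """Return the index of the step at which a retried operation resumes."""
    if failed_index < 0:
        raise ValueError("failed_index must be non-negative")
    # Ignoring the failed step resumes at the step immediately after it.
    return failed_index + 1 if ignore_failed_step else failed_index


def steps_to_retry(step_names: Sequence[str], failed_index: int,
                   ignore_failed_step: bool = False) -> List[str]:
    """Return the remaining steps that a retry would execute."""
    return list(step_names[resume_index(failed_index, ignore_failed_step):])
```

If the second of four steps failed, a plain retry runs steps two through four, while a retry that ignores the failed step runs only steps three and four.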
You can suspend an Oracle Site Guard operation at any point while it is in progress. You can then resume the suspended operation, and Oracle Site Guard resumes execution at the point where the operation was suspended. You can also stop an operation that is currently in progress.
Note: Stopped operations cannot be resumed.
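The suspend, resume, and stop behavior amounts to a small state machine. The sketch below encodes only the transitions that the text states explicitly. The class and state names are hypothetical, and whether a suspended operation can be stopped directly is not covered by the text, so that transition is left out.

```python
# Transitions stated in the text: a running operation can be suspended or
# stopped, a suspended operation can be resumed, and a stopped one is final.
ALLOWED_TRANSITIONS = {
    "running": {"suspended", "stopped"},
    "suspended": {"running"},
    "stopped": set(),  # stopped operations cannot be resumed
}


class OperationState:
    def __init__(self) -> None:
        self.state = "running"

    def _move(self, target: str) -> None:
        if target not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state} to {target}")
        self.state = target

    def suspend(self) -> None:
        self._move("suspended")

    def resume(self) -> None:
        self._move("running")

    def stop(self) -> None:
        self._move("stopped")
```

Calling `resume()` after `stop()` raises `ValueError`, which mirrors the rule in the note above.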
The following sections describe the comprehensive credential management framework that Oracle Site Guard offers:
Oracle Enterprise Manager provides a comprehensive Credential Management framework to manage identities and ensure that access to Enterprise Manager targets is authorized and authenticated. Typically, you can set up Named Credentials in Enterprise Manager before configuring Oracle Site Guard to use these credentials. After the credentials are configured, Oracle Site Guard uses them to access all managed targets at protected sites.
Depending on the topology of the site, Oracle Site Guard may need to use Named Credentials for different targets such as hosts, Oracle Database instances, WebLogic Servers, and other target types. For information about setting up credentials in Enterprise Manager, see "Setting Up Credentials" in Enterprise Manager Lifecycle Management Administrator's Guide.
After the required target credentials have been configured in Enterprise Manager's Credential Management framework, you can use these credentials during Oracle Site Guard's credential configuration process. Oracle Site Guard credential configuration requires that every target accessed and controlled by Oracle Site Guard for disaster-recovery operations has valid credentials associated with it. For information about setting up and associating credentials, see Section 4.3, "Creating Credential Associations".
Oracle Site Guard provides Role-Based Access Control (RBAC) using the User Accounts framework provided by Enterprise Manager. Enterprise Manager provides pre-configured roles for different areas or functions within Enterprise Manager. One of these administrator roles, EM_SG_ADMINISTRATOR, is customized for Oracle Site Guard-focused activities within Enterprise Manager. You can use this built-in role to create users focused on Oracle Site Guard administration tasks. Alternatively, you can create your own customized roles and users that allow for greater flexibility in tuning role-based access to Oracle Site Guard functionality.
For information about setting up role-based access control, see Section 3.2.2, "Creating Oracle Site Guard Administrator Users".
Oracle Site Guard includes built-in scripts (bundled scripts) for performing activities that are typically required while executing a disaster-recovery operation, such as switching over an Oracle Database, or starting or stopping an Oracle WebLogic Server. These built-in scripts are included as part of the Enterprise Manager Software Library, and all required scripts are automatically deployed to the applicable hosts during operation execution. However, in addition to the built-in scripts, the user may require other custom scripts to be automatically deployed and executed as part of the operation. Oracle Site Guard provides a mechanism for users to upload their own custom scripts to the Enterprise Manager Software Library and add these scripts to the operation plan when the plan is created.
An additional advantage of using scripts that are part of the Enterprise Manager Software Library is that these scripts are automatically deployed to all configured script hosts at runtime. On the other hand, user scripts that are not part of the Enterprise Manager Software Library must be manually deployed on each configured script host before the operation plan begins execution.
For more information about the various types of scripts that a user can add to the Enterprise Manager Software Library, see Section 2.3.1, "Extensibility."
User-defined scripts that are either externally deployed or deployed through the Software Library are typically executed using the credentials configured for the host on which the script will execute. These credentials are configured and maintained in the Enterprise Manager credential management framework, and are referred to as the Host Normal Credentials or Host Privileged Credentials. However, you can also add other sets of credentials to the credential repository and configure a script to execute with this alternate set of credentials. This is useful in cases where the script requires credential privileges that are different from the standard (Host Normal) or privileged (Host Privileged) credentials configured for the script host. For example, a script might need to be executed using a specific user ID to shut down a server process on that host.
User-defined scripts frequently perform actions that require them to first authenticate with some other entity, and they require one or more sets of credentials to perform this authentication. To avoid hard-coding credentials into the script or passing them insecurely as clear-text parameters to the script, Oracle Site Guard provides a mechanism to securely pass one or more sets of credentials to a configured script. These credentials are stored and maintained in a secure manner in Oracle Enterprise Manager's credential management framework. Once these credentials are configured and associated as parameters for the user script, Oracle Site Guard encrypts and passes these credentials to the user script at execution time. The user script can then extract these credentials and use them for authentication.
For details about extracting encrypted credentials inside a user script, see Appendix A, "Extracting Credentials Passed as Parameters (Examples)."
Oracle Site Guard workflows, also referred to as operations, are modeled as Enterprise Manager deployment procedures.
When there is a failure or planned outage of the primary site, Oracle Site Guard automates the following steps to enable the standby site to assume the production role in the topology:
Stops the services and applications running on the primary site, and unmounts the storage on the primary site.
Stops storage replication from the primary site to the standby site, and performs storage role reversal.
Performs a failover or switchover of the Oracle Databases using Oracle Data Guard Broker.
Mounts the replicated storage (file systems) on the standby site.
Starts the services and applications on the standby site. At this point, the standby site assumes the production role.
Note: If continuous storage replication is not configured, Oracle recommends that you perform a final storage replication from the primary site to the standby site, before you initiate the Site Guard operation. However, if the primary site has failed, it may not be possible to perform this final replication.
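The automated steps above can be summarized as an ordered list that differs slightly between a switchover and a failover. The following sketch uses a hypothetical `build_workflow` function and descriptive step strings. It assumes that the primary-site steps run during a failover only when the user chooses to stop the primary site.

```python
from typing import List


def build_workflow(operation: str, stop_primary: bool = False) -> List[str]:
    """Return the ordered high-level steps for a switchover or failover.

    `stop_primary` matters only for a failover, where the primary site
    may be unreachable; a switchover always includes the primary steps.
    """
    if operation not in ("switchover", "failover"):
        raise ValueError(f"unknown operation: {operation}")

    steps: List[str] = []
    if operation == "switchover" or stop_primary:
        steps += [
            "primary: stop services and applications",
            "primary: unmount storage",
        ]
    steps += [
        "stop storage replication and reverse storage roles",
        f"{operation} Oracle Databases using Oracle Data Guard Broker",
        "standby: mount replicated storage",
        "standby: start services and applications",
    ]
    return steps
```

`build_workflow("switchover")` returns six steps that begin at the primary site, while `build_workflow("failover")` returns four steps that begin with storage role reversal.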
Oracle Site Guard workflows can be monitored, suspended, resumed, and stopped using Enterprise Manager's Procedure Management framework.
Oracle Site Guard provides the following distinct types of workflows for disaster-recovery operations:
The switchover workflow provides the ability to perform a controlled transition of the production activity from the primary site to a standby site. Figure 2-7 shows an example of the steps executed during a typical switchover operation.
The failover workflow provides the ability to perform a forced transition of production activity to a standby site. When a failover operation is launched, Oracle Site Guard assumes that the primary site is unavailable, and starts all protected applications at the standby site. Figure 2-8 shows an example of the steps executed during a typical failover operation.
The start workflow provides the ability to start production activities at a site. This workflow is typically used to bring up a site after maintenance, or to test whether the site can be started as part of testing a larger workflow such as a switchover. Figure 2-9 shows an example of the steps executed during a typical start operation.
The stop workflow provides the ability to stop production activities at a site. This workflow is typically used to bring down a site for maintenance, or to test whether the site can be stopped as part of testing a larger workflow such as a switchover. Figure 2-10 shows an example of the steps executed during a typical stop operation.