5 Disaster Recovery

Each product has a different disaster recovery strategy. This chapter explains the disaster recovery processes for each product.

Kubernetes Cluster

Two independent Kubernetes clusters will exist, one in each site. These Kubernetes clusters need not be symmetrical. However, both clusters should have the spare capacity to run an application in the event of a failover or a switchover.

Each cluster is created and managed independently. Kubernetes scheduling will be responsible for ensuring that applications are started inside the cluster. There will be no external dependencies. All product disaster recovery solutions will be confined to processes running inside the cluster.
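The spare-capacity requirement comes down to simple arithmetic. The following is a minimal sketch, assuming capacity figures (CPU cores and GiB of memory) that you collect from each cluster yourself. The class, the function names, and all the numbers are hypothetical and do not come from any Oracle tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """CPU cores and memory (GiB); all figures here are illustrative."""
    cpu_cores: float
    memory_gib: float


def spare_capacity(allocatable: Resources, in_use: Resources) -> Resources:
    """Return what is left on a cluster after its current workload."""
    return Resources(
        cpu_cores=allocatable.cpu_cores - in_use.cpu_cores,
        memory_gib=allocatable.memory_gib - in_use.memory_gib,
    )


def can_host(spare: Resources, application: Resources) -> bool:
    """True if the spare capacity covers the application's requests."""
    return (spare.cpu_cores >= application.cpu_cores
            and spare.memory_gib >= application.memory_gib)


# Hypothetical standby cluster: it need not match the primary, but it must
# have enough headroom for the application that fails over to it.
standby_spare = spare_capacity(Resources(48, 192), Resources(20, 64))
application_requests = Resources(cpu_cores=16, memory_gib=96)
print(can_host(standby_spare, application_requests))
```

Running the same check in both directions is useful, because after a switchover the former primary becomes the site that must absorb the workload.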

Oracle HTTP Server

The Oracle HTTP Server configuration is static. Therefore, the Oracle HTTP Server will be installed independently on the primary and standby sites.

The configuration changes will be applied to each site manually and independently.

Ingress

The Ingress controller configuration is static. Therefore, the Ingress controller will be installed independently on the primary and standby sites.

The configuration changes will be applied to each site manually and independently.

WebLogic Operator

The Oracle WebLogic Operator configuration is static. Therefore, the Operator will be installed independently on the primary and standby sites.

The configuration changes will be applied to each site manually and independently.

Oracle Unified Directory

In a traditional on-premises deployment, two independent Oracle Unified Directory (OUD) clusters are set up on the primary and standby sites. Each site is in a different replication group, and the sites are joined by including both in the OUD replication agreement. This type of integration is currently not possible in Kubernetes. The recommended approach in Kubernetes is to replicate the persistent volume from the primary site to the disaster recovery site. This is an active/passive solution.

Figure 5-1 OUD Disaster Recovery

There are several ways of achieving the active/passive solution:

  • Disk-based replication.
  • Process-driven replication, where a backup copy of the instance is created locally, shipped to the standby system, and then re-applied on the remote system. This option may introduce a lag during which some data is not yet available on the standby system.

The overall process of setting up disaster recovery for OUD includes the following steps:

  1. Install OUD on the primary site using the standard procedures.
  2. Install OUD on the secondary site using the standard procedures.
  3. Delete the instance data from the standby site.
  4. Enable the replication of the persistent volume between the primary and standby sites.

For more information, see Configuring Disaster Recovery for Oracle Unified Directory.

For a switchover, complete the following steps:

  1. Stop the replication of the persistent volume.
  2. Shut down OUD on the primary site.
  3. Start up OUD on the standby site.
  4. Enable the replication of the persistent volume in the reverse direction.

For failover, complete the following steps:

  1. Stop the replication/application of the persistent volume.
  2. Shut down OUD on the primary site, if available.
  3. Start up OUD on the standby site.
  4. Enable the replication of the persistent volume in the reverse direction if the site is available, or recreate the primary site using the above steps.
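The switchover and failover lists differ only in what can be done when the primary site is unreachable. The following is a minimal sketch that makes the difference explicit by generating the ordered steps from two flags. The function name and flags are hypothetical, and the sketch only produces a checklist; it does not run anything against a cluster.

```python
from typing import List


def oud_role_change_steps(planned: bool, primary_available: bool) -> List[str]:
    """Return the ordered OUD steps for a switchover (planned) or a failover."""
    if planned and not primary_available:
        raise ValueError("A switchover needs the primary site to be available.")

    steps = ["Stop the replication of the persistent volume."]
    if primary_available:
        steps.append("Shut down OUD on the primary site.")
    steps.append("Start up OUD on the standby site.")
    if primary_available:
        steps.append("Enable persistent volume replication in the reverse direction.")
    else:
        # The old primary is gone: rebuild it before replication can resume.
        steps.append("Recreate the primary site, then re-enable replication.")
    return steps


# Failover with the primary site lost.
for number, step in enumerate(
        oud_role_change_steps(planned=False, primary_available=False), 1):
    print(f"{number}. {step}")
```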

Oracle Access Manager

Oracle Access Manager disaster recovery is achieved using the active/passive model. Currently, the use of Oracle Access Manager Multi-Datacenter is not supported in a Kubernetes environment.

Figure 5-2 OAM Disaster Recovery

Considerations for the disaster recovery solution:

  • Every load balancer in the configuration (primary/standby) will contain the same SSL certificates.
  • The Oracle Access Manager application data is stored within an Oracle Database, which is then replicated to the standby site using Oracle Data Guard.
  • The configuration information (domain definitions) is replicated using file system replication. This process involves using a replication tool such as 'rsync' on the persistent volumes. Because the configuration information changes rarely, it does not require frequent replication.
  • After the configuration information is replicated to the standby site, all database connections will be modified to point to the standby database at the standby site.
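As an illustration of the file system replication described above, the following minimal sketch builds an rsync command line. It assumes the persistent volume is backed by a file system that is mounted on a host with SSH access to the standby storage. The mount point and host name are hypothetical, and the sketch only prints the command rather than running it.

```python
import shlex
from typing import List


def rsync_command(source_dir: str, standby_host: str, dest_dir: str) -> List[str]:
    """Build an rsync command that mirrors source_dir to the standby host."""
    return [
        "rsync",
        "-a",        # archive mode: recurse and preserve permissions and timestamps
        "--delete",  # remove files on the standby that were deleted on the primary
        source_dir.rstrip("/") + "/",  # trailing slash: copy the directory contents
        f"{standby_host}:{dest_dir.rstrip('/')}/",
    ]


# Hypothetical mount point and host name, for illustration only.
command = rsync_command("/nfs/oampv", "standby-nfs.example.com", "/nfs/oampv")
print(shlex.join(command))
```

Because the configuration changes rarely, a command like this would typically be scheduled at a low frequency, and run on demand after a configuration change.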

For more information, see Configuring Disaster Recovery for Oracle Access Manager.

The overall process of setting up disaster recovery for OAM includes the following steps:

  1. Install OAM on the primary site using the standard procedures.
  2. Install OAM on the standby site using the standard procedures, or back up the Kubernetes objects from the primary Kubernetes cluster and restore them on the standby cluster.
  3. Modify the database service to be active only when the database is functioning in the roles of PRIMARY or SNAPSHOT STANDBY.
  4. Delete the configuration data from the standby site (if you created a new environment).
  5. Delete the database from the standby site.
  6. Enable the replication of the persistent volume between the primary and standby sites.
  7. Enable the replication of Data Guard between the primary and standby sites.
  8. Replicate the application database service to the standby database.
  9. Modify the database connection details on the standby site to use the standby database.
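The role restriction in step 3 can be pictured as a simple rule. The sketch below is illustrative only. It assumes the database service is managed with srvctl, and the option names shown (-db, -service, -role) should be verified against the documentation for your database release. The database and service names are hypothetical.

```python
from typing import List

# Roles in which the application database service should be running.
ACTIVE_ROLES = ("PRIMARY", "SNAPSHOT STANDBY")


def service_should_run(database_role: str) -> bool:
    """True if the service should be active for the given database role."""
    return database_role.strip().upper() in ACTIVE_ROLES


def srvctl_role_command(db_unique_name: str, service_name: str) -> List[str]:
    """Build a srvctl call restricting the service to the active roles.

    Assumption: srvctl spells multi-word roles with underscores
    (SNAPSHOT_STANDBY); check this against your release.
    """
    roles = ",".join(role.replace(" ", "_") for role in ACTIVE_ROLES)
    return ["srvctl", "modify", "service",
            "-db", db_unique_name,
            "-service", service_name,
            "-role", roles]


# Hypothetical names, for illustration only.
print(service_should_run("PHYSICAL STANDBY"))
print(" ".join(srvctl_role_command("iamdb_stby", "oam_s.example.com")))
```

The effect of the rule is that the application cannot connect to a physical standby by accident: the service it uses exists only while the database is open read/write.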

For a switchover, complete the following steps:

  1. Stop the replication of the persistent volume.
  2. Shut down OAM on the primary site.
  3. Switch over to the standby database.
  4. Start OAM on the standby site.
  5. Enable the persistent volume replication in the reverse direction.
  6. Enable the replication of Data Guard in the reverse direction.

For failover, complete the following steps:

  1. Stop the replication/application of the persistent volume.
  2. Shut down OAM on the primary site, if available.
  3. Activate/failover the standby database on the standby site.
  4. Start OAM on the standby site.
  5. Enable the replication of the persistent volume in the reverse direction if the site is available, or recreate the primary site using the steps above.
  6. Enable the replication of Data Guard in the reverse direction if the site is available, or recreate the database on the old primary site using the above steps.

Oracle Identity Governance

Disaster recovery for Oracle Identity Governance (OIG) is achieved using the active/passive model.

Figure 5-3 OIG Disaster Recovery

Considerations for the disaster recovery solution:

  • The OIG application data is stored within an Oracle database, which is then replicated to the standby site using Oracle Data Guard.
  • The configuration information (domain definitions) is replicated using file system replication. This process involves using a replication tool such as 'rsync' on the persistent volumes. Because the configuration information changes rarely, it does not require frequent replication.
  • After the configuration information is replicated to the standby site, all database connections will be modified to point to the standby database at the standby site.

For more information, see Configuring Disaster Recovery for Oracle Identity Governance.

The overall process of setting up disaster recovery for OIG includes the following steps:

  1. Install OIG on the primary site using the standard procedures.
  2. Install OIG on the standby site using the standard procedures, or back up the Kubernetes objects from the primary Kubernetes cluster and restore them on the standby cluster.
  3. Modify the database service to be active only when the database is functioning in the roles of PRIMARY or SNAPSHOT STANDBY.
  4. Delete the configuration data from the standby site (if you created a new environment).
  5. Delete the database from the standby site.
  6. Enable the replication of the persistent volume between the primary and standby sites.
  7. Enable the replication of Data Guard between the primary and standby sites.
  8. Replicate the application database service to the standby database.
  9. Modify the database connection details on the standby site to use the standby database.

For a switchover, complete the following steps:

  1. Stop the replication of the persistent volume.
  2. Shut down OIG on the primary site.
  3. Switch over to the standby database.
  4. Start OIG on the standby site.
  5. Enable the replication of the persistent volume in the reverse direction.
  6. Enable the replication of Data Guard in the reverse direction.

For failover, complete the following steps:

  1. Stop the replication/application of the persistent volume.
  2. Shut down OIG on the primary site, if available.
  3. Activate/failover the standby database on the standby site.
  4. Start OIG on the standby site.
  5. Enable the replication of the persistent volume in the reverse direction if the site is available, or recreate the primary site using the above steps.
  6. Enable the replication of Data Guard in the reverse direction if the site is available, or recreate the database on the old primary site using the above steps.

Oracle Identity Role Intelligence

Disaster recovery for Oracle Identity Role Intelligence (OIRI) is achieved using the active/passive model.

Figure 5-4 OIRI Disaster Recovery

Considerations for the disaster recovery solution:

  • The OIRI application data is stored within an Oracle database, which is then replicated to the standby site using Oracle Data Guard.
  • The configuration information (domain definitions) is stored in the persistent volume, which is replicated using file system replication. This process involves using a replication tool such as 'rsync' on the persistent volumes. Because the configuration information changes rarely, it does not require frequent replication.
  • After the configuration information is replicated to the standby site, all database connections will be modified to point to the standby database at the standby site.
  • If any databases, such as the OIG database, have been switched over, then update the database connections to point to the standby versions.
  • OIRI is closely integrated with the Kubernetes framework, allowing it to directly interact with the cluster for data-ingestion tasks. After it is deployed, OIRI has information about the cluster in which it is running. If you run OIRI from a different Kubernetes cluster, you need to update not only the database connection information but also the Kubernetes cluster information.

For more information, see Configuring Disaster Recovery for Oracle Identity Role Intelligence.

The overall process of setting up disaster recovery for OIRI includes the following steps:

  1. Install OIRI on the primary site using the standard procedures.
  2. Install OIRI on the standby site using the standard procedures, or back up the Kubernetes objects from the primary Kubernetes cluster and restore them on the standby cluster.
  3. Modify the database service to be active only when the database is functioning in the roles of PRIMARY or SNAPSHOT STANDBY.
  4. Delete the configuration data from the standby site (if you created a new environment).
  5. Delete the database from the standby site.
  6. Enable the replication of the persistent volume between the primary and standby sites.
  7. Enable the replication of Data Guard between the primary and standby sites.
  8. Replicate the application database service to the standby database.
  9. Modify the database connection details on the standby site to use the standby database.
  10. Modify the Kubernetes cluster connection details on the standby site to use the standby Kubernetes cluster.
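Because the replicated configuration originates on the primary site, it is easy to leave a primary database or cluster address behind on the standby. The following minimal sketch shows one way to check for such leftovers. The setting names, host names, and URLs are hypothetical and do not reflect the actual OIRI configuration file layout.

```python
from typing import Dict, Iterable, List


def stale_settings(settings: Dict[str, str],
                   primary_endpoints: Iterable[str]) -> List[str]:
    """Return the setting names whose values still mention a primary endpoint."""
    endpoints = [endpoint.lower() for endpoint in primary_endpoints]
    return sorted(
        name for name, value in settings.items()
        if any(endpoint in value.lower() for endpoint in endpoints)
    )


# Hypothetical standby-site settings: the database URL has been updated,
# but the Kubernetes API address still points at the primary cluster.
standby_settings = {
    "database.url": "jdbc:oracle:thin:@//standby-scan.example.com:1521/oiri_s.example.com",
    "kubernetes.api.server": "https://k8s-primary.example.com:6443",
}
primary_hosts = ["primary-scan.example.com", "k8s-primary.example.com"]
print(stale_settings(standby_settings, primary_hosts))
```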

For a switchover, complete the following steps:

  1. Stop the replication of the persistent volume.
  2. Shut down OIRI on the primary site.
  3. Switch over to the standby database.
  4. Start OIRI on the standby site.
  5. Enable the replication of the persistent volume in the reverse direction.
  6. Enable the replication of Data Guard in the reverse direction.

For failover, complete the following steps:

  1. Stop the replication/application of the persistent volume.
  2. Shut down OIRI on the primary site, if available.
  3. Activate the standby database on the standby site.
  4. Start OIRI on the standby site.
  5. Enable the replication of the persistent volume in the reverse direction if the site is available, or recreate the primary site using the above steps.
  6. Enable the replication of Data Guard in the reverse direction if the site is available, or recreate the database on the old primary site using the above steps.

Oracle Advanced Authentication, Oracle Adaptive Risk Management, and Oracle Universal Authenticator

Disaster recovery for Oracle Advanced Authentication (OAA) and Oracle Adaptive Risk Management (OARM) is achieved using the active/passive model.

Figure 5-5 OAA Disaster Recovery

Considerations for the disaster recovery solution:

  • Every load balancer in the configuration (primary/standby) will contain the same SSL certificates.
  • The OAA application data is stored within an Oracle database, which is then replicated to the standby site using Oracle Data Guard.
  • The configuration information and vaults are replicated using file system replication. This process involves using a replication tool such as 'rsync' on the persistent volumes. Because the configuration information changes rarely, it does not require frequent replication.
  • After the configuration information is replicated to the standby site, all database connections will be directed to the standby database at the standby site.
  • OAA stores configuration information in multiple Kubernetes objects. Therefore, it is simpler to redeploy the application in the standby site using the updated connection information than to back up and restore the Kubernetes objects.
  • OAA is closely integrated with the Kubernetes framework, allowing it to directly interact with the cluster for data-ingestion tasks. After it is deployed, OAA has information about the cluster in which it is running. If you run OAA from a different site, you need to update not only the database connection information but also the Kubernetes cluster information. The Kubernetes cluster information should not be replicated between the sites.
  • OAA is closely coupled with Oracle Access Manager (OAM) for OAuth validation. Therefore, OAA should have access to OAM at all times. If you are using an OAM Multi-Datacenter configuration, the OAuth domain must exist in each OAM deployment. If you are using an active/passive OAM disaster recovery solution, the active OAA must have access to an active OAM site.

The overall process of setting up disaster recovery for OAA includes the following steps:

  1. Install OAA on the primary site using the standard procedures.
  2. Create the OAA management container on the standby system.
  3. Enable the replication of the persistent volume between the primary and standby sites (excluding the Kubernetes configuration).
  4. Enable the replication of Data Guard between the primary and standby sites.
  5. Replicate the application database service to the standby database.
  6. Modify the database service to be active only when the database is functioning in the roles of PRIMARY or SNAPSHOT STANDBY.
  7. Convert the Data Guard database to a SNAPSHOT STANDBY database.
  8. Update the database connection information in the installOAA.properties file and set database.createschema=false.
  9. Install OAA on the standby site by rerunning OAA.sh to redeploy the application. It is critical to replicate the file system from primary to standby before you run this command (including the logs persistent volume). This command will then create the Kubernetes objects on the standby site.
  10. Validate the configuration.
  11. Shut down the OAA deployment and convert the Data Guard database back to a physical standby.
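Step 8 amounts to editing a few keys in installOAA.properties while leaving the rest of the file untouched. The following is a minimal sketch of such an edit. The key database.createschema comes from the step above; the key database.host and all the values are hypothetical stand-ins for whatever connection properties your file contains.

```python
from typing import Dict


def set_properties(text: str, updates: Dict[str, str]) -> str:
    """Return text with each key in updates set, preserving all other lines."""
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        if stripped and not stripped.startswith("#") and key in remaining:
            # Replace the first occurrence of the key with the new value.
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append keys that were not present in the original text.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Hypothetical file content replicated from the primary site.
replicated = (
    "# OAA install settings\n"
    "database.createschema=true\n"
    "database.host=primary-db.example.com\n"
)
print(set_properties(replicated, {
    "database.createschema": "false",           # the schema already exists
    "database.host": "standby-db.example.com",  # hypothetical key and value
}))
```

Setting database.createschema=false matters because the schema arrives through Data Guard; the standby deployment must reuse it rather than attempt to create it again.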

For more information, see Configuring Disaster Recovery for Oracle Advanced Authentication.

For a switchover, complete the following steps:

  1. Stop the replication of the persistent volume.
  2. Shut down OAA on the primary site.
  3. Switch over to the standby database.
  4. Start OAA on the standby site.
  5. Enable the replication of the persistent volume in the reverse direction.
  6. Enable the replication of Data Guard in the reverse direction.

For failover, complete the following steps:

  1. Stop the replication/application of the persistent volume.
  2. Shut down OAA on the primary site, if available.
  3. Activate the standby database on the standby site.
  4. Start OAA on the standby site.
  5. Enable the replication of the persistent volume in the reverse direction if the site is available, or recreate the primary site using the above steps.
  6. Enable the replication of Data Guard in the reverse direction if the site is available, or recreate the database on the old primary site using the above steps.

Prometheus, Grafana, and Elasticsearch

These products are not Oracle products. Therefore, the approach for disaster recovery for them is beyond the scope of this document. For details on how to enable disaster recovery for these products, see the appropriate product documentation.

Roadmap for Setting Up Disaster Recovery

This section provides a high-level summary of the steps required to set up disaster recovery for the entire Oracle Identity and Access Management suite.

  1. Enable communication between the primary and standby sites.
  2. Set up the load balancers on each site to point to local installations.
  3. Ensure that the load balancers use the same SSL certificates.
  4. Set up Data Guard to replicate the Oracle database(s) to the standby site.
  5. Install the Oracle HTTP Server (OHS) on the standby site.
  6. Copy the OHS configuration from the primary to the standby site, changing routing to the standby site.
  7. Install the Ingress controller on the standby site (if used).
  8. Install the Oracle WebLogic Operator on the standby site.
  9. Deploy OUD on the standby site using Kubernetes snapshots or a fresh install and data deletion.
  10. Enable the OUD persistent volume synchronization between the primary and standby sites.
  11. Convert the standby database to a snapshot standby.
  12. Deploy OAM on the standby site using Kubernetes snapshots or a fresh install and data deletion.
  13. Enable the OAM persistent volume synchronization between the primary and standby sites.
  14. Change the OAM database connection settings on the standby site, if needed.
  15. Deploy the WebGate on OHS using the WebGate artifacts from the primary site.
  16. Deploy OIG on the standby site using Kubernetes snapshots or a fresh install and data deletion.
  17. Enable the OIG persistent volume synchronization between the primary and standby sites.
  18. Change the OIG database connection settings on the standby site, if needed.
  19. Deploy OIRI on the standby site using Kubernetes snapshots or a fresh install and data deletion.
  20. Enable the OIRI persistent volume synchronization between the primary and standby sites.
  21. Change the OIRI database connection settings on the standby site, if needed.
  22. Change the OIRI Kubernetes cluster connection settings on the standby site.
  23. If not already started, start OAM against the snapshot standby database.
  24. Start the OAA Management Container and configure it for the local cluster.
  25. Enable the OAA persistent volume synchronization between the primary and standby sites.
  26. Change the OAA database connection settings in installOAA.properties on the standby site, if needed.
  27. Redeploy OAA on the standby site using OAA.sh.
  28. Validate that the environment is working as required.
  29. Enable the replication of the regular file system for each deployed product.
  30. Convert the standby database back to a physical standby.
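Steps 11 and 30 bracket the validation work: the standby database is opened as a snapshot standby, the products are started and checked, and the database is then returned to a physical standby. The following minimal sketch expresses that bracket as a context manager, so that the conversion back is issued even if validation fails. It assumes the Data Guard broker is in use. The database name is hypothetical, and the run callable is a stand-in for however you submit commands to DGMGRL.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def snapshot_standby(db_name: str, run: Callable[[str], None]) -> Iterator[None]:
    """Open the standby as a snapshot standby for the duration of the block."""
    run(f"CONVERT DATABASE {db_name} TO SNAPSHOT STANDBY;")
    try:
        yield
    finally:
        # Always return to a physical standby so that redo apply resumes.
        run(f"CONVERT DATABASE {db_name} TO PHYSICAL STANDBY;")


# For illustration, record the commands instead of sending them to DGMGRL.
issued: List[str] = []
with snapshot_standby("iamdb_stby", issued.append):
    issued.append("-- start the products, validate, shut the products down --")

for command in issued:
    print(command)
```

The products should be shut down inside the block, before the conversion back, because changes made while the database is a snapshot standby are discarded when it returns to the physical standby role.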