22 Configuring Disaster Recovery
This chapter includes the following topics:
- Generic Disaster Recovery Processes
  Some of the common disaster recovery processes include using rsync to copy data between sites, using Oracle Data Guard to create a standby database, backing up and restoring the cluster artifacts, synchronizing the persistent volumes, and so on.
- Configuring Disaster Recovery for Oracle Unified Directory
  The disaster recovery for Oracle Unified Directory (OUD) is an active/passive solution. Currently, you cannot use the dsreplication command to set up a remote replica.
- Configuring Disaster Recovery for Oracle Access Manager
  The disaster recovery for Oracle Access Manager (OAM) is an active/passive solution.
- Configuring Disaster Recovery for Oracle Identity Governance
  The disaster recovery for Oracle Identity Governance (OIG) is an active/passive solution.
- Configuring Disaster Recovery for Oracle Identity Role Intelligence
  The disaster recovery for Oracle Identity Role Intelligence (OIRI) is an active/passive solution.
- Configuring Disaster Recovery for Oracle Advanced Authentication
  The disaster recovery for Oracle Advanced Authentication (OAA) is an active/passive solution.
Parent topic: Configuring the Enterprise Deployment
Generic Disaster Recovery Processes
Some of the common disaster recovery processes include using rsync to copy data between sites, using Oracle Data Guard to create a standby database, backing up and restoring the cluster artifacts, synchronizing the persistent volumes, and so on.
Here is the list of the generic disaster recovery processes:
- Creating a Container with rsync
- Creating a Data Guard Database
- Backing Up and Restoring the Kubernetes Objects
- Creating a Kubernetes CronJob to Synchronize the Persistent Volumes
- Creating the Persistent Volume Claims
- Creating a DR ConfigMap
- Creating a Backup/Restore Job
Parent topic: Configuring Disaster Recovery
Creating a Container with rsync
There are two ways of replicating a file system between sites:
- Hardware Replication - Uses disk-based replication or cloud-based file system replication.
- Software Replication - Uses a software tool to manually replicate a file system. One of the most widely available and efficient tools to perform software replication is rsync.
This section describes how to create a lightweight container with the rsync command that can be run inside the Kubernetes cluster. This container can then be invoked using a Kubernetes CronJob.
You can run rsync as a CronJob on the underlying worker nodes, but this creates an external dependency. Another way of achieving this is to create a pod that runs inside the cluster and performs the rsync operations. This approach ensures that the application disaster recovery processes run as part of the Kubernetes cluster. CronJobs require container images that include the rsync command. The two main Linux container images, Alpine and Busybox, do not contain the rsync command by default; however, they can be extended to include it.
The following steps describe how to create a container image based on Alpine that also includes the rsync utility. These steps require a runtime environment with Podman or Docker and a logged-in DockerHub account to access the repository images.
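A minimal sketch of such an image is shown below, assuming a Podman (or Docker) runtime; the file name Containerfile and the registry path (taken from the CronJob example later in this chapter) are assumptions, so adjust them to your environment.
# Containerfile - hedged sketch: extend the Alpine base image with the rsync package
FROM docker.io/alpine:latest
RUN apk add --no-cache rsync
You could then build and push the image with commands similar to:
podman build -t iad.ocir.io/mytenancy/idm/alpine-rsync:latest .
podman push iad.ocir.io/mytenancy/idm/alpine-rsync:latest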
Parent topic: Generic Disaster Recovery Processes
Creating a Data Guard Database
While rsync (run either outside of Kubernetes or, more efficiently, through a Kubernetes CronJob) copies the file system data between sites, the database must be protected separately. In the event of a disaster recovery, after you have deployed your application on the standby site, you should delete the existing database and create a standby database using Oracle Data Guard.
If you are using a non-Oracle Cloud based deployment (or you want to configure it manually), see Learn About Configuring a Standby Database for Disaster Recovery for instructions.
Note:
After creating the Data Guard database, ensure that it has the same initialization parameter values as that of the primary database.
If you are using Oracle Cloud Infrastructure, use the following steps:
Note:
A cross-region Data Guard association cannot be created between databases of DB systems that belong to VCNs that are not peered. If you cannot peer the VCNs, raise a Service Request with Oracle Support requesting that CrossRegionDataguardNetworkValidation be disabled.
Parent topic: Generic Disaster Recovery Processes
Backing Up and Restoring the Kubernetes Objects
Every Kubernetes cluster is distinct and requires individual maintenance. The artifacts within the cluster determine the application's operation. The artifacts include namespaces, persistent volumes, config maps, and more. These artifacts must be created on the standby cluster before you start the application.
There are two ways of creating the artifacts on the standby system:
- Perform a separate installation using the same configuration information as that of the primary site but using a throwaway database. After you complete the installation, discard the database and domain configuration and replace it with the primary site's database and domain configuration.
- Back up the Kubernetes objects in the primary site and restore them to the standby site, pointing the persistent volumes to the standby NFS servers. This approach is not recommended for Oracle Advanced Authentication.
Oracle provides a tool to simplify the process of backing up and restoring the Kubernetes objects. For information about the tool, see Using the Oracle K8s Disaster Protection Scripts.
To use these scripts:
Note:
- If you do not precreate the persistent volumes on the standby site before restoring, the applications will not start. Alternatively, you can create the PVs after restoring, but this will cause pods to remain pending until the volumes are created. It is crucial to ensure that the persistent volumes on the standby site point to the NFS server in that site.
- It is possible to request the backup to include the persistent volumes in its backup process. However, this will also backup the mounts to the primary NFS server. Ensure that you do not allow the application to start on the standby site by using the primary NFS server because it could result in the corruption of your primary deployment.
- If the application is running on the primary when the backup is being taken, the application will be started on the standby when the restore is being performed. To minimize errors, you should open the Data Guard database as a snapshot standby. However, some errors may still occur due to the running jobs from SOA and OIG, but these can be ignored because they are caused by the inactive status of the site.
At some point, you should switch your Data Guard database to the standby site and verify that the standby services are functional.
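For illustration only (the database name and credentials are placeholders), a broker-managed switchover typically looks like this:
dgmgrl sys/Password
show configuration
switchover to standby_db_name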
Parent topic: Generic Disaster Recovery Processes
Creating a Kubernetes CronJob to Synchronize the Persistent Volumes
For each solution described in this section, the configuration information stored on the persistent volumes must be synchronized between the primary and standby sites, with the ability to reverse the synchronization in case of a site switchover.
As mentioned earlier, to achieve the maximum efficiency, the best approach is to employ file system replication. See Backing Up and Restoring the Kubernetes Objects. Alternatively, a cron job can be created inside the Kubernetes cluster to carry out this task. The process is the same for each product, with only the underlying volumes being different. It is advisable to create a separate cron job for each product rather than having one that handles all products.
The following steps describe how to achieve this. The example uses Oracle Access Manager (OAM) for simplicity, but it applies to any product.
Parent topic: Generic Disaster Recovery Processes
Creating a Namespace for the Backup Jobs
Create a separate namespace outside of the product namespace to hold the backup jobs, either per product or for all the backup jobs.
Create the namespace using the command:
kubectl create namespace <namespace>
For example:
kubectl create namespace drns
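To confirm that the namespace exists, you can run, for example:
kubectl get namespace drns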
Creating the Persistent Volumes
Each site requires a persistent volume that points to the NFS filer in the remote location.
Create the following persistent volumes on Site 1 using the provided yaml files:
site1_primary_pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: primary-oam-pv
  labels:
    type: primary-oam-pv
spec:
  storageClassName: manual
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteMany
  nfs:
    path: /export/IAMPVS/oampv
    server: <site1_nfs_server>
site1_standby_pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: standby-oam-pv
  labels:
    type: standby-oam-pv
spec:
  storageClassName: manual
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteMany
  nfs:
    path: /export/IAMPVS/oampv
    server: <site2_nfs_server>
Create the following persistent volumes on Site 2 using the provided yaml files:
site2_primary_pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: primary-oam-pv
  labels:
    type: primary-oam-pv
spec:
  storageClassName: manual
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteMany
  nfs:
    path: /export/IAMPVS/oampv
    server: <site2_nfs_server>
site2_standby_pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: standby-oam-pv
  labels:
    type: standby-oam-pv
spec:
  storageClassName: manual
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteMany
  nfs:
    path: /export/IAMPVS/oampv
    server: <site1_nfs_server>
Create the persistent volumes using the commands:
kubectl create -f site1_primary_pv.yaml
kubectl create -f site1_standby_pv.yaml
kubectl create -f site2_primary_pv.yaml
kubectl create -f site2_standby_pv.yaml
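To verify that the volumes have been created and are in the Available state before they are claimed, you can run, for example:
kubectl get pv primary-oam-pv standby-oam-pv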
Creating the Persistent Volume Claims
After creating the persistent volumes, you can create the persistent volume claims in the DR namespace.
Create the following files (note that the files will be identical on each site):
primary_pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: primary-oampv-pvc
  namespace: <DRNS>
  labels:
    type: primary-oampv-pvc
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 30Gi
  selector:
    matchLabels:
      type: primary-oam-pv
standby_pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: standby-oampv-pvc
  namespace: <DRNS>
  labels:
    type: standby-oampv-pvc
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 30Gi
  selector:
    matchLabels:
      type: standby-oam-pv
Create the persistent volume claims using the commands:
kubectl create -f primary_pvc.yaml
kubectl create -f standby_pvc.yaml
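After a few moments, both claims should show a STATUS of Bound; you can check this with, for example:
kubectl get pvc -n <DRNS>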
Parent topic: Generic Disaster Recovery Processes
Creating a DR ConfigMap
Each site has a ConfigMap object containing the configuration information about the site, such as the site's role, domain name, primary database scan address and service name, and so on. The ConfigMap includes:
- The role the site is performing. This role determines the direction in which the PV is replicated; replication should always be from primary to standby.
- The name of the domain (which determines the location of the DOMAIN_HOME on the persistent volume).
- The primary database scan address and service name. These identify the database entries in the replicated configuration that must be updated.
- The standby database scan address and service name. These are used to change the replicated configuration information to point to the database in the standby site.
apiVersion: v1
kind: ConfigMap
metadata:
  name: dr-cm
  namespace: <DRNS>
data:
  ENV_TYPE: <ENV_TYPE>
  DR_TYPE: <DR_TYPE>
  OAM_DOMAIN_NAME: <OAM_DOMAIN_NAME>
  OAM_LOCAL_SCAN: <OAM_LOCAL_SCAN>
  OAM_REMOTE_SCAN: <OAM_REMOTE_SCAN>
  OAM_LOCAL_SERVICE: <OAM_LOCAL_SERVICE>
  OAM_REMOTE_SERVICE: <OAM_REMOTE_SERVICE>
  OIG_DOMAIN_NAME: <OIG_DOMAIN_NAME>
  OIG_LOCAL_SCAN: <OIG_LOCAL_SCAN>
  OIG_REMOTE_SCAN: <OIG_REMOTE_SCAN>
  OIG_LOCAL_SERVICE: <OIG_LOCAL_SERVICE>
  OIG_REMOTE_SERVICE: <OIG_REMOTE_SERVICE>
  OIRI_LOCAL_SCAN: <OIRI_LOCAL_SCAN>
  OIRI_REMOTE_SCAN: <OIRI_REMOTE_SCAN>
  OIRI_LOCAL_SERVICE: <OIRI_LOCAL_SERVICE>
  OIRI_REMOTE_SERVICE: <OIRI_REMOTE_SERVICE>
  OIRI_REMOTE_K8: <OIRI_REMOTE_K8>
  OIRI_REMOTE_K8CONFIG: <OIRI_REMOTE_K8CONFIG>
  OIRI_REMOTE_K8CA: <OIRI_REMOTE_K8CA>
  OIRI_LOCAL_K8: <OIRI_LOCAL_K8>
  OIRI_LOCAL_K8CONFIG: <OIRI_LOCAL_K8CONFIG>
  OIRI_LOCAL_K8CA: <OIRI_LOCAL_K8CA>
  OAA_LOCAL_SCAN: <OAA_LOCAL_SCAN>
  OAA_REMOTE_SCAN: <OAA_REMOTE_SCAN>
  OAA_LOCAL_SERVICE: <OAA_LOCAL_SERVICE>
  OAA_REMOTE_SERVICE: <OAA_REMOTE_SERVICE>
Here is a sample of the ConfigMap object:
site1_dr_cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dr-cm
  namespace: <DRNS>
data:
  ENV_TYPE: OTHER
  DR_TYPE: PRIMARY
  OAM_LOCAL_SCAN: site1-scan.dbsubnet.oke.oraclevcn.com
  OAM_REMOTE_SCAN: site2-scan.dbsubnet.oke.oraclevcn.com
  OAM_LOCAL_SERVICE: oamsvc.dbsubnet.oke.oraclevcn.com
  OAM_REMOTE_SERVICE: oamsvc.dbsubnet.oke.oraclevcn.com
  OIG_DOMAIN_NAME: governancedomain
  OIG_LOCAL_SCAN: site1-scan.dbsubnet.oke.oraclevcn.com
  OIG_REMOTE_SCAN: site2-scan.dbsubnet.oke.oraclevcn.com
  OIG_LOCAL_SERVICE: oigsvc.dbsubnet.oke.oraclevcn.com
  OIG_REMOTE_SERVICE: oigsvc.dbsubnet.oke.oraclevcn.com
  OIRI_LOCAL_SCAN: site1-scan.dbsubnet.oke.oraclevcn.com
  OIRI_REMOTE_SCAN: site2-scan.dbsubnet.oke.oraclevcn.com
  OIRI_LOCAL_SERVICE: oirisvc.dbsubnet.oke.oraclevcn.com
  OIRI_REMOTE_SERVICE: oirisvc.dbsubnet.oke.oraclevcn.com
  OIRI_REMOTE_K8: 10.1.0.10:6443
  OIRI_REMOTE_K8CONFIG: standby_k8config
  OIRI_REMOTE_K8CA: standby_ca.crt
  OIRI_LOCAL_K8: 10.0.0.5:6443
  OIRI_LOCAL_K8CONFIG: primary_k8config
  OIRI_LOCAL_K8CA: primary_ca.crt
  OAA_LOCAL_SCAN: site1-scan.dbsubnet.oke.oraclevcn.com
  OAA_REMOTE_SCAN: site2-scan.dbsubnet.oke.oraclevcn.com
  OAA_LOCAL_SERVICE: oaasvc.dbsubnet.oke.oraclevcn.com
  OAA_REMOTE_SERVICE: oaasvc.dbsubnet.oke.oraclevcn.com
site2_dr_cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dr-cm
  namespace: <DRNS>
data:
  ENV_TYPE: OTHER
  DR_TYPE: STANDBY
  OAM_LOCAL_SCAN: site2-scan.dbsubnet.oke.oraclevcn.com
  OAM_REMOTE_SCAN: site1-scan.dbsubnet.oke.oraclevcn.com
  OAM_LOCAL_SERVICE: oamsvc.dbsubnet.oke.oraclevcn.com
  OAM_REMOTE_SERVICE: oamsvc.dbsubnet.oke.oraclevcn.com
  OIG_DOMAIN_NAME: governancedomain
  OIG_LOCAL_SCAN: site2-scan.dbsubnet.oke.oraclevcn.com
  OIG_REMOTE_SCAN: site1-scan.dbsubnet.oke.oraclevcn.com
  OIG_LOCAL_SERVICE: oigsvc.dbsubnet.oke.oraclevcn.com
  OIG_REMOTE_SERVICE: oigsvc.dbsubnet.oke.oraclevcn.com
  OIRI_LOCAL_SCAN: site2-scan.dbsubnet.oke.oraclevcn.com
  OIRI_REMOTE_SCAN: site1-scan.dbsubnet.oke.oraclevcn.com
  OIRI_LOCAL_SERVICE: oirisvc.dbsubnet.oke.oraclevcn.com
  OIRI_REMOTE_SERVICE: oirisvc.dbsubnet.oke.oraclevcn.com
  OIRI_REMOTE_K8: 10.0.0.10:6443
  OIRI_REMOTE_K8CONFIG: standby_k8config
  OIRI_REMOTE_K8CA: standby_ca.crt
  OIRI_LOCAL_K8: 10.1.0.5:6443
  OIRI_LOCAL_K8CONFIG: primary_k8config
  OIRI_LOCAL_K8CA: primary_ca.crt
  OAA_LOCAL_SCAN: site2-scan.dbsubnet.oke.oraclevcn.com
  OAA_REMOTE_SCAN: site1-scan.dbsubnet.oke.oraclevcn.com
  OAA_LOCAL_SERVICE: oaasvc.dbsubnet.oke.oraclevcn.com
  OAA_REMOTE_SERVICE: oaasvc.dbsubnet.oke.oraclevcn.com
Create the ConfigMap on each site using the appropriate command:
kubectl create -f site1_dr_cm.yaml
kubectl create -f site2_dr_cm.yaml
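To confirm the values loaded into the ConfigMap on each site, you can run, for example:
kubectl describe configmap dr-cm -n <DRNS>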
Parent topic: Generic Disaster Recovery Processes
Creating a Backup/Restore Job
The steps below describe how to create a job which will run periodically to back up the primary persistent volume using the rsync command and restore it to the standby system.
- Creating a Backup/Restore Script
- Creating a CronJob to Run the Backup/Restore Script Periodically
- Suspending/Resuming the CronJob
- Creating an Initialization Job
Parent topic: Generic Disaster Recovery Processes
Creating a Backup/Restore Script
You should create a script to backup the persistent volume and copy the backup to the
standby site. The deployment automation scripts contain a sample script to perform
this task. This script is called <product>_dr.sh
and it
is located in the SCRIPT_DIR/templates/<product>
directory.
SCRIPT_DIR/templates/oam/oam_dr.sh
. You
need to copy this script to the persistent volume using one of the following
methods:
- By copying the script to a running container in the deployment.
- By creating a simple Alpine container to access the persistent volume.
- By temporarily mounting the NFS to a host where the scripts are being developed.
For the purpose of this document, the location where the script needs to be copied is
referred to as 'dr_scripts
'.
The script will run on both sites. If the site is in primary mode, it will create a backup of the persistent volume and send it to Site 2. If the site is in standby mode, it will restore the backup received from Site 1 onto Site 2. It is recommended that the scripts make a local copy of the persistent volume before performing any backup or restore operations.
It is highly recommended that the scripts make a local copy of the persistent volume
beforehand by using NFS utilities such as snapshots or by using the
rsync
command, as per the following steps:
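The following is a minimal, hedged sketch of what such a script might look like for OAM; it is not the oam_dr.sh sample shipped with the deployment automation scripts. The DR_TYPE and OAM_* values come from the dr-cm ConfigMap, the mount points match the CronJob defined later, and the lock file, staging directories, and domain directory layout are assumptions:
#!/bin/sh
# Hedged sketch of a persistent volume backup/restore script.
# /u01/primary_oampv is the local volume, /u01/dr_oampv is the remote volume.
LOCK=/u01/primary_oampv/dr_scripts/dr_in_progress.lck
if [ -f "$LOCK" ]; then
  echo "A previous backup/restore is still in progress; exiting."
  exit 0
fi
touch "$LOCK"

if [ "$DR_TYPE" = "PRIMARY" ]; then
  # Take a local staging copy first, then ship it to the standby site's volume.
  rsync -a --delete --exclude backups --exclude dr_scripts \
        /u01/primary_oampv/ /u01/primary_oampv/backups/staging/
  rsync -a --delete /u01/primary_oampv/backups/staging/ /u01/dr_oampv/backups/incoming/
else
  # Standby site: restore the copy received from the primary onto the local volume,
  # then point the datasource descriptors at the local database.
  rsync -a /u01/primary_oampv/backups/incoming/ /u01/primary_oampv/
  find /u01/primary_oampv/domains/*/config/jdbc -name "*.xml" -exec \
    sed -i "s/$OAM_REMOTE_SCAN/$OAM_LOCAL_SCAN/g;s/$OAM_REMOTE_SERVICE/$OAM_LOCAL_SERVICE/g" {} +
fi

rm -f "$LOCK"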
Parent topic: Creating a Backup/Restore Job
Creating a CronJob to Run the Backup/Restore Script Periodically
To automate the backup process, you need to create a job that runs the backup script periodically. Another job will be created based on this job to initialize the backup process. To avoid any unexpected runs of the CronJob, it should be suspended immediately after its creation.
The frequency of the job's schedule should be based on how often configuration changes are made. If changes are less frequent, the job can be run less frequently. However, the more often the job runs, the more resources it will require, which can potentially impact system performance. Additionally, the interval between jobs should provide enough time for the job to complete before the next iteration starts. A suitable frequency might be once per day with additional manual synchronizations as needed.
The initial run of the job will take longer compared to the subsequent runs. Oracle recommends that you suspend the CronJob as soon as you create it, and then manually run a job to initialize the DR site. After you initialize the DR site, you can restart the CronJob.
To create a CronJob in the drns namespace, create a file called oamdr-cron.yaml with the following contents:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: <product>rsyncdr
  namespace: <DRNS>
spec:
  schedule: "*/720 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          imagePullSecrets:
            - name: regcred
          containers:
            - name: alpine-rsync
              image: iad.ocir.io/mytenancy/idm/alpine-rsync:latest
              imagePullPolicy: IfNotPresent
              envFrom:
                - configMapRef:
                    name: dr-cm
              volumeMounts:
                - mountPath: "/u01/primary_oampv"
                  name: oampv
                - mountPath: "/u01/dr_oampv"
                  name: oampv-dr
              command:
                - /bin/sh
                - -c
                - /u01/primary_oampv/dr_scripts/oam_dr.sh
          volumes:
            - name: oampv
              persistentVolumeClaim:
                claimName: primary-oampv-pvc
            - name: oampv-dr
              persistentVolumeClaim:
                claimName: standby-oampv-pvc
          restartPolicy: OnFailure
For example:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: oamrsyncdr
  namespace: drns
spec:
  schedule: "*/720 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          imagePullSecrets:
            - name: regcred
          containers:
            - name: alpine-rsync
              image: iad.ocir.io/paasmaa/idm/alpine-rsync:latest
              imagePullPolicy: IfNotPresent
              envFrom:
                - configMapRef:
                    name: dr-cm
              volumeMounts:
                - mountPath: "/u01/primary_oampv"
                  name: oampv
                - mountPath: "/u01/dr_oampv"
                  name: oampv-dr
              command:
                - /bin/sh
                - -c
                - /u01/primary_oampv/dr_scripts/oam_dr.sh
          volumes:
            - name: oampv
              persistentVolumeClaim:
                claimName: primary-oampv-pvc
            - name: oampv-dr
              persistentVolumeClaim:
                claimName: standby-oampv-pvc
          restartPolicy: OnFailure
This job runs once a day. For a full description of CronJob scheduling in Kubernetes, see FreeBSD File Formats Manual.
This job uses the custom Alpine container image that was created earlier. See Creating a Container with rsync. To allow for the initialization of the DR site, which takes longer than regular refreshes, you need to immediately suspend the job. See Suspending/Resuming the CronJob. It is essential to create a separate job to perform the initialization process as this ensures that only one sync job runs at a time because it takes time to perform the initial copy.
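Assuming the definition was saved as oamdr-cron.yaml as described above, create the CronJob and confirm that it exists with, for example:
kubectl create -f oamdr-cron.yaml
kubectl get cronjobs -n drns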
Parent topic: Creating a Backup/Restore Job
Suspending/Resuming the CronJob
To suspend the backup CronJob, run the following command:
kubectl patch cronjobs <product>rsyncdr -p '{"spec" : {"suspend" : true }}' -n <NAMESPACE>
For example:
kubectl patch cronjobs oamrsyncdr -p '{"spec" : {"suspend" : true }}' -n drns
To restart the backup CronJob, run the following command:
kubectl patch cronjobs <product>rsyncdr -p '{"spec" : {"suspend" : false }}' -n <NAMESPACE>
For example:
kubectl patch cronjobs oamrsyncdr -p '{"spec" : {"suspend" : false }}' -n drns
Parent topic: Creating a Backup/Restore Job
Creating an Initialization Job
kubectl create job --from=cronjob.batch/<product>rsyncdr <product>-initialise-dr -n <DRNS>
For example:
kubectl create job --from=cronjob.batch/oamrsyncdr oam-initialise-dr -n drns
Monitor the job until it completes successfully. After the job completes, you can initiate a restore on the second site by using the same commands.
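For example, you can monitor the initialization job and review its output with commands such as:
kubectl get jobs -n drns
kubectl logs -n drns job/oam-initialise-dr -f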
Parent topic: Creating a Backup/Restore Job
Configuring Disaster Recovery for Oracle Unified Directory
The disaster recovery for Oracle Unified Directory (OUD) is an active/passive solution. Currently, you cannot use the dsreplication command to set up a remote replica.
The process of setting up the DR site is summarized below:
- Perform a minimal installation on Site 2 using the standard deployment procedures.
- Stop the OUD processes on the DR site.
- Delete the OUD persistent volume data on the DR site.
- Do one of the following:
- Enable disk replication for the persistent volume.
- Enable manual replication using a CronJob which mounts both the local and the remote OUD persistent volumes. This job will use the rsync command to synchronize the data. Suspend the job after the replication is complete.
- Perform an initial data sync.
- Verify that the DR site comes up successfully.
- Shut down the DR site.
- Restart the CronJob to copy the data periodically.
The following sections describe the procedure in detail:
- Prerequisites
- Creating an Empty OUD Deployment on Site 2
- Enabling a Manual Replication
- Shutting Down the OUD Installation on Site 2
- Deleting the Persistent Volume Data on Site 2
- Creating an Initialization Job
- Verifying OUD on the DR Site
- Starting the Automatic Syncing Process
Parent topic: Configuring Disaster Recovery
Prerequisites
- You are able to mount the remote file systems in the local pods. This task may require firewall or security list permissions.
- You are able to perform disk replication between the sites. This task may require firewall or security list permissions.
- You have access to Alpine or Busybox with rsync installed, if you plan to use manual replication. See Creating a Backup/Restore Job.
Creating an Empty OUD Deployment on Site 2
Create an OUD deployment on Site 2 by following the instructions in Installing and Configuring Oracle Unified Directory with the following exceptions:
- Use the same schema extension file as Site 1.
- There is no need for a seeding file.
- Use a minimal server overrides file. For example:
  image:
    repository: <OUD_REPOSITORY>
    tag: <OUD_VER>
    pullPolicy: IfNotPresent
  imagePullSecrets:
    - name: regcred
  oudConfig:
    baseDN: <LDAP_SEARCHBASE>
    rootUserDN: <LDAP_ADMIN_USER>
    rootUserPassword: <LDAP_ADMIN_PWD>
    sleepBeforeConfig: 300
  persistence:
    type: networkstorage
    networkstorage:
      nfs:
        server: <PVSERVER>
        path: <OUD_SHARE>
  configVolume:
    enabled: true
    type: networkstorage
    networkstorage:
      nfs:
        server: <PVSERVER>
        path: <OUD_CONFIG_SHARE>
    mountPath: /u01/oracle/config-input
  replicaCount: <OUD_REPLICAS>
  ingress:
    enabled: false
    type: nginx
    tlsEnabled: false
  elk:
    enabled: false
    imagePullSecrets:
      - name: dockercred
  cronJob:
    kubectlImage:
      repository: bitnami/kubectl
      tag: <KUBERNETES_VER>
      pullPolicy: IfNotPresent
    imagePullSecrets:
      - name: dockercred
  baseOUD:
    envVars:
      - name: schemaConfigFile_1
        value: /u01/oracle/config-input/99-user.ldif
      - name: restartAfterSchemaConfig
        value: "true"
  replOUD:
    envVars:
      - name: dsconfig_1
        value: set-global-configuration-prop --set lookthrough-limit:75000
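For illustration, assuming the overrides above are saved as override_oud.yaml and using the chart location referenced elsewhere in this chapter, the deployment command would look similar to:
helm install -n <OUDNS> -f override_oud.yaml <OUD_POD_PREFIX> <WORKDIR>/samples/kubernetes/helm/oud-ds-rs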
Enabling a Manual Replication
Complete the following procedures if you want to create a CronJob to manually replicate data between the sites.
Creating a Persistent Volume for the Remote Persistent Volume
To create a persistent volume for the remote OUD persistent volume:
You should create this persistent volume on both Site 1 and Site 2. Ensure that on Site 1 the REMOTE_PV_SERVER points to the PVSERVER on Site 2, and on Site 2 the REMOTE_PV_SERVER points to the PVSERVER on Site 1. Creating the PV on both sites ensures that the configuration is immediately available if the roles of Site 1 and Site 2 need to be reversed.
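The following is a hedged example of such a persistent volume, modeled on the OAM volumes shown earlier; the volume name, label, capacity, and export path are assumptions, so align them with your deployment:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: oudpv-remote
  labels:
    type: oudpv-remote
spec:
  storageClassName: manual
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteMany
  nfs:
    path: /export/IAMPVS/oudpv
    server: <REMOTE_PV_SERVER>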
Parent topic: Enabling a Manual Replication
Creating a Persistent Volume Claim for the Remote Persistent Volume
To create a persistent volume claim for the remote OUD persistent volume:
You can create the same PVC claim on both Site 1 and Site 2. Creating on both sites ensures that the configuration is available immediately if the roles of Site 1 and Site 2 need to be reversed.
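A hedged example of the matching claim is shown below; the claim name, namespace, and size are assumptions (the namespace is shown as <DRNS> to match the backup-job namespace used earlier in this chapter):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: oudpv-remote-pvc
  namespace: <DRNS>
  labels:
    type: oudpv-remote-pvc
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 30Gi
  selector:
    matchLabels:
      type: oudpv-remote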
Parent topic: Enabling a Manual Replication
Creating a Backup and Restore Script
You must create a script that will backup the OUD instances and copy it
to your standby site. This script should then be able to restore the backup on the
standby site. If you need an example of such a script, you can find it in the
deployment automation scripts under the name oud_dr.sh
. The
script is located in the SCRIPT_DIR/templates/oud
directory.
See Creating a Backup/Restore Script.
This script must have the following characteristics:
- It should be able to determine whether it is running on the primary or the standby site.
- It should not run if a previous backup/restore job is in progress.
- When running on the primary site:
- Create a localized backup of the persistent volume (to a temporary location) by using snapshots (if available) or rsync.
- Copy the created backup to a temporary location on the standby site using rsync.
- When running on the standby site, restore the OUD instances in the received backup to the persistent volume.
Note:
Restore only the instances, not the backup scripts.
Parent topic: Enabling a Manual Replication
Copying the Backup Script to the Persistent Volume
The easiest way to ensure that the container image is able to run the backup script is to copy the script to the persistent volume. You should perform this step on both the sites using the following commands:
kubectl exec -n <OUDNS> -ti <OUD_POD_PREFIX>-oud-ds-rs-0 -- mkdir /u01/oracle/user_projects/dr_scripts
kubectl cp <WORKDIR>/oud_dr.sh <OUDNS>/<OUD_POD_PREFIX>-oud-ds-rs-0:/u01/oracle/user_projects/dr_scripts
kubectl exec -n <OUDNS> -ti <OUD_POD_PREFIX>-oud-ds-rs-0 -- chmod 750 /u01/oracle/user_projects/dr_scripts/oud_dr.sh
For example:
kubectl exec -n oudns -ti edg-oud-ds-rs-0 -- mkdir /u01/oracle/user_projects/dr_scripts
kubectl cp /workdir/OUD/oud_dr.sh oudns/edg-oud-ds-rs-0:/u01/oracle/user_projects/dr_scripts
kubectl exec -n oudns -ti edg-oud-ds-rs-0 -- chmod 750 /u01/oracle/user_projects/dr_scripts/oud_dr.sh
Parent topic: Enabling a Manual Replication
Creating a CronJob
Create a CronJob to synchronize the local and remote persistent volumes. For instructions, see Creating a CronJob to Run the Backup/Restore Script Periodically. After creating, suspend the CronJob immediately. For instructions, see Suspending/Resuming the CronJob.
Parent topic: Enabling a Manual Replication
Shutting Down the OUD Installation on Site 2
Shut down the OUD installation on Site 2 if everything is functioning as expected. Use the following command to shut down:
helm upgrade -n oudns --set replicaCount=0 edg <WORKDIR>/samples/kubernetes/helm/oud-ds-rs --reuse-values
Deleting the Persistent Volume Data on Site 2
To ensure that the data from Site 1 is fully copied across to Site 2, it is important to remove the data that was created as part of the installation. Assuming that the persistent volume is mounted locally, you can remove the data using the following command:
rm -rf /nfs_volumes/oudpv/*oud-ds-rs*
Creating an Initialization Job
Use the CronJob you defined earlier (see Creating a CronJob) to create a one-off job that performs the initial data transfer using the command:
kubectl create job --from=cronjob.batch/rsyncdr initialise-dr -n <DRNS>
For example:
kubectl create job --from=cronjob.batch/rsyncdr initialise-dr -n drns
Note:
Perform this step on Site 1.
Verifying OUD on the DR Site
After the successful initial data load, you can start the OUD servers in Site 2 using the command:
helm upgrade -n <OUDNS> --set replicaCount=<REPLICA_COUNT> <OUD_POD_PREFIX> <WORKDIR>/samples/kubernetes/helm/oud-ds-rs --reuse-values
For example:
helm upgrade -n oudns --set replicaCount=1 edg /workdir/OUD/samples/kubernetes/helm/oud-ds-rs --reuse-values
After starting the servers, verify the availability of data from the primary site by checking the replication status and running the ldapsearch command-line tool.
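For example (a sketch only; the binary path, port, and bind credentials are assumptions that depend on how OUD was deployed):
kubectl exec -n oudns -ti edg-oud-ds-rs-0 -- \
  /u01/oracle/oud/bin/ldapsearch -h localhost -p 1389 \
  -D "<LDAP_ADMIN_USER>" -w "<LDAP_ADMIN_PWD>" \
  -b "<LDAP_SEARCHBASE>" "(objectclass=inetOrgPerson)" dn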
If everything is functioning as expected, shut down OUD by using the following command:
helm upgrade -n oudns --set replicaCount=0 edg <WORKDIR>/samples/kubernetes/helm/oud-ds-rs --reuse-values
Starting the Automatic Syncing Process
Start the CronJob on Site 1 to periodically synchronize the changes from Site 1 Persistent Volume to Site 2. See Suspending/Resuming the CronJob.
Configuring Disaster Recovery for Oracle Access Manager
The disaster recovery for Oracle Access Manager (OAM) is an active/passive solution.
The following sections describe the procedure in detail:
- Prerequisites
- Creating the Disaster Recovery Site From the Primary Site
- Creating the Disaster Recovery Site on the Standby Site
Parent topic: Configuring Disaster Recovery
Prerequisites
Before enabling disaster recovery for Oracle Access Manager, ensure the following:
- The primary and standby sites for Data Guard communicate with each other.
- The primary and standby sites for file system replication communicate with each other.
- The Kubernetes clusters in both the primary and standby sites are able to resolve the names of the NFS servers in all sites.
- Each Kubernetes cluster is independent.
- A running Kubernetes cluster is present in both the sites.
Parent topic: Configuring Disaster Recovery for Oracle Access Manager
Creating the Disaster Recovery Site From the Primary Site
Parent topic: Configuring Disaster Recovery for Oracle Access Manager
Creating the Disaster Recovery Site on the Standby Site
There are two options for creating the disaster recovery site on the standby site:
By Creating a New Disposable Environment on Site 2
- Create a disposable deployment in Site 2 using the same values as the primary site. If you have used the EDG automation scripts, you just need to change the values of the worker nodes and the PV server in the response file.
- Shut down the Kubernetes pods that are running on Site 2.
- Delete the contents of the persistent volume.
- Restore the backup of the persistent volume from the backup taken on Site 1. See Creating a Kubernetes CronJob to Synchronize the Persistent Volumes.
- Amend the database connection strings in the restored backup to point to the database on Site 2.
- Copy the WebGate objects from the restored backup to the Oracle HTTP Servers on Site 2.
- Validate that the disaster recovery site is working.
- Delete the throwaway database used to perform the initial installation.
By Performing a Kubernetes Restore on Site 2
- Create the persistent volumes for the OAM domain such that they point at the NFS server in the standby site. See Creating the Kubernetes Persistent Volume.
- Open the standby database as a snapshot standby by using the following Data Guard broker commands:
dgmgrl sys/Password
show configuration
convert database standby_db_name to snapshot standby
- Restore a backup of the persistent volume taken on Site 1. See Creating a Kubernetes CronJob to Synchronize the Persistent Volumes.
- Restore the backup of the Kubernetes objects from the backup taken on Site 1. See Backing Up and Restoring the Kubernetes Objects.
- Amend the database connection strings in the restored backup to point to the database on Site 2 (a hedged example follows this list).
- Copy the WebGate objects from the restored backup to the Oracle HTTP Servers on Site 2.
- Validate that the disaster recovery site is working.
- Reinstate the standby database by issuing the following Data Guard broker commands:
dgmgrl sys/Password
show configuration
convert database standby_db_name to physical standby
Switch over the database and validate that the deployment is fully working.
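As a hedged illustration of the connection-string step above (the domain directory under the mounted volume and the scan addresses, taken from the sample ConfigMap, are assumptions), the datasource descriptors could be updated with commands similar to:
cd /nfs_volumes/oampv/domains/accessdomain/config/jdbc
sed -i 's/site1-scan.dbsubnet.oke.oraclevcn.com/site2-scan.dbsubnet.oke.oraclevcn.com/g' *.xml
grep -l site1-scan *.xml
The grep command should return no files once every descriptor has been updated.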
Parent topic: Configuring Disaster Recovery for Oracle Access Manager
Configuring Disaster Recovery for Oracle Identity Governance
The disaster recovery for Oracle Identity Governance (OIG) is an active/passive solution.
The following sections describe the procedure in detail:
- Prerequisites
- Creating the Disaster Recovery Site From the Primary Site
- Creating the Disaster Recovery Site on the Standby Site
- Disabling the Job Scheduler for Configuring Oracle Identity Manager
Parent topic: Configuring Disaster Recovery
Prerequisites
Before enabling disaster recovery for Oracle Identity Governance, ensure the following:
- The primary and standby sites for file system replication communicate with each other.
- The database systems in the primary and standby sites communicate with each other for database replication.
- The Kubernetes clusters in both the primary and standby sites are able to resolve the names of the NFS servers in all sites.
- Each Kubernetes cluster is independent.
- A running Kubernetes cluster is present in both the sites.
Creating the Disaster Recovery Site on the Standby Site
There are two options for creating the disaster recovery site on the standby site:
By Creating a New Disposable Environment on Site 2
- Create a disposable deployment in Site 2 using the same values as the primary site. If you have used the EDG automation scripts, you just need to change the values of the worker nodes and the PV server in the response file.
- Shut down the Kubernetes pods that are running on Site 2.
- Delete the contents of the persistent volume.
- Restore the backup of the persistent volume from the backup taken on Site 1. See Creating a Kubernetes CronJob to Synchronize the Persistent Volumes.
- Amend the database connection strings in the restored backup to point to the database on Site 2.
- Validate that the disaster recovery site is working.
- Delete the throwaway database used to perform the initial installation.
By Performing a Kubernetes Restore on Site 2
- Create the persistent volumes for the OIG domain such that they point at the NFS server in the standby site. See Creating the Kubernetes Persistent Volume.
- Open the standby database as a snapshot standby by using the following Data Guard broker commands:
dgmgrl sys/Password
show configuration
convert database standby_db_name to snapshot standby
- Restore a backup of the persistent volume taken on Site 1. See Creating a Kubernetes CronJob to Synchronize the Persistent Volumes.
- Restore the backup of the Kubernetes objects from the backup taken on Site 1. See Backing Up and Restoring the Kubernetes Objects.
- Amend the database connection strings in the restored backup to point to the database in Site 2.
- Validate that the disaster recovery site is working.
- Reinstate the standby database by issuing the following Data Guard broker commands:
dgmgrl sys/Password
show configuration
convert database standby_db_name to physical standby
Switch over the database and validate that the deployment is fully working.
Disabling the Job Scheduler for Configuring Oracle Identity Manager
You must disable the Oracle Identity Governance job scheduler on Site 2 when configuring Oracle Identity Governance for disaster recovery.
The OIG job scheduler uses the database intensively. To keep inter-site traffic to a minimum, the job scheduler should be disabled on the site where the database is not primary. This step is not required if you plan to shut down the OIG Managed Servers on Site 2. You can disable the job scheduler by adding the following parameter to the server startup arguments:
-Dscheduler.disabled=true
After switching over the database, this parameter should be removed from these servers and added into the server definitions of the now standby site.
Configuring Disaster Recovery for Oracle Identity Role Intelligence
The disaster recovery for Oracle Identity Role Intelligence (OIRI) is an active/passive solution.
The following sections describe the procedure in detail:
Prerequisites
Before enabling disaster recovery for Oracle Identity Role Intelligence, ensure the following:
- The primary and standby sites for file system replication communicate with each other.
- The database systems in the primary and standby sites communicate with each other for database replication.
- The Kubernetes clusters in both the primary and standby sites are able to resolve the names of the NFS servers in all sites.
- Each Kubernetes cluster is independent.
- A running Kubernetes cluster is present in both the sites.
Creating the Disaster Recovery Site
OIRI is tightly integrated into the Kubernetes framework. It interacts directly with the cluster to be able to run data-ingestion tasks. When you deploy OIRI, it contains information such as the cluster name in which it is running. When you run OIRI from a different site, then, in addition to updating the database connection information, you also need to update the Kubernetes cluster information. You can perform this task in two ways.
- When you are starting OIRI on a different site, update the Kubernetes configuration.
- Store the configuration for each site on the persistent volume. Then, when you restore the persistent volume on the standby site, replace the Kubernetes configuration with the appropriate values using these files.
It is possible to create these files only after you have created the OIRI Kubernetes objects on the standby site.
The recommended approach is to create copies of the Kubernetes configuration objects in the persistent volume and label them according to the site, for example, primary_ca.crt and primary_k8config. Create the standby site (see Creating the Disaster Recovery Site on the Standby Site), and then create a new ca.crt and kubeconfig file based on the standby cluster. Copy these files to the persistent volume of the primary site, calling the files standby_ca.crt and standby_k8config. This naming convention ensures that when you replicate the persistent volume to the standby site, you have copies of the ca.crt and kubeconfig files for both the primary and standby clusters. After running the PV replication task, restore the ca.crt and kubeconfig files on the persistent volume with those relevant to the site you want to run. For example:
- When you use the primary site, the ca.crt and config files use the contents of the primary_ca.crt and primary_k8config files, respectively.
- When you use the standby site, the ca.crt and config files use the contents of the standby_ca.crt and standby_k8config files, respectively.
Creating the Disaster Recovery Site From the Primary Site
On the primary site:
Parent topic: Creating the Disaster Recovery Site
Creating the Disaster Recovery Site on the Standby Site
There are two options for creating the disaster recovery site on the standby site:
By Creating a New Disposable Environment on Site 2
By Performing a Kubernetes Restore on Site 2
- Create the persistent volumes for the OIRI persistent volumes such that they point at the NFS server in the standby site. See Creating the Kubernetes Persistent Volume.
- Open the standby database as a snapshot standby by using the following Data Guard broker commands:
dgmgrl sys/Password
show configuration
convert database standby_db_name to snapshot standby
- Restore a backup of the persistent volume taken on Site 1. See Creating a Kubernetes CronJob to Synchronize the Persistent Volumes.
- Restore the backup of the Kubernetes objects from the backup taken on Site 1. See Backing Up and Restoring the Kubernetes Objects.
- Start the OIRI-CLI container on the standby system. See Starting the Administration CLI.
- Create a kubeconfig file and a certificate on the standby site. See Generating the ca.crt Certificate and Creating a Kubernetes Configuration File for OIRI. Store the resulting files ca.crt and oiri_config in the OIRI-CLI locations in both the primary and standby sites:
  /app/k8s/standby_ca.crt
  /app/k8s/standby_config.crt
- Amend the database connection strings in the restored backup to point to the database in Site 2. Update the following files:
  /app/oiri/data/conf/application.yaml
  /app/data/conf/custom-attributes.yaml
  /app/data/conf/data-ingestion-config.yaml
  /app/data/conf/dbconfig.yaml
  /app/data/conf/env.properties
- Amend the Kubernetes cluster URL in the restored backup to point to the Kubernetes cluster in the standby site. To find the Kubernetes cluster URL, use the following command:
  grep server: $KUBECONFIG | sed 's/server://;s/ //g'
  Update the following files:
  /app/data/conf/data-ingestion-config.yaml
  /app/data/conf/env.properties
- Switch the Kubernetes configuration files to the copies appropriate to the standby site. For example:
cp /app/k8s/standby_ca.crt /app/k8s/ca.crt
cp /app/k8s/standby_ca.crt /app/ca.crt
cp /app/k8s/standby_config.crt /app/k8s/config
- Validate that the disaster recovery site is working.
- Reinstate the standby database by using the following Data Guard broker commands:
dgmgrl sys/Password
show configuration
convert database standby_db_name to physical standby
Switch over the database and validate that the deployment is fully working.
- Recopy the persistent volumes from the primary to the standby site on a periodic basis. For example, once a week, or whenever you make configuration changes to the OIRI deployment.
Parent topic: Creating the Disaster Recovery Site
Configuring Disaster Recovery for Oracle Advanced Authentication
The disaster recovery for Oracle Advanced Authentication (OAA) is an active/passive solution.
The following sections describe the procedure in detail:
Prerequisites
Before enabling disaster recovery for Oracle Advanced Authentication, ensure the following:
- The primary and standby sites for file system replication communicate with each other.
- The database systems in the primary and standby sites communicate with each other for database replication.
- The Kubernetes clusters in both the primary and standby sites are able to resolve the names of the NFS servers in all sites.
- Each Kubernetes cluster is independent.
- A running Kubernetes cluster is present in both the sites.
- The OAuth provider must be available in both the sites.
Note:
If you are using OAM as the OAuth provider and the failover scenario is all or nothing (that is, you fail over or switch over OAM and OAA together), then, for the purposes of creating the DR site, you need to enable OAM in the standby site during configuration. Do this by temporarily converting the OAM standby database to a snapshot standby and starting OAM using that database. After configuring OAA, shut down OAM and revert the database to a physical standby.
Each OAM site must use the same SSL certificate.
Creating the Disaster Recovery Site
OAA is tightly integrated into the Kubernetes framework. When you deploy OAA, it contains information such as the cluster name in which it is running. Therefore, you must exclude this information when you replicate the persistent volumes.
Creating the Disaster Recovery Site From the Primary Site
On the primary site:
Parent topic: Creating the Disaster Recovery Site
Creating the Disaster Recovery Site on the Standby Site
On the standby site:
Parent topic: Creating the Disaster Recovery Site