Setting Up Peering Between the ZFS Storage Appliances

After the physical connection between the ZFS Storage Appliances has been established, you set them up as peers using the drSetupService command in the Service CLI. You run this command from both systems so that each system operates as the replica of the other system.

The replication parameters for standard storage are mandatory with the setup command. If the Private Cloud Appliance systems also include high-performance storage, add the replication parameters for the high-performance storage pool to the setup command.

However, only set up replication for high-performance storage if the high-performance storage pool is actually available on the ZFS Storage Appliances. If it is not, run the setup command again later to add the high-performance storage pool after it has been configured on the ZFS Storage Appliances.

When you set up the replication interfaces for the disaster recovery service, the system assumes that the gateway is the first host address in the subnet of the local IP address you specify. This applies to the replication interfaces for both standard and high-performance storage. For example, if you specify a local IP address of 10.50.7.31/23, the assumed gateway is 10.50.6.1, the first host address in the 10.50.6.0/23 subnet. If the actual gateway address is different, you must add it to the drSetupService command using the gatewayIp and gatewayIpPerf parameters.
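
The gateway default can be checked with Python's standard ipaddress module. This is only an illustration of the addressing rule described above, not part of the appliance software; the helper name default_gateway is made up for the example:

```python
import ipaddress

def default_gateway(local_ip_cidr: str) -> str:
    """Return the first host address in the subnet of a CIDR-notated
    local IP, which is the gateway the DR service assumes by default."""
    network = ipaddress.ip_interface(local_ip_cidr).network
    return str(network.network_address + 1)

# localIp=10.50.7.31/23 lies in the 10.50.6.0/23 subnet,
# so the assumed gateway is 10.50.6.1, not 10.50.7.1.
print(default_gateway("10.50.7.31/23"))  # 10.50.6.1
```

If the printed address does not match the real gateway in the data center, pass gatewayIp (and gatewayIpPerf) explicitly to drSetupService.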

Optionally, you can also set a maximum number of DR configurations and a retention period for disaster recovery job details.

Setting Up Peering Between the ZFS Storage Appliances Before 302-b892153

If Oracle Private Cloud Appliance racks are running a software version earlier than release 302-b892153, follow these Service API steps to set up peering between the ZFS Storage Appliances of the two racks.

Note:

Both Private Cloud Appliance racks in the disaster recovery configuration must be running the same version of the system software.

Syntax (entered on a single line):

drSetupService
localIp=<primary_system_standard_replication_ip> (in CIDR notation)
remoteIp=<replica_system_standard_replication_ip>
localIpPerf=<primary_system_performance_replication_ip> (in CIDR notation)
remoteIpPerf=<replica_system_performance_replication_ip>
[Optional Parameters:]
  gatewayIp=<local_subnet_gateway_ip> (default: first host IP in localIp subnet)
  gatewayIpPerf=<local_subnet_gateway_ip> (default: first host IP in localIpPerf subnet)
  maxConfig=<number_DR_configs> (default and maximum is 20)
  jobRetentionHours=<hours> (default and minimum is 24)

Examples:

  • With only standard storage configured:

    system 1

    PCA-ADMIN> drSetupService \
    localIp=10.50.7.31/23 gatewayIp=10.50.7.10 remoteIp=10.50.7.33

    system 2

    PCA-ADMIN> drSetupService \
    localIp=10.50.7.33/23 gatewayIp=10.50.7.10 remoteIp=10.50.7.31
  • With both standard and high-performance storage configured:

    system 1

    PCA-ADMIN> drSetupService \
    localIp=10.50.7.31/23 gatewayIp=10.50.7.10 remoteIp=10.50.7.33 \
    localIpPerf=10.50.7.32/23 gatewayIpPerf=10.50.7.10 remoteIpPerf=10.50.7.34

    system 2

    PCA-ADMIN> drSetupService \
    localIp=10.50.7.33/23 gatewayIp=10.50.7.10 remoteIp=10.50.7.31 \
    localIpPerf=10.50.7.34/23 gatewayIpPerf=10.50.7.10 remoteIpPerf=10.50.7.32

Important:

When setting up disaster recovery, after you run drSetupService on the first system you must wait for the job to complete before running the command on the second system. You can monitor the job on the first system by running drGetJob jobid=<unique-id>.
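
The wait-for-completion requirement can be scripted as a small polling loop. The sketch below is a generic pattern, not a product tool: get_status stands for any zero-argument callable that runs drGetJob jobid=<unique-id> through the Service CLI and extracts the job status string.

```python
import time

def wait_for_job(get_status, timeout_s=1800, poll_s=30):
    """Poll a job-status callable until the job reaches a terminal state.

    get_status: zero-argument callable returning the current status
    string (for example, parsed from drGetJob output); this glue code
    is hypothetical, since the Service CLI itself is interactive.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("Success", "finished"):
            return status
        if status in ("Failure", "failed"):
            raise RuntimeError(f"DR job ended with status: {status}")
        time.sleep(poll_s)
    raise TimeoutError("DR job did not complete within the timeout")
```

Only after the job on the first system reports success should drSetupService be run on the second system.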

The script configures both ZFS Storage Appliances.

After successful configuration of the replication interfaces, you must enable replication over the interfaces you just configured.

Enabling Replication for Disaster Recovery

To enable replication between the two storage appliances, using the interfaces you configured earlier, re-run the same drSetupService command from the Service CLI, this time adding enableReplication=True. You must also provide the remotePassword parameter to authenticate with the other storage appliance and complete the peering setup.

Examples:

  • With only standard storage configured:

    system 1

    PCA-ADMIN> drSetupService \
    localIp=10.50.7.31/23 gatewayIp=10.50.7.10 remoteIp=10.50.7.33 \
    enableReplication=True remotePassword=********

    system 2

    PCA-ADMIN> drSetupService \
    localIp=10.50.7.33/23 gatewayIp=10.50.7.10 remoteIp=10.50.7.31 \
    enableReplication=True remotePassword=********
  • With both standard and high-performance storage configured:

    system 1

    PCA-ADMIN> drSetupService \
    localIp=10.50.7.31/23 gatewayIp=10.50.7.10 remoteIp=10.50.7.33 \
    localIpPerf=10.50.7.32/23 gatewayIpPerf=10.50.7.10 remoteIpPerf=10.50.7.34 \
    enableReplication=True remotePassword=********

    system 2

    PCA-ADMIN> drSetupService \
    localIp=10.50.7.33/23 gatewayIp=10.50.7.10 remoteIp=10.50.7.31 \
    localIpPerf=10.50.7.34/23 gatewayIpPerf=10.50.7.10 remoteIpPerf=10.50.7.32 \
    enableReplication=True remotePassword=********

Important:

When enabling replication, after you run drSetupService on the first system you must wait for the job to complete before running the command on the second system. You can monitor the job on the first system by running drGetJob jobid=<unique-id>.

At this stage, the ZFS Storage Appliances in the disaster recovery setup have been successfully peered. The storage appliances are ready to perform scheduled data replication every 5 minutes. The data to be replicated is based on the DR configurations you create. See Managing Disaster Recovery Configurations.

Modifying the ZFS Storage Appliance Peering Setup

After you have set up the disaster recovery service and enabled replication between the systems, you can change the parameters of the peering configuration. You change the service using the drUpdateService command in the Service CLI.

Syntax (entered on a single line):

drUpdateService
localIp=<primary_system_standard_replication_ip> (in CIDR notation)
remoteIp=<replica_system_standard_replication_ip>
localIpPerf=<primary_system_performance_replication_ip> (in CIDR notation)
remoteIpPerf=<replica_system_performance_replication_ip>
gatewayIp=<local_subnet_gateway_ip> (default: first host IP in localIp subnet)
gatewayIpPerf=<local_subnet_gateway_ip> (default: first host IP in localIpPerf subnet)
maxConfig=<number_DR_configs> (default and maximum is 20)
jobRetentionHours=<hours> (default and minimum is 24)

Example 1 – Simple parameter change

This example shows how you change the job retention time from 24 to 48 hours and reduce the maximum number of DR configurations from 20 to 12.

PCA-ADMIN> drUpdateService jobRetentionHours=48 maxConfig=12
Command: drUpdateService jobRetentionHours=48 maxConfig=12
Status: Success
Time: 2022-08-11 09:20:48,570 UTC
Data:
  Message = Successfully started job to update DR admin service
  Job Id = ec64cef4-ba68-493d-89c8-22df51553cd8

Use the drShowService command to check the current configuration. Run the command to display the configuration parameters before you change them. Run it again afterward to confirm that the changes have been applied successfully.

PCA-ADMIN> drShowService
Command: drShowService
Status: Success
Time: 2022-08-11 09:23:54,951 UTC
Data:
  Local Ip = 10.50.7.31/23
  Remote Ip = 10.50.7.33
  Replication = ENABLED
  Replication High = DISABLED
  Message = Successfully retrieved site configuration
  maxConfig = 12
  gateway IP = 10.50.7.10
  Job Retention Hours = 48
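
Because drShowService reports its settings as Key = Value pairs under Data:, a before/after comparison is easy to script. The following sketch parses that output into a dictionary; the parser is illustrative and assumes exactly the layout shown above:

```python
def parse_dr_service_output(output: str) -> dict:
    """Collect the 'Key = Value' pairs from a drShowService Data block."""
    settings = {}
    in_data = False
    for line in output.splitlines():
        if line.strip() == "Data:":
            in_data = True
            continue
        if in_data and "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings

sample = """\
Command: drShowService
Status: Success
Data:
  Local Ip = 10.50.7.31/23
  Remote Ip = 10.50.7.33
  Replication = ENABLED
  maxConfig = 12
  Job Retention Hours = 48
"""
settings = parse_dr_service_output(sample)
print(settings["maxConfig"], settings["Job Retention Hours"])  # 12 48
```

Capturing the dictionary before and after drUpdateService makes it simple to diff exactly which parameters changed.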

Example 2 – Replication IP change

There might be network changes in the data center that require you to use different subnets and IP addresses for the replication interfaces configured in the disaster recovery service. This configuration change must be applied through several commands on the two peer systems, in a specific order. If the systems contain both standard and high-performance storage, as in the following example, change the replication interface settings for both storage types in the same order.

  1. Update the local IP and gateway parameters on system 1. Leave the remote IPs unchanged.

    PCA-ADMIN> drUpdateService \
    localIp=10.100.33.83/28 gatewayIp=10.100.33.81 \
    localIpPerf=10.100.33.84/28 gatewayIpPerf=10.100.33.81
  2. Update the local IP, gateway, and remote IP parameters on system 2.

    PCA-ADMIN> drUpdateService \
    localIp=10.100.33.88/28 gatewayIp=10.100.33.81 remoteIp=10.100.33.83 \
    localIpPerf=10.100.33.89/28 gatewayIpPerf=10.100.33.81 remoteIpPerf=10.100.33.84
  3. Update the remote IP parameters on system 1.

    PCA-ADMIN> drUpdateService \
    remoteIp=10.100.33.88 remoteIpPerf=10.100.33.89

Example 3 – Trusting a New ZFS Storage Appliance Certificate

The following example shows the command that must be run if the ZFS Storage Appliance certificate on the peer rack is updated. This command retrieves the new certificate from the remote host and adds it to the trust list.

PCA-ADMIN> drUpdateService \
remoteIp=10.100.33.88 remoteIpPerf=10.100.33.89

Unconfiguring the ZFS Storage Appliance Peering Setup

If a reset has been performed on one or both of the systems in the disaster recovery solution, and you need to unconfigure the disaster recovery service to remove the entire peering setup between the ZFS Storage Appliances, use the drDeleteService command in the Service CLI.

Caution:

This command requires no other parameters. Be careful when entering it at the PCA-ADMIN> prompt, to avoid executing it unintentionally.

You can't unconfigure the disaster recovery service while DR configurations still exist. Proceed as follows:

  1. Remove all DR configurations from the two systems that have been configured as replicas for each other.

  2. Sign in to the Service CLI on one of the systems and enter the drDeleteService command.

  3. Sign in to the Service CLI on the second system and enter the drDeleteService command there as well.

When the disaster recovery service isn't configured, the drShowService command returns an error.

PCA-ADMIN> drShowService
Command: drShowService
Status: Failure
Time: 2022-08-11 12:31:22,840 UTC
Error Msg: PCA_GENERAL_000001: An exception occurred during processing: Operation failed. 
[...]
Error processing dr-admin.service.show response: dr-admin.service.show failed. Service not set up.

Setting Up Peering Between the ZFS Storage Appliances in Release 302-b892153 or Later

If Oracle Private Cloud Appliance racks are running software release 302-b892153 or later, follow these Service API steps to set up peering between the ZFS Storage Appliances of the two racks.

Note:

Both Private Cloud Appliance racks in the disaster recovery configuration must be running compatible versions of the system software: either both earlier than build 302-b892153, or both build 302-b892153 or later.

Before beginning, the show networkConfig output must have valid entries for the following:

  • DNS IP addresses
  • Management node hostnames
  • Management node IP addresses
  • Free public IP addresses
  • A valid IP address for the ZFS capacity pool replication endpoint (zfsCapacityPoolReplicationEndpoint)

You must add DNS PTR entries for:
  • sn01-dr1.<rack_name>.<domain_name>
  • sn02-dr1.<rack_name>.<domain_name> (if you use a performance pool)
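
As the later examples (for instance sn01-dr1.rack1.example.com) suggest, the PTR names join the fixed sn01-dr1 and sn02-dr1 labels to the rack and domain names with dots. A minimal sketch of building the required FQDNs, under that assumption:

```python
def dr_ptr_hostnames(rack_name: str, domain_name: str,
                     has_perf_pool: bool = False) -> list:
    """Build the replication-endpoint FQDNs that need DNS PTR entries.

    Assumes dots join <rack_name> and <domain_name>, matching the
    sn01-dr1.rack1.example.com style used in this section's examples.
    """
    hosts = [f"sn01-dr1.{rack_name}.{domain_name}"]
    if has_perf_pool:  # sn02-dr1 is needed only with a performance pool
        hosts.append(f"sn02-dr1.{rack_name}.{domain_name}")
    return hosts

print(dr_ptr_hostnames("rack1", "example.com", has_perf_pool=True))
```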

When DNS mapping is configured with the zone delegation option, these mappings are managed by the Private Cloud Appliance DNS.

To populate the rack core DNS, edit the network configuration:

  • system 1

    PCA-ADMIN> edit networkConfig \
    zfsCapacityPoolReplicationEndpoint=10.0.7.31
  • system 2

    PCA-ADMIN> edit networkConfig \
    zfsCapacityPoolReplicationEndpoint=10.0.7.32

When DNS mapping is configured with the manual option, these mappings are managed by the data center DNS.

For more information on creating Private Cloud Appliance DNS PTR entries, and DNS management in general, see "Working with Zone Records" in the Networking chapter of the Oracle Private Cloud Appliance User Guide.

Syntax (entered on a single line):

drSetupService
localIp=<primary_system_standard_replication_ip> (in CIDR notation)
remoteHost=<replica_system_standard_replication_fqdn_for_remoteHost>
localIpPerf=<primary_system_performance_replication_ip> (in CIDR notation)
remoteHostPerf=<replica_system_performance_replication_fqdn_for_remoteHostPerf>
[Optional Parameters:]
  gatewayIp=<local_subnet_gateway_ip> (default: first host IP in localIp subnet)
  gatewayIpPerf=<local_subnet_gateway_ip> (default: first host IP in localIpPerf subnet)
  maxConfig=<number_DR_configs> (default and maximum is 20)
  jobRetentionHours=<hours> (default and minimum is 24)

Examples:

  • With only standard storage configured:

    system 1

    PCA-ADMIN> drSetupService \
    localIp=10.0.7.31/23 gatewayIp=10.0.7.10 remoteHost=sn01-dr1.rack2.example.com

    system 2

    PCA-ADMIN> drSetupService \
    localIp=10.0.7.33/23 gatewayIp=10.0.7.10 remoteHost=sn01-dr1.rack1.example.com
  • With both standard and high-performance storage configured:

    system 1

    PCA-ADMIN> drSetupService \
    localIp=10.0.7.31/23 gatewayIp=10.0.7.10 remoteHost=sn01-dr1.rack2.example.com \
    localIpPerf=10.0.7.32/23 gatewayIpPerf=10.0.7.10 remoteHostPerf=sn02-dr1.rack2.example.com

    system 2

    PCA-ADMIN> drSetupService \
    localIp=10.0.7.33/23 gatewayIp=10.0.7.10 remoteHost=sn01-dr1.rack1.example.com \
    localIpPerf=10.0.7.34/23 gatewayIpPerf=10.0.7.10 remoteHostPerf=sn02-dr1.rack1.example.com

Important:

When setting up disaster recovery, after you run drSetupService on the first system you must wait for the job to complete before running the command on the second system. You can monitor the job on the first system by running drGetJob jobid=<unique-id>.

For example:

PCA-ADMIN> drGetJob jobid=<unique-id>
Command: drGetJob jobid=<unique-id>
Status: Success
Time: 2023-08-01 15:26:46,973 UTC
Data:
  Type = setup service
  Job Id = <unique-id>
  Status = Success
  Start Time = 2023-08-01 15:26:28.935479
  Message = job successfully retrieved 

Note:

Ensure that the "Success" status appears in the Status field of the Data section, not only in the top-level Status field, which confirms only that the command itself was accepted.

The script configures both ZFS Storage Appliances.

After successful configuration of the replication interfaces, you must enable replication over the interfaces you configured.

Enabling Replication for Disaster Recovery

To enable replication between the two storage appliances, using the interfaces you configured earlier, run the same drSetupService command from the Service CLI, this time adding enableReplication=True. You must also provide the remotePassword parameter to authenticate with the other storage appliance and complete the peering setup.

Examples:

  • With only standard storage configured:

    system 1

    PCA-ADMIN> drSetupService \
    localIp=10.0.7.31/23 gatewayIp=10.0.7.10 \
    enableReplication=True remotePassword=******** remoteHost=sn01-dr1.rack2.example.com

    system 2

    PCA-ADMIN> drSetupService \
    localIp=10.0.7.33/23 gatewayIp=10.0.7.10 \
    enableReplication=True remotePassword=******** remoteHost=sn01-dr1.rack1.example.com
  • With both standard and high-performance storage configured:

    system 1

    PCA-ADMIN> drSetupService \
    localIp=10.0.7.31/23 gatewayIp=10.0.7.10 remoteHost=sn01-dr1.rack2.example.com \
    localIpPerf=10.0.7.32/23 gatewayIpPerf=10.0.7.10 remoteHostPerf=sn02-dr1.rack2.example.com \
    enableReplication=True remotePassword=******** 

    system 2

    PCA-ADMIN> drSetupService \
    localIp=10.0.7.33/23 gatewayIp=10.0.7.10 remoteHost=sn01-dr1.rack1.example.com \
    localIpPerf=10.0.7.34/23 gatewayIpPerf=10.0.7.10 remoteHostPerf=sn02-dr1.rack1.example.com \
    enableReplication=True remotePassword=******** 

Important:

When enabling replication, after you run drSetupService on the first system you must wait for the job to complete before running the command on the second system. You can monitor the job on the first system by running drGetJob jobid=<unique-id>.

At this stage, the ZFS Storage Appliances in the disaster recovery setup have been successfully peered. The storage appliances are ready to perform scheduled data replication every 5 minutes. The data to be replicated is based on the DR configurations you create. See Managing Disaster Recovery Configurations.

Modifying the ZFS Storage Appliance Peering Setup

After you have set up the disaster recovery service and enabled replication between the systems, you can change the parameters of the peering configuration individually. You change the service using the drUpdateService command in the Service CLI.

Syntax (entered on a single line):

drUpdateService
localIp=<primary_system_standard_replication_ip> (in CIDR notation)
remoteHost=<replica_system_standard_replication_fqdn>
localIpPerf=<primary_system_performance_replication_ip> (in CIDR notation)
remoteHostPerf=<replica_system_performance_replication_fqdn>
gatewayIp=<local_subnet_gateway_ip> (default: first host IP in localIp subnet)
gatewayIpPerf=<local_subnet_gateway_ip> (default: first host IP in localIpPerf subnet)
maxConfig=<number_DR_configs> (default and maximum is 20)
jobRetentionHours=<hours> (default and minimum is 24)

Example 1 – Simple parameter change

This example shows how you change the job retention time from 24 to 48 hours and reduce the maximum number of DR configurations from 20 to 12.

PCA-ADMIN> drUpdateService jobRetentionHours=48 maxConfig=12
Command: drUpdateService jobRetentionHours=48 maxConfig=12
Status: Success
Time: 2022-08-11 09:20:48,570 UTC
Data:
  Message = Successfully started job to update DR admin service
  Job Id = ec64cef4-ba68-493d-89c8-22df51553cd8

Use the drShowService command to check the current configuration. Run the command to display the configuration parameters before you modify them. Run it again afterward to confirm that your changes have been applied successfully.

PCA-ADMIN> drShowService
Command: drShowService
Status: Success
Time: 2022-08-11 09:23:54,951 UTC
Data:
  Local Ip = 10.0.7.31/23
  Remote Host = sn01-dr1.example.com
  Replication = ENABLED
  Replication High = DISABLED
  Message = Successfully retrieved site configuration
  maxConfig = 12
  gateway IP = 10.0.7.10
  Job Retention Hours = 48

Example 2 – Replication IP change

There might be network changes in the data center that require you to use different subnets and IP addresses for the replication interfaces configured in the disaster recovery service. This configuration change must be applied through several commands on the two peer systems, in a specific order. If the systems contain both standard and high-performance storage, as in the following example, change the replication interface settings for both storage types in the same order.

  1. Update the replication endpoint parameters on system 1.

    PCA-ADMIN> edit networkConfig zfsCapacityPoolReplicationEndpoint=10.100.3.88 \
    zfsPerfPoolReplicationEndpoint=10.100.3.89
  2. Update the local IP and gateway parameters on system 1. Leave the remote IPs unchanged.

    PCA-ADMIN> drUpdateService \
    localIp=10.100.3.83/28 gatewayIp=10.100.3.81 \
    localIpPerf=10.100.3.84/28 gatewayIpPerf=10.100.3.81
  3. Update the replication endpoint parameters on system 2.

    PCA-ADMIN> edit networkConfig zfsCapacityPoolReplicationEndpoint=10.100.3.88 \
    zfsPerfPoolReplicationEndpoint=10.100.3.89
  4. Update the local IP, gateway, and remote host parameters on system 2.

    PCA-ADMIN> drUpdateService \
    localIp=10.100.3.88/28 gatewayIp=10.100.3.81 remoteHost=sn01-dr1.rack1.example.com \
    localIpPerf=10.100.3.89/28 gatewayIpPerf=10.100.3.81 remoteHostPerf=sn02-dr1.rack1.example.com

Example 3 – Configuration Without Performance Pool

The following example applies the same four commands to a configuration that uses only the standard (capacity) pool and not the performance pool.

  1. Update the replication endpoint parameters on system 1.

    PCA-ADMIN> edit networkConfig zfsCapacityPoolReplicationEndpoint=10.16.9.43 
    Command: edit networkConfig zfsCapacityPoolReplicationEndpoint=10.16.9.43
    Status: Success
    Time: 2023-08-16 12:08:30,585 UTC
    JobId: 175b1600-eabe-4a0f-aa45-xxxxxx65599c1
  2. Update the local IP parameters on system 1. Leave the remote IPs unchanged. Check that the job has finished successfully.

    PCA-ADMIN> drUpdateService localIp=10.16.9.43/12
    Command: drUpdateService localIp=10.16.9.43/12
    Status: Success
    Time: 2023-08-16 12:09:45,137 UTC
    Data:
      Message = Successfully started job to update DR admin service
      Job Id = 2844b731-f53c-4d92-850d-xxxxx22b49e3
    
    PCA-ADMIN> drGetJob jobId=2844b731-f53c-4d92-850d-xxxxx22b49e3
    Command: drGetJob jobId=2844b731-f53c-4d92-850d-xxxxx22b49e3
    Status: Success
    Time: 2023-08-16 12:15:19,560 UTC
    Data:
      Type = update_service
      Job Id = 2844b731-f53c-4d92-850d-xxxxx22b49e3
      Status = finished
      Start Time = 2023-08-16 12:09:45.017743
      End Time = 2023-08-16 12:15:19.443415
      Result = success
      Message = job successfully retrieved
      Response = Successfully updated DR service
    
  3. Update the replication endpoint parameters on system 2.

    PCA-ADMIN> edit networkConfig zfsCapacityPoolReplicationEndpoint=10.16.11.43
    Command: edit networkConfig zfsCapacityPoolReplicationEndpoint=10.16.11.43
    Status: Success
    Time: 2023-08-16 12:22:36,218 UTC
    JobId: b7bff723-0237-4a11-9d08-xxxxxd166e1d
  4. Update the local IP parameters on system 2. Leave the remote IPs unchanged. Check that the job has finished successfully.

    PCA-ADMIN> drUpdateService localIp=10.16.11.43/12
    Command: drUpdateService localIp=10.16.11.43/12
    Status: Success
    Time: 2023-08-16 12:24:54,882 UTC
    Data:
      Message = Successfully started job to update DR admin service
      Job Id = 1d6826ac-04db-49f9-aa27-35996f69410a
    
    PCA-ADMIN> drGetJob jobId=1d6826ac-04db-49f9-aa27-xxxxx69410a
    Command: drGetJob jobId=1d6826ac-04db-49f9-aa27-xxxxxf69410a
    Status: Success
    Time: 2023-08-16 12:31:55,828 UTC
    Data:
      Type = update_service
      Job Id = 1d6826ac-04db-49f9-aa27-xxxxxf69410a
      Status = finished
      Start Time = 2023-08-16 12:24:54.655686
      End Time = 2023-08-16 12:30:16.461914
      Result = success
      Message = job successfully retrieved
      Response = Successfully updated DR service

Example 4 – Trusting a New ZFS Storage Appliance Certificate

The following example shows the command that must be run if the ZFS Storage Appliance certificate on the peer rack is updated. This command retrieves the new certificate from the remote host and adds it to the trust list.

PCA-ADMIN> drUpdateService \
remoteHost=sn01-dr1.rack1.example.com remoteHostPerf=sn02-dr1.rack1.example.com

Unconfiguring the ZFS Storage Appliance Peering Setup

If a reset has been performed on one or both of the systems in a disaster recovery solution, and you need to unconfigure the disaster recovery service to remove the entire peering setup between the ZFS Storage Appliances, use the drDeleteService command in the Service CLI.

Caution:

This command requires no other parameters. Be careful when entering it at the PCA-ADMIN> prompt, to avoid executing it unintentionally.

You cannot unconfigure the disaster recovery service while DR configurations still exist. Proceed as follows:

  1. Remove all DR configurations from the two systems that have been configured as replicas for each other.

  2. Log in to the Service CLI on one of the systems and enter the drDeleteService command.

  3. Log in to the Service CLI on the second system and enter the drDeleteService command there as well.

When the disaster recovery service isn't configured, the drShowService command returns an error.

PCA-ADMIN> drShowService
Command: drShowService
Status: Failure
Time: 2022-08-11 12:31:22,840 UTC
Error Msg: PCA_GENERAL_000001: An exception occurred during processing: Operation failed. 
[...]
Error processing dr-admin.service.show response: dr-admin.service.show failed. Service not set up.