This section describes the protection group configuration that is required in the Sun Cluster Geographic Edition 3.2 11/09 software to guarantee data consistency in asynchronous mode replication. Asynchronous mode replication is implemented by using the async fence level of Hitachi Universal Replicator. The following discussion therefore applies only to the async fence level and to Hitachi Universal Replicator as implemented in the Sun Cluster Geographic Edition module.
With Sun Cluster 3.2 11/09 software, the Sun Cluster Geographic Edition module supports Hitachi TrueCopy and Universal Replicator device groups in asynchronous mode replication. During routine operations, both Hitachi TrueCopy and Universal Replicator provide data consistency in asynchronous mode. However, in the event of a temporary loss of communications or of a “rolling disaster,” where different parts of the system fail at different times, only Hitachi Universal Replicator software can prevent loss of consistency of replicated data in asynchronous mode. In addition, Hitachi Universal Replicator software can ensure data consistency only with the configuration described in this section and in Configuring the /etc/horcm.conf File on the Nodes of the Primary Cluster and Configuring the /etc/horcm.conf File on the Nodes of the Secondary Cluster.
In Hitachi Universal Replicator software, the Hitachi storage arrays replicate data from primary storage to secondary storage. The application that produced the data is not involved. Even so, to guarantee data consistency, replication must preserve the application's I/O write ordering, regardless of how many disk devices the application writes to.
During routine operations, Hitachi Universal Replicator software on the secondary storage array pulls data from cache on the primary storage array. If data is produced faster than it can be transferred, Hitachi Universal Replicator commits the backlogged I/O, along with a sequence number for each write, to a journal volume on the primary storage array. The secondary storage array pulls that data from primary storage and commits it to its own journal volumes, from which it is transferred to application storage. If communications fail and are later restored, the secondary storage array begins to resynchronize the two sites by continuing to pull backlogged data and sequence numbers from the journal volume. Sequence numbers control the order in which data blocks are committed to disk, so write ordering is maintained at the secondary site despite the interruption. As long as journal volumes have enough disk space to record all data that is generated by the application that is running on the primary cluster during the period of failure, consistency is guaranteed.
In the event of a rolling disaster, where only some of the backlogged data and sequence numbers reach the secondary storage array after failures begin, sequence numbers determine which data should be committed to data LUNs to preserve consistency.
In the Sun Cluster Geographic Edition module with Hitachi Universal Replicator, journal volumes are associated with application storage in the /etc/horcm.conf file. That configuration is described in Journal Volumes and Configuring the /etc/horcm.conf File on the Nodes of the Primary Cluster. For information about how to configure journal volumes on a storage array, see the Hitachi documentation for that array.
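For example, if you initialize device-group pairs manually, the Hitachi CCI paircreate command accepts the -jp and -js options, which select the journal IDs to use on the primary and secondary storage arrays. The following sketch assumes that a journal with ID 0 has already been configured on each array; substitute the journal IDs that are defined on your storage.
phys-paris-1# paircreate -g devgroup1 -vl -f async -jp 0 -js 0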
Along with journal volumes, consistency group IDs (CTGIDs) ensure data consistency even if the storage for an application data service includes devices in multiple Hitachi device groups. A CTGID is an integer that is assigned to one or more Hitachi device groups. It designates those devices that must be maintained in a state of replication consistent with each other. Consistency is maintained among all devices with the same CTGID whether the devices are members of a single Hitachi device group or several Hitachi device groups. For example, if Hitachi Universal Replicator stops replication on the devices of one device group that is assigned the CTGID of 5, it stops replication on all other devices in device groups with the CTGID of 5.
To ensure data consistency, an exact correspondence must therefore exist between the device groups that are used by a single application data service and a CTGID. All device groups that are used by a single data service must have the same unique CTGID. No device group can have that CTGID unless it is used by the data service.
To ensure this correspondence, the Sun Cluster Geographic Edition 3.2 11/09 software allows the administrator to set a CTGID property on each protection group. The device groups that are added to the protection group must all have the same CTGID as the protection group. If other device groups are assigned the same CTGID as the device groups in the protection group, the Sun Cluster Geographic Edition software generates an error. For example, if the protection group app1-pg has been assigned the CTGID of 5, all device groups included in app1-pg must have the CTGID of 5. Moreover, all device groups that are assigned the CTGID of 5 must be included in app1-pg.
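You can verify the CTGID that is currently assigned to a device group with the Hitachi CCI pairvolchk command. The following run is representative only, for a device group that has the CTGID of 5; the exact output format depends on your CCI version.
phys-paris-1# pairvolchk -g devgroup1
pairvolchk : Volstat is P-VOL.[status = PAIR fence = ASYNC CTGID = 5]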
You are not required to set a CTGID on a protection group. The Hitachi storage software automatically assigns a unique CTGID to an asynchronously replicated device group when it is initialized. Thereafter, the pairs in that device group are maintained in a state of consistency with each other. Thus, if an application data service in a protection group uses storage in just one asynchronously replicated Hitachi device group, you can let the Hitachi storage array assign the device group's CTGID. You do not also have to set the CTGID of the protection group.
Similarly, if you do not need data consistency, or if your application does not write asynchronously to your Hitachi device groups, then setting the CTGID on the protection group has little use. However, if you do not assign a CTGID to a protection group, any later configuration changes to the device group or to the protection group might lead to conflicts. Assignment of a CTGID to a protection group provides the most flexibility for later changes and the most assurance of device group consistency.
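For example, initializing a device group in asynchronous mode without appending a CTGID to the fence level, as in the following sketch, leaves the choice of CTGID to the storage array, which assigns an unused value:
phys-paris-1# paircreate -g devgroup1 -vl -f async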
You can assign a consistency group ID (CTGID) to a protection group by setting the property ctgid=consistency-group-ID as an option to the geopg create command. You can assign CTGID values to device groups in one of two ways:
You can add uninitialized device groups to the protection group. They are initialized and acquire the CTGID of the protection group when the protection group is started with the geopg start command.
You can initialize a device group with the CTGID that you plan to use for the protection group that will hold that device group. After you create the protection group with that CTGID, you must assign the device group to it.
The following procedure demonstrates these two methods of setting the CTGID for the devices that are used by an application data service. The procedure configures a protection group named app1-pg with a CTGID of 5. This protection group contains the app1-rg resource group and the Hitachi Universal Replicator devgroup1 device group, which uses the async fence level.
Configure a Hitachi Universal Replicator device group with journal volumes in the /etc/horcm.conf file as described in Configuring the /etc/horcm.conf File on the Nodes of the Primary Cluster and Configuring the /etc/horcm.conf File on the Nodes of the Secondary Cluster.
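As an illustration only, the relevant entries in the /etc/horcm.conf file might resemble the following sketch. The serial number, LDEV numbers, host name, and service name shown are placeholders; see the referenced procedures for the complete and authoritative format.
HORCM_LDEV
#dev_group   dev_name   Serial#   CU:LDEV(LDEV#)   MU#
devgroup1    pair1      10136     00:12
devgroup1    pair2      10136     00:13

HORCM_INST
#dev_group   ip_address        service
devgroup1    phys-newyork-1    horcm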
Configure the devices in each device group as raw-disk devices or mirror them by using Veritas Volume Manager as described in How to Set Up Raw-Disk Device Groups for Sun Cluster Geographic Edition Systems or How to Configure Veritas Volume Manager Volumes for Use With Hitachi TrueCopy Replication.
Configure a Sun Cluster resource group that includes a resource of type HAStoragePlus in addition to any other resources that are required for its application data service. This HAStoragePlus resource must use the disk devices of a previously configured Hitachi Universal Replicator device group as described in How to Configure the Sun Cluster Device Group That Is Controlled by Hitachi TrueCopy or Universal Replicator Software and How to Configure a Highly Available File System for Hitachi TrueCopy or Universal Replicator Replication.
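As an illustration, the following sketch registers the HAStoragePlus resource type, creates the app1-rg resource group, and adds an HAStoragePlus resource for a file system on the replicated devices. The resource name app1-hasp-rs and the mount point /mounts/app1 are examples only; see the referenced procedures for the supported configurations.
phys-paris-1# clresourcetype register SUNW.HAStoragePlus
phys-paris-1# clresourcegroup create app1-rg
phys-paris-1# clresource create -g app1-rg -t SUNW.HAStoragePlus \
-p FilesystemMountPoints=/mounts/app1 app1-hasp-rs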
On the primary cluster, create the Sun Cluster Geographic Edition protection group with a specified CTGID, and add the resource group.
phys-paris-1# geopg create -s paris-newyork-ps -o primary -d truecopy -p ctgid=5 \
-p nodelist=phys-paris-1,phys-paris-2 app1-pg
phys-paris-1# geopg add-resource-group app1-rg app1-pg
Add device groups to the protection group by using one of the following methods:
Add device groups that have been configured in the /etc/horcm.conf file but have not yet been initialized with the paircreate command.
phys-paris-1# geopg add-device-group -p fence_level=async devgroup1 app1-pg
Assign CTGIDs to device groups when they are initialized by using the Hitachi paircreate command, and add the device groups to the protection group that has the same value for the CTGID property.
In the following example, a device group is initialized with the CTGID of 5 and then added to the app1-pg protection group:
phys-paris-1# paircreate -g devgroup1 -vl -f async 5
phys-paris-1# geopg add-device-group -p fence_level=async devgroup1 app1-pg
Start the protection group.
phys-paris-1# geopg start -e local app1-pg
Uninitialized device groups, if any, are initialized and assigned the CTGID of 5.
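To confirm the result, you can display the protection group configuration with the geopg list command and check the replication status of the pairs with the Hitachi CCI pairdisplay command:
phys-paris-1# geopg list app1-pg
phys-paris-1# pairdisplay -g devgroup1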