This chapter describes data replication technologies you can use with Sun Cluster software. Sun Cluster software supports data replication between clusters (for disaster recovery) or within a cluster (as a replacement for host-based mirroring). Data replication is the copying of data from a primary storage device to a backup, or secondary, device. If the primary device fails, your data is available from the secondary device. Data replication helps ensure high availability and disaster tolerance for your cluster.
You must understand both host-based and storage-based data replication before you can select the replication approach that best serves your cluster. You can use Sun Cluster Geographic Edition to manage your data replication; see the Sun Cluster Geographic Edition Overview for more information.
Sun Cluster supports the following approaches to data replication:
Host-based data replication uses software to replicate disk volumes between geographically dispersed clusters in real time. Remote mirror replication enables data from the master volume of the primary cluster to be replicated to the master volume of the geographically dispersed secondary cluster. A remote mirror bitmap tracks differences between the master volume on the primary disk and the master volume on the secondary disk. Examples of host-based replication software used for replication between clusters (and between a cluster and a host that is not in a cluster) include Sun StorageTek Availability Suite 4 and Sun StorEdge Availability Suite 3.2.1.
Host-based data replication is a less expensive data replication solution because it uses host resources, rather than special storage arrays. Databases, applications, or file systems that are configured to allow multiple hosts running the Solaris OS to write data to a shared volume are not supported (for example, Oracle 9iRAC and Oracle Parallel Server). For more information about using host-based data replication between two clusters, see Sun Cluster Geographic Edition Data Replication Guide for Sun StorageTek Availability Suite. To see an example of host-based replication that does not use Sun Cluster Geographic Edition, see Appendix A, Configuring Host-Based Data Replication With Sun StorEdge Availability Suite or Sun StorageTek Availability Suite Software.
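As a sketch of how host-based replication is enabled with Availability Suite, the following shows the sndradm command that associates a master volume and remote mirror bitmap on the primary cluster with their counterparts on the secondary cluster. All host names and device paths here are hypothetical placeholders, and the echo makes this a dry run; consult the Availability Suite documentation for the exact syntax on your release.

```shell
# Hypothetical host names and device paths -- substitute your own.
# The echo makes this a dry run; remove it to execute on a system
# with Availability Suite software installed.
PRIMARY_HOST=cluster-east
SECONDARY_HOST=cluster-west
MASTER_VOL=/dev/rdsk/c1t1d0s4      # master volume on each cluster
BITMAP_VOL=/dev/rdsk/c1t1d0s6      # remote mirror bitmap volume

# Enable a synchronous remote mirror set over IP.
echo sndradm -n -e "$PRIMARY_HOST" "$MASTER_VOL" "$BITMAP_VOL" \
    "$SECONDARY_HOST" "$MASTER_VOL" "$BITMAP_VOL" ip sync
```

On a real configuration, the same command is run on both clusters with the roles reversed, as described in the Availability Suite documentation.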
Storage-based data replication uses software on the storage controller to move the work of data replication off the cluster nodes and onto the storage device. This software frees some node processing power to serve cluster requests. Examples of storage-based software that can replicate data inside a cluster or between clusters include Hitachi TrueCopy and EMC SRDF. Storage-based data replication can be especially important in campus cluster configurations and can simplify the infrastructure required. For more information about using storage-based data replication in a campus cluster environment, see Using Storage-Based Data Replication Within a Cluster.
For more information about using storage-based replication between two or more clusters and the Sun Cluster Geographic Edition product that automates the process, see Sun Cluster Geographic Edition Data Replication Guide for Hitachi TrueCopy and Sun Cluster Geographic Edition Data Replication Guide for EMC Symmetrix Remote Data Facility. See also Appendix A, Configuring Host-Based Data Replication With Sun StorEdge Availability Suite or Sun StorageTek Availability Suite Software, for a complete example of this type of cluster configuration.
Sun Cluster software supports the following methods of data replication between clusters or within a cluster:
Replication Between Clusters – For disaster recovery, you can use host-based or storage-based replication to perform data replication between clusters. Generally, you would choose either host-based replication or storage-based replication, rather than a combination of the two. You can manage both types of replication with Sun Cluster Geographic Edition software.
Host-Based Replication
Sun StorageTek Availability Suite 4, starting with the Solaris 10 OS
Sun StorEdge Availability Suite 3.2.1 on the Solaris 9 OS
In this manual, references to Sun StorageTek Availability Suite software also apply to Sun StorEdge Availability Suite software unless specifically stated otherwise.
If you want to use host-based replication without Sun Cluster Geographic Edition software, see the instructions in Appendix A, Configuring Host-Based Data Replication With Sun StorEdge Availability Suite or Sun StorageTek Availability Suite Software.
Storage-Based Replication
Hitachi TrueCopy, through Sun Cluster Geographic Edition
EMC Symmetrix Remote Data Facility (SRDF), through Sun Cluster Geographic Edition
If you want to use storage-based replication without Sun Cluster Geographic Edition software, see the documentation for your replication software.
Replication Within a Cluster – This method is used as a replacement for host-based mirroring.
Application-Based Replication – Oracle Data Guard is an example of application-based replication software. This type of software is used only for disaster recovery. For more information, see the Sun Cluster Geographic Edition Data Replication Guide for Oracle Data Guard.
Storage-based data replication uses software installed on the storage device to manage the replication within a cluster or a campus cluster. Such software is specific to your particular storage device, and is not used for disaster recovery. Refer to the documentation that shipped with your storage device when configuring storage-based data replication.
Depending on the software you use, storage-based data replication supports either automatic or manual failover. Sun Cluster supports both manual and automatic failover of the replicas with Hitachi TrueCopy and EMC SRDF software.
This section describes storage-based data replication as used in a campus cluster. Figure 4–1 shows a sample two-room configuration where data is replicated between two storage arrays. In this configuration, the primary storage array is contained in the first room, where it provides data to the nodes in both rooms. The primary storage array also provides the secondary storage array with data to replicate.
Figure 4–1 illustrates that the quorum device is on an unreplicated volume. A replicated volume cannot be used as a quorum device.
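To illustrate, a quorum device on a shared, unreplicated disk might be added as follows, assuming the Sun Cluster 3.2 clquorum command; the DID device name d4 is a hypothetical placeholder, and the echo makes this a dry run.

```shell
# d4 is a hypothetical DID device on a shared, UNREPLICATED volume.
# The echo makes this a dry run; on a cluster node, run clquorum directly.
QUORUM_DID=d4
echo clquorum add "$QUORUM_DID"
```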
Storage-based data replication with Hitachi TrueCopy can be performed synchronously or asynchronously in the Sun Cluster environment, depending on the type of application you use. If you want to perform automatic failover in a campus cluster, use TrueCopy synchronously. Storage-based synchronous replication with EMC SRDF is supported with Sun Cluster; asynchronous replication is not supported for EMC SRDF.
To ensure data integrity, use multipathing and the proper RAID package. The following list includes considerations for implementing a cluster configuration that uses storage-based data replication.
Node-to-node distance is limited by the Sun Cluster Fibre Channel and interconnect infrastructure. Contact your Sun service provider for more information about current limitations and supported technologies.
Do not configure a replicated volume as a quorum device. Locate any quorum devices on a shared, unreplicated volume or use the quorum server.
Ensure that only the primary copy of the data is visible to cluster nodes. Otherwise, the volume manager might try to simultaneously access both primary and secondary copies of the data. Refer to the documentation that was shipped with your storage array for information about controlling the visibility of your data copies.
EMC SRDF and Hitachi TrueCopy allow the user to define groups of replicated devices. The replication device group and Sun Cluster global device group must be given the same name so that they may be moved between nodes as a single unit.
Particular application-specific data might not be suitable for asynchronous data replication. Use your understanding of your application's behavior to determine how best to replicate application-specific data across the storage devices.
If configuring the cluster for automatic failover, use synchronous replication.
For instructions on configuring the cluster for automatic failover of replicated volumes, see Administering Storage-Based Replicated Devices.
Oracle Real Application Clusters (RAC) is not supported with SRDF and Hitachi TrueCopy when replicating within a cluster. Nodes connected to replicas that are not currently the primary replica will not have write access. Any scalable application that requires direct write access from all nodes of the cluster cannot be supported with replicated devices.
Veritas Cluster Volume Manager (CVM) and the multi-owner (Oban) disk sets of Solaris Volume Manager for Sun Cluster software are not supported.
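As an example of the synchronous-replication requirement, a TrueCopy pair intended for automatic failover could be created with a synchronous fence level. The device group name is hypothetical, the echo makes this a dry run, and the exact paircreate options should be checked against the RAID Manager (CCI) documentation for your array.

```shell
# devgroup1 is a hypothetical TrueCopy device group defined in the horcm
# configuration file. Fence levels data, status, and never are synchronous;
# fence level async is asynchronous and is not suitable for automatic failover.
DEV_GROUP=devgroup1
FENCE=never              # a synchronous fence level
echo paircreate -g "$DEV_GROUP" -vl -f "$FENCE"
```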
As with all campus clusters, those clusters that use storage-based data replication generally do not need intervention when they experience a single failure. However, if you are using manual failover and you lose the room that holds your primary storage device (as shown in Figure 4–1), problems arise in a two-node cluster. The remaining node cannot reserve the quorum device and cannot boot as a cluster member. In this situation, your cluster requires the following manual intervention:
Your Sun service provider must reconfigure the remaining node to boot as a cluster member.
You or your Sun service provider must configure an unreplicated volume of your secondary storage device as a quorum device.
You or your Sun service provider must configure the remaining node to use the secondary storage device as primary storage. This reconfiguration might involve rebuilding volume manager volumes, restoring data, or changing application associations with storage volumes.
When setting up device groups that use the Hitachi TrueCopy software for storage-based data replication, observe the following practices:
Use synchronous replication to avoid the possibility of lost data if the primary site fails.
A one-to-one relationship should exist between the Sun Cluster global device group and the TrueCopy replication group defined in the horcm configuration file. This allows both groups to move from node to node as a single unit.
Global file system volumes and failover file system volumes cannot be mixed in the same replicated device group because they are controlled differently: global file systems are controlled by the Device Configuration System (DCS), while failover file systems are controlled by HAStoragePlus. The primary for each could be a different node, which would cause conflicts over which node should be the replication primary.
All RAID manager instances should be up and running at all times.
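As a sketch of the one-to-one naming practice above, the HORCM_DEV section of a horcm configuration file might define a replication group whose name matches the Sun Cluster global device group. The group name oradg and the port, target ID, and LU values below are hypothetical placeholders; see the RAID Manager documentation for the exact file format on your array.

```
HORCM_DEV
#dev_group   dev_name   port#   TargetID   LU#
oradg        pair1      CL1-A   0          1
oradg        pair2      CL1-A   0          2
```

The corresponding Sun Cluster global device group would also be named oradg, so that the replication group and the global device group can move between nodes as a single unit.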
When using EMC SRDF software for storage-based data replication, use dynamic devices instead of static devices. Static devices require several minutes to change the replication primary, which can affect failover time.
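To illustrate why dynamic devices matter, a dynamic SRDF pair can exchange its R1 and R2 personalities with a swap operation, which is how the replication primary is changed quickly. The device group name below is hypothetical, the echo makes this a dry run, and the exact symrdf options should be verified against the EMC Solutions Enabler documentation.

```shell
# clusterdg is a hypothetical SRDF device group made of dynamic devices.
# A swap exchanges the R1 (source) and R2 (target) personalities.
# The echo makes this a dry run; run symrdf directly on a host with
# EMC Solutions Enabler installed.
RDF_GROUP=clusterdg
echo symrdf -g "$RDF_GROUP" swap
```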