CHAPTER 1
Introduction to Remote Mirror Software
The Sun StorageTek Availability Suite Remote Mirror software is a volume-level replication facility for the Solaris OS. You can use this software to replicate disk volumes between physically separate primary and secondary hosts in real time.
As part of a disaster recovery and business continuance plan, the Remote Mirror software enables you to keep up-to-date copies of critical data at remote sites. You can also rehearse your data recovery strategy by failing over data to remote sites. Later, you can write any data changes that occurred at the remote site back to the original disk. To transport data, the Remote Mirror software uses any Sun network adapter that supports TCP/IP.
The Remote Mirror software is active while your applications are accessing the data volumes and continually replicates the data to remote sites. The software operates at the volume level on storage devices attached to one or more hosts.
You can also update the data on the secondary site volume by issuing a command to synchronize the primary and secondary site volumes. You can restore data from the secondary volume to the primary volume by issuing a command to reverse synchronize the volumes. Reverse synchronization is also known as performing a reverse update. Updates from the primary site to the secondary site are also known as forward resynchronizations.
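Both forward and reverse updates rely on the per-volume bitmap, which marks the blocks that changed while the sites were out of sync. The following Python sketch is purely illustrative (the function and variable names are invented, and this is not the actual implementation): an update copies only the blocks whose bitmap entry is set, in either direction.

```python
# Illustrative model of bitmap-driven update synchronization.
# An update copies only the blocks flagged as changed in the bitmap,
# then clears the bitmap once the volumes match again.

def update_sync(source, target, bitmap):
    """Copy only the blocks marked dirty in the bitmap, then clear it."""
    for block, dirty in enumerate(bitmap):
        if dirty:
            target[block] = source[block]
            bitmap[block] = 0
    return target

# Forward update: primary -> secondary.
primary   = ["a1", "b2", "c3", "d4"]
secondary = ["a1", "b0", "c3", "d0"]
bitmap    = [0, 1, 0, 1]          # blocks 1 and 3 changed on the primary

update_sync(primary, secondary, bitmap)
# secondary is now an exact copy of primary; the bitmap is clear again
```

A reverse update is the same operation with the roles of source and target swapped, which is why only the changed blocks, not the full volume, travel over the link.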
TABLE 1-1 describes the Remote Mirror software features.
An I/O group is a collection of Remote Mirror software sets that have the same group name, primary and secondary interfaces, and mirroring mode. Mixed groups (those where mirroring modes are asynchronous for one set and synchronous for another set) are not allowed.
Replicate data from one primary volume to a secondary volume; the secondary volume then replicates the data again to another secondary volume, and so on, in daisy-chain fashion. See One-to-Many, Many-to-One, and Multihop Volume Sets.
Replicate data from one primary volume to many secondary volumes residing on one or more sites. When you perform a forward resynchronization, you can synchronize one volume set or all volume sets. Issue a separate command for each set. You can also update the primary volume using a specific secondary volume. See One-to-Many, Many-to-One, and Multihop Volume Sets.
You can include volumes containing critical data by specifying these volumes as part of a Remote Mirror software volume set. You can also exclude volumes with noncritical data from Remote Mirror software options.
Plan for disaster recovery and business continuance using physically separate primary and secondary sites. The Remote Mirror software design is link-neutral, meaning that it can use any Sun network adapter that supports TCP/IP.
The following enhanced features are available with the Sun StorageTek Availability Suite Remote Mirror software.
Data can be queued on disk as well as in memory. Memory-based queues are the default.
Disk-based queues enable much larger amounts of data to be queued than memory-based queues allow.
If a disk-based queue fills up, the Remote Mirror software switches to non-blocking mode, also known as scoreboarding or logging mode. For further information, see Managing Disk Queues.
Blocking mode is the default when operating the Remote Mirror software in asynchronous mode. Blocking mode ensures write ordering of the packets to the secondary site.
If the asynchronous queue fills up while the software is running in blocking mode, the application's response time can be adversely affected. Because write operations must be acknowledged before they are removed from the queue, a full queue prevents, or blocks, further write operations until space is available.
Non-blocking mode is optional in asynchronous operation of the Remote Mirror software. In non-blocking mode, if the asynchronous queue fills up, the queue is discarded and the Remote Mirror software goes into logging mode.
In logging mode, the bitmap is used to scoreboard writes. The application's writes are not blocked, but write ordering is lost during scoreboarding. However, the application sees no significant degradation in response time.
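The difference between blocking and non-blocking behavior when the asynchronous queue fills can be modeled as follows. This Python sketch is purely illustrative; the class name and the queue limit are invented for the example, and the real software queues data rather than block numbers.

```python
from collections import deque

QUEUE_LIMIT = 4   # invented limit for illustration only

class AsyncQueue:
    def __init__(self, blocking=True):
        self.queue = deque()
        self.blocking = blocking
        self.logging_mode = False   # scoreboarding via the bitmap

    def write(self, block, bitmap):
        bitmap.add(block)           # every write is scoreboarded in the bitmap
        if self.logging_mode:
            return "logged"         # only the bitmap tracks the change now
        if len(self.queue) >= QUEUE_LIMIT:
            if self.blocking:
                return "blocked"    # application write stalls until space frees
            # Non-blocking: discard the queue and drop into logging mode.
            self.queue.clear()
            self.logging_mode = True
            return "logged"
        self.queue.append(block)
        return "queued"
```

With blocking=False, the first write that finds the queue full discards the queue and puts the set into logging mode, so the application never stalls; an update synchronization later drains the bitmap, as described above.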
Perform an update synchronization to resynchronize the data on the primary and secondary sites after the queue fills and the software enters logging mode.
The Remote Mirror software now has the ability to use multiple flusher threads to increase the drain rate from the asynchronous queues. This allows multiple I/Os per consistency group or set on the network at one time. The default number of queue-flushing threads is two.
If you want an operation that is similar to that of the Sun StorageTek Availability Suite 3.1 Remote Mirror software, set the flusher threads to one. You can increase the number of threads above two for enhanced performance in a low-latency network environment.
When using multiple threads, writes can often arrive at the secondary site out of sequence. To prevent any problems at the secondary site, sequence numbers are added to all data writes at the primary site. The secondary site manages the incoming data based on the sequence numbers. Write ordering is essentially restored at the secondary site. Writes that arrive out of order are stored in memory until previous writes arrive.
The use of multiple asynchronous flusher threads on the primary site requires more memory at the secondary site. Each group or ungrouped set that the secondary site is tracking can have a maximum of 64 requests pending and in memory at the secondary site. Memory requirements depend on the number of groups or sets tracked, the maximum request count of 64, and the size of the writes.
When the number of requests for a group or set reaches 64, the secondary site prevents the primary site from issuing any more requests for that group or set.
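The sequence-number mechanism described above can be modeled as a small reordering buffer at the secondary site: writes are applied only in sequence order, out-of-order arrivals are held in memory, and the buffer is capped at 64 pending requests per group. The sketch below is a simplified illustration, not the actual implementation.

```python
MAX_PENDING = 64   # per-group limit enforced at the secondary site

class SecondaryGroup:
    def __init__(self):
        self.next_seq = 0      # next sequence number expected
        self.pending = {}      # out-of-order writes held in memory
        self.applied = []      # writes committed to the secondary volume

    def receive(self, seq, data):
        if len(self.pending) >= MAX_PENDING:
            return "stall"     # primary is prevented from issuing more requests
        self.pending[seq] = data
        # Drain every write that is now in order.
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return "ok"

g = SecondaryGroup()
for seq, data in [(2, "w2"), (0, "w0"), (1, "w1")]:   # arrive out of order
    g.receive(seq, data)
# g.applied == ["w0", "w1", "w2"]  -- write ordering restored
```

Note how write "w2" sits in memory until "w0" and "w1" arrive, which is exactly why the secondary site's memory requirement grows with the number of groups and the size of the writes.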
If memory is not available when a packet arrives, the packet is rejected and all groups and sets go into logging mode at the secondary site.
See Memory Requirements for further information.
The Remote Mirror software protocol has been improved to take advantage of the software's higher asynchronous flushing rate and to make better use of network bandwidth.
The software protocol has been enhanced to work efficiently with the new disk-based asynchronous queues and the associated multiple flusher threads.
If possible, the Remote Mirror software combines, or coalesces, multiple sequential writes to the primary volume into a single network write. The size of the writes from the application and the network packet size affect the Remote Mirror software's ability to coalesce writes. Write coalescing reduces the number of network operations and makes more efficient use of the available network bandwidth.
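Coalescing of sequential writes can be sketched as follows: adjacent writes whose block ranges abut are merged into a single network write, up to a packet-size limit. The function name and the limit below are invented for illustration and do not reflect the actual implementation.

```python
MAX_COALESCED_BLOCKS = 8   # stand-in for the network packet-size limit

def coalesce(writes):
    """Merge adjacent (offset, length) writes into fewer network writes."""
    merged = []
    for offset, length in sorted(writes):
        if (merged
                and merged[-1][0] + merged[-1][1] == offset          # contiguous
                and merged[-1][1] + length <= MAX_COALESCED_BLOCKS): # fits a packet
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((offset, length))
    return merged

# Five sequential one-block application writes become one network write:
coalesce([(0, 1), (1, 1), (2, 1), (3, 1), (4, 1)])   # -> [(0, 5)]
```

Non-contiguous writes, or runs larger than the packet limit, cannot be merged, which is why the application's write pattern affects how much coalescing helps.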
The core Remote Mirror software code is a kernel module that interfaces to the network storage control module (nsctl) framework. The software is configurable on any device that is accessible through the nsctl framework. The sndradm Command Line Interface (CLI) is the external user interface used to manage the Remote Mirror software.
FIGURE 1-1 shows the relationship between the storage volume (sv) driver, the Remote Mirror software, and the rest of the nsctl framework. I/O commands and data enter and exit the Remote Mirror software through the Sun StorageTek storage volume (sv) driver software. Mediated by nsctl, the data flows through the Remote Mirror software (and, optionally, the Point-in-Time Copy software) and the storage device block cache (sdbc) drivers to its destination on the storage array or in user space.
The sv driver intercepts the I/O commands to the Remote Mirror volumes and routes them through the Sun StorageTek I/O stack to the storage device driver or the volume manager. The sv driver is a very thin layer in the I/O stack and operates by interposing commands onto the entry points for the underlying device driver.
The I/O commands originating in user space are intercepted at the top of the Sun StorageTek I/O stack. The sv driver routes them through the stack and feeds them back to the storage device driver or volume manager at the bottom of the stack. Data also flows in the opposite direction, from the storage back to user space. Because the Remote Mirror software sits at the top of the stack, above the Point-in-Time Copy software, a Remote Mirror volume that is the target of a Point-in-Time Copy update or copy must first be placed in logging mode (sndradm -l). Place the Remote Mirror volume sets in logging mode before performing point-in-time enable, copy, update, or reset operations on any volume that is also part of a Remote Mirror volume set.
If the volume set is not in logging mode, the point-in-time copy operation fails and the Remote Mirror software reports that the operation is denied. See also Using the Remote Mirror Software With the Point-in-Time Copy Software.
The Remote Mirror software is not a file or file system replicator. It is a volume replicator. When you replicate a volume on the primary site (Site-A) to a volume on the secondary site (Site-B), Site-B receives an exact block-for-block copy. Make sure that any Site-A file systems you want to replicate are mounted, and that on Site-B the replicated file systems are unmounted.
If a file system caches data and returns success to the calling application before committing the data to disk, that data can be lost if a system failure occurs just afterward. To prevent this from happening, mount your file system with the forcedirectio option. Using this option has a significant impact on overall performance, so test your system to be sure that its use is necessary.
When replicating, the primary host file system is mounted. Do not mount the file system on Site-B until you are ready to fail over or write to that site's volume. Changes appear on a replicated file system volume only after mounting the volume.
A file system on Site-B can be mounted only in read-only mode while the volume set continues to replicate. Once Site-B volumes are placed into logging mode, the file system can be mounted for read and write operations.
The Remote Mirror software replicates data to volume sets that you define. A volume set consists of a primary volume residing on a local (primary) host and another volume residing on a remote (secondary) host. The volume set also includes a bitmap volume on each host that tracks write operations and differences between the volumes. See Customizing Volume Sets.
The secondary volumes can be updated synchronously in real time or asynchronously using a store-and-forward technique. Typically, a primary volume is first explicitly copied to a designated secondary volume to establish matching contents. As applications write to the primary volume, the Remote Mirror software replicates changes to the secondary volume, keeping the two images consistent.
In synchronous mode, a write operation is not confirmed as complete until the remote volume has been updated. In asynchronous mode, a write operation is confirmed as complete before the remote volume has been updated.
The size of the secondary volume must be equal to or greater than the corresponding primary volume. If you initiate a resynchronization on a volume set where the secondary volume is smaller than the primary, the software fails with an error. Refer to Setting Up Bitmap Volumes for more information about volume sizes.
Volumes are defined here as logical volumes that can be linear, striped, or RAID volumes. You can create logical volumes by using the Solaris Volume Manager or VERITAS Volume Manager software.
You can use Redundant Array of Independent Disks (RAID) volumes as part of your Remote Mirror software strategy. Volumes can be any RAID level. The RAID levels of volumes in a volume set do not have to match.
When selecting a volume to be used in a volume set (including the configuration location), ensure that the volume does not contain disk label private regions (for example, slice 2 on a Solaris OS-formatted volume). The disk label region is contained in the first sectors of a disk. To be safe, ensure that cylinder 0 is not part of any logical volume that is replicated.
Caution - When creating volume sets, do not create secondary or bitmap volumes using partitions that include cylinder 0; data loss might occur. See VTOC Information.
By default, the Remote Mirror and Point-in-Time copy software support a configuration of 4096 volumes and 64 Mbyte for caching. You can increase both amounts if system resources allow. The number of volumes allowed is divided between both software products. For example, if you use the Remote Mirror software only, you can have 2048 volume sets, each consisting of a primary/secondary volume and associated bitmap volume.
For more information, see Increasing the Default Number of Volumes Allowed.
The software enables you to group volume sets in I/O or Consistency groups. You can assign specific volume sets to an I/O group to perform replication on those volume sets and not on others you have configured. Grouping volume sets also guarantees write ordering: Write operations to the secondary volume occur in the same order as the write operations to the primary volume.
An I/O group is a collection of software sets that have the same group name, primary and secondary interfaces, and mirroring mode. Mixed groups (those where mirroring modes are asynchronous for one set and synchronous for another set) are not allowed.
Using an I/O group, you can issue a Remote Mirror software command that is executed on every member of the group. Volume sets can be controlled as a single unit.
I/O group operations are atomic. The change from replicating mode to logging mode is guaranteed to occur on every set in an I/O group and to fail on every set if it fails on a single set in the group.
The software maintains write ordering for volumes in a group to ensure that the data on each secondary volume is a consistent copy of the corresponding primary volume. See Order-Dependent Writes and Volume Set Grouping.
The auto resynchronization feature supports I/O grouping. The feature can be enabled or disabled on a per-group basis, and it controls the resynchronization operation atomically on the group.
I/O grouping has an adverse effect on the Remote Mirror software's asynchronous operation, because I/O flushing is reduced to a single thread. In this case, consider the size of the data to be transferred, because all I/O is routed through a single queue.
You can also group volume sets according to their cluster or resource tag to perform replication in a clustered environment. The Remote Mirror software is cluster aware beginning in the Sun Cluster 3.0 Update 3 and Sun Cluster 3.1 environments, providing high availability (HA) for the Sun StorageTek software.
See the Sun Cluster and Sun StorageTek Availability Suite 4.0 Software Integration Guide for more information about configuring the Sun StorageTek Availability Suite software in a Sun Cluster environment.
This section discusses the Remote Mirror software and the possible memory requirement it places on the secondary host when using multiple asynchronous flusher threads.
The Remote Mirror software enables the number of asynchronous service threads to be set on a per-group basis, which allows multiple in-flight RPC requests and speeds servicing of the asynchronous queue. Enabling more than one RPC request creates the possibility that these requests can arrive out of order with respect to the order in which the writes were issued on the primary host. In other words, a request might arrive before a previous request has completed its I/O.
The write order must be maintained within a group. Therefore, out of order requests must be stored in memory on the secondary host until the missing request comes in and completes execution.
A maximum of 64 outstanding requests per group are stored on the secondary host, after which the secondary host stalls the primary host from issuing any more requests. This hard limit applies only to the number of possible outstanding requests, not to the size of their payload. For example, if the I/O consists of 4 Kbyte writes with 6 groups, the total memory requirement could be 4 Kbyte x 6 x 64 = 1536 Kbyte. However, with an I/O size of 1 Mbyte, this could rise to 1 Mbyte x 6 x 64 = 384 Mbyte.
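The worst-case figures above follow directly from the per-group pending-request limit. A quick calculation reproducing the numbers from the example (the function name is invented for illustration):

```python
MAX_PENDING = 64          # per-group pending-request limit at the secondary

def worst_case_memory(write_size_bytes, groups):
    """Upper bound on secondary-site memory held by out-of-order requests."""
    return write_size_bytes * groups * MAX_PENDING

KBYTE = 1024
MBYTE = 1024 * KBYTE

worst_case_memory(4 * KBYTE, 6) // KBYTE    # -> 1536 (Kbyte)
worst_case_memory(1 * MBYTE, 6) // MBYTE    # -> 384  (Mbyte)
```

Because write size dominates the bound, large-I/O workloads with many groups drive the secondary host's memory requirement up quickly.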
The Remote Mirror software requires a TCP/IP connection between the primary and secondary hosts. A dedicated TCP/IP link is not required.
Although the Remote Mirror software is most likely to be used with SunATM link-level interfaces, the Remote Mirror software design is link-neutral, meaning that it can use any Sun network adapter that supports the TCP/IP protocol.
Each host must have the proper Asynchronous Transfer Mode (ATM) or Ethernet hardware installed to support the TCP/IP link. The Remote Mirror software operates over any TCP/IP networking technology but has been qualified only on 10, 100, and 1000 Mbit Ethernet and on SunATM155 and SunATM622 technologies.
When using ATM, ensure that the configuration supports TCP/IP by using either Classical IP or LAN Emulation mode.
The Remote Mirror software on both the primary and secondary nodes listens on a well-known port advertised in /etc/services. The default is port 121. Remote mirror write traffic flows from primary to secondary host over a socket with an arbitrarily assigned address on the primary host and the well-known address on the secondary host. The health-monitoring heartbeat travels over a different connection, with an arbitrarily assigned address on the secondary host and the well-known address on the primary host. The Remote Mirror protocol uses Sun RPCs over these connections.
Port 121 is the default TCP/IP port for use by the Remote Mirror sndrd daemon. To change the port number, edit the /etc/services file using a text editor.
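The port assignment can be checked programmatically by reading the service entry. The sketch below parses an /etc/services-style line; the service name sndr in the sample is an assumption for illustration only — check your own /etc/services file for the actual Remote Mirror entry.

```python
def service_port(services_text, name, proto="tcp"):
    """Return the port for a named service from /etc/services-style text."""
    for line in services_text.splitlines():
        line = line.split("#", 1)[0].strip()     # strip comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        port, slash_proto = fields[1].split("/", 1)
        if fields[0] == name and slash_proto == proto:
            return int(port)
    return None

# A hypothetical /etc/services fragment; the entry name is an assumption.
sample = "sndr    121/tcp    # Remote Mirror daemon (hypothetical entry)\n"
service_port(sample, "sndr")   # -> 121
```

Editing the port field of that line, on every host in the configuration, is what the instructions above describe; the dscfgadm restart then makes the daemons pick up the new value.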
Note - If you change the port number, you must change it on all Remote Mirror hosts within a configuration set (that is, primary and secondary hosts, or all hosts in one-to-many, many-to-one, and multihop configurations). In addition, you must restart the Remote Mirror data services using the dscfgadm -d -r command, followed by the dscfgadm -e -r command, on all affected hosts, so that the port number change can take effect.
The Remote Mirror software does not work with any name service other than files (/etc/hosts). Place the hostnames of each host running the Remote Mirror software in the /etc/hosts file on each machine.
Because RPCs require an acknowledgment, the firewall must be opened to allow the well-known port address to be in either the source or destination fields of the packet.
In the case of write replication traffic, packets destined for the secondary host have the well-known port number in the destination field. Acknowledgments of these RPCs contain the well-known address in the source field.
For health monitoring, the heartbeat originates from the secondary host with the well-known address in the destination field. The acknowledgment contains this address in the source field.
If the option is available, be sure to configure the firewall to allow RPC traffic as well.
The Remote Mirror software enables you to create one-to-many, many-to-one, and multihop volume sets.
One-to-many replication enables you to replicate data from one primary volume to many secondary volumes residing on one or more hosts. The primary volume plus each secondary volume forms a single volume set. For example, with one primary and three secondary host volumes, you need to configure three volume sets: primary A and secondary B1, primary A and secondary B2, and primary A and secondary B3.
Many-to-one replication enables you to replicate volumes across more than two hosts through more than one network connection. The software supports the replication of volumes located on many different hosts to volumes on a single host. The terminology differs from the one-to-many configuration terminology, where the one and the many referred to are volumes.
Multihop replication indicates that the secondary host volume of one volume set acts as the primary host volume of another volume set (it is still the secondary volume of the first volume set). In the case of one primary host volume A and one secondary host volume B, the secondary host volume B appears as primary host volume A1 to the secondary host volume B1.
See One-to-Many Replication, Many-to-One Replication and Multihop Replication for more information about these scenarios.
Write ordering is maintained for groups of asynchronously replicating volume sets. (The general definition of write ordering is that write operations directed to the target volume occur in the same order as write operations to the source.) The group of target volumes is a copy of the group of source volumes.
This feature can be valuable when application requirements limit the size or use of individual volumes. For example, a database application might limit partition sizes to no greater than 2 Gbyte. In this case, you might group volume sets to create a large virtual "volume" that preserves the order of write operations. Otherwise, you risk inconsistent data if you update volume sets individually instead of as a group.
When an application has multiple logical volumes assigned, application data integrity can be maintained by one of the following techniques:
If you use the Point-in-Time Copy software, the remote point-in-time snapshot is taken while the application is in the recoverable state. For example, most database applications allow for a hot backup. If a remote point-in-time copy was made of the entire replicated database while the primary host was in hot backup mode, a consistent remote database becomes available by using the point-in-time copy and the log files taken while the database was in hot backup mode.
Note - If a Remote Mirror volume is a target of Point-in-Time Copy update or copy, you must place the Remote Mirror volume set in logging mode for the point-in-time copy software to successfully perform an enable, copy, update, or reset operation on a Remote Mirror volume. If the volume set is not in logging mode, the point-in-time copy operation fails and the Remote Mirror software reports that the operation is denied.
Typically, volumes are replicated from local Site-A to remote Site-B. However, as applications are geographically distributed, a storage system at remote Site-B might be both a remote volume backup to local Site-A and also a direct storage resource for applications on Host-B. Under these circumstances, you can replicate Host-B volumes to Site-A.
This reciprocal backup arrangement supported by the Remote Mirror software is known as mutual backup or dual backup.
With mutual backup, the Remote Mirror software volumes considered primary by Site-B are administered from the Site-B session. Site-B replicated-volume devices are considered secondary volumes at Site-A.
In this case, you need to configure two unique volume sets for each site, as shown in FIGURE 1-3.
The Solaris system administrator must be knowledgeable about the virtual table of contents (VTOC) that is created on raw devices by the Solaris operating system.
The creation and updating of a physical disk's VTOC is a standard function of the Solaris operating system. Software applications like Availability Suite, the growth of storage virtualization, and the appearance of SAN-based controllers have made it easy for an uninformed Solaris system administrator to inadvertently allow a VTOC to become altered. Altering the VTOC increases the possibility of data loss.
Remember these points about the VTOC:
When first configuring and validating volume replication, save copies of all affected devices' VTOCs using the prtvtoc(1M) utility. The fmthard(1M) utility can be used to restore them later, if necessary.