CHAPTER 4
Remote Mirror Software
This chapter discusses Remote Mirror software troubleshooting issues.
The following topics are included:
This section describes some common errors that you may encounter when using Remote Mirror software.
For information on how to safeguard the VTOC, refer to Safeguarding the Solaris VTOC.
If the secondary Remote Mirror set has not been enabled, the application gives the following error:
If the remote volume and host names do not match, both instances of SNDR will start, but they will not communicate with each other and replication will be unable to begin. The same message as when the secondary has not been enabled will be seen, but sndradm on the remote node will apparently show the set enabled. It is only on careful inspection that a difference in volume names can be seen to explain the failure.
The most common class of user errors when using the Remote Mirror software is accessibility problems in the specification of the primary host volume and bitmap, the secondary host volume and bitmap, or the primary and secondary host names, all of which are configured using the sndradm utility. The best way to resolve these errors is to use standard Solaris utilities, specifically format(1M), prtvtoc(1M), dd(1M), and telnet(1).
A typical sndradm enable command using Solaris raw devices is as follows:
# sndradm -e hostA /dev/rdsk/c0t1d0s0 /dev/rdsk/c0t2d0s0 \
hostB /dev/rdsk/c0t1d0s0 /dev/rdsk/c0t2d0s0 ip sync
A failure of this command may be due to incorrect device specifications, incorrect partition sizing, failure to access the device from this Solaris node, or incorrect Solaris host names. Running the following seven commands should be the first step toward resolving accessibility problems.
There is no requirement that primary host volume names match the secondary host
volume names, as long as the secondary volume is the same size or greater.
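The size requirement above can be checked before a set is enabled. The following is a minimal sketch, assuming a POSIX-compatible shell such as ksh. The function name secondary_fits is hypothetical, and the sector counts are supplied by hand, for instance from the sector count column that prtvtoc(1M) reports for each slice.

```shell
# Sketch: confirm that the secondary volume is at least as large as the primary.
# $1 = primary size in sectors, $2 = secondary size in sectors
# (both values are assumed to come from prtvtoc output on each host).
secondary_fits() {
    if [ "$2" -ge "$1" ]; then
        echo "ok"
    else
        echo "secondary too small by $(( $1 - $2 )) sectors"
        return 1
    fi
}
```

A call such as secondary_fits 4096 2048 reports the shortfall and returns a non-zero status, which makes the check usable in a larger script.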
The next class of user errors when using the Remote Mirror software is perceived functionality issues. The function of the Remote Mirror software is to continuously copy all the data from the primary host's volume to the secondary host's volume, repeatedly, until either replication is stopped or the primary or secondary host is no longer available. The first command and the following six commands are essentially equivalent for the setup of a Remote Mirror replication set, except that the second set can take hours or days to complete, because it recopies data that has already been copied.
# sndradm -e hostA /dev/rdsk/c0t1d0s0 /dev/rdsk/c0t2d0s0 \
hostB /dev/rdsk/c0t1d0s0 /dev/rdsk/c0t2d0s0 ip sync
If the replication functionality of the first command listed above does not work as expected, use this set of commands with very small volumes to confirm that the replication you want works as expected for the volumes and host names in your specific operating environment.
#!/bin/csh
# Repeatedly copy the primary volume to the secondary host through a temporary file.
repeat:
rsh hostA dd if=/dev/rdsk/c0t1d0s0 of=/tmp/hostA.tmp
rsh hostA rcp /tmp/hostA.tmp hostB:/tmp/hostA.tmp
rsh hostB dd of=/dev/rdsk/c0t1d0s0 if=/tmp/hostA.tmp
goto repeat
When a Remote Mirror set is first enabled, the secondary volume may take hours or days to complete the initial synchronization, which is highly dependent on the volume size, network bandwidth and latency, and system resources on both the primary and secondary nodes. Review the Sun StorageTek Availability Suite 4.0 Remote Mirror Software Administration Guide for various methods which incorporate the use of sndradm -E for fast enable operations.
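To get a rough feel for how long an initial synchronization might take, the volume size can be divided by the usable network throughput. The sketch below assumes a POSIX-compatible shell. The function name sync_estimate_minutes is hypothetical, and the result is only a lower bound because it ignores latency, protocol overhead, and disk or CPU limits on either node.

```shell
# Sketch: lower-bound estimate of initial full synchronization time.
# $1 = volume size in MB, $2 = usable link throughput in Mbit/s
sync_estimate_minutes() {
    # MB * 8 gives Mbit; dividing by Mbit/s gives seconds; / 60 gives minutes.
    echo $(( ($1 * 8) / $2 / 60 ))
}

# Example: a 100-GB volume (102400 MB) over a link sustaining 100 Mbit/s.
sync_estimate_minutes 102400 100
```

For these inputs the estimate is a little over two hours of pure transfer time, which is consistent with the hours-to-days range described above for larger volumes or slower links.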
Once the initial full synchronization has completed, the Remote Mirror secondary volume is kept in write-order consistency, an operation which may lag the Remote Mirror primary volume. If at any time the replication process stops, logging mode is enabled, the network link goes down, or there is a system failure, a replicated I/O operation may have been in progress. This state may result in a Remote Mirror secondary volume data set that appears inconsistent, meaning that utilities like fsck(1M), database recovery tools, or similar software may have to make indeterminate decisions about the validity of an incomplete I/O operation. The means by which the Remote Mirror software keeps a primary and secondary replicated set in write-order consistency results in the same I/O consistency issues as a Solaris node "panicking" while I/O is in progress.
If you are manually placing a Remote Mirror primary volume in logging mode to use the secondary volume, it is highly recommended that the primary volume be quiesced and all cached data blocks flushed to disk, so that the Remote Mirror software finishes replicating a consistent volume to the secondary host.
This section discusses configuration issues for the Remote Mirror software.
Set status can be checked with the sndradm -P command. The percentage of the primary that needs to be transmitted to the secondary to complete a sync operation can be seen with the dsstat -m sndr command.
The file /var/adm/ds.log contains a record of Availability Suite activity, including which remote replication sets have been enabled, resumed, and stopped by the sndradm and sndrboot utilities.
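When the log is long, it can help to filter it down to the entries written by the two utilities named above. This is a minimal sketch, assuming a POSIX grep (on Solaris, for example, /usr/xpg4/bin/grep) and assuming that each relevant log line mentions the utility name. The function name ds_log_activity is hypothetical.

```shell
# Sketch: show only the ds.log lines that mention sndradm or sndrboot.
# $1 = log file to read (defaults to /var/adm/ds.log)
ds_log_activity() {
    logfile=${1:-/var/adm/ds.log}
    grep -e sndradm -e sndrboot "$logfile"
}
```

Piping the result through tail shows the most recent enable, resume, and stop operations first when diagnosing a recent change.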
The following command creates a Remote Mirror replicated set consisting of raw partitions, where the primary is /dev/rdsk/c7t0d0s6 and the bitmap is /dev/rdsk/c7t1d0s6. Note that exactly the same command must be issued on both the primary and secondary hosts to complete a single Remote Mirror replicated set.
# sndradm -e hostA /dev/rdsk/c7t0d0s6 /dev/rdsk/c7t1d0s6 hostB \
/dev/rdsk/c7t0d0s6 /dev/rdsk/c7t1d0s6 ip async
Since this is an asynchronous replicated set, the Remote Mirror software keeps the sets in synchronization with a memory queue, allowing for a small, finite lag between primary and secondary hosts.
The bitmap volume must be sized according to the following command:
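The Availability Suite dsbitmap utility (dsbitmap -r data_volume) reports the bitmap size a Remote Mirror data volume requires, and its output should be treated as authoritative. For quick planning, the sketch below applies the commonly cited rule of thumb of 1 Kbyte plus 4 Kbytes per Gbyte of data volume. That formula is stated here as an assumption, the function name rm_bitmap_kb is hypothetical, and sets that use a disk queue may need a larger bitmap.

```shell
# Sketch: rule-of-thumb bitmap size for a Remote Mirror data volume.
# ASSUMPTION: 1 KB + 4 KB per GB of data volume; confirm with dsbitmap -r.
# $1 = data volume size in whole GB; prints the bitmap size in KB.
rm_bitmap_kb() {
    echo $(( 1 + 4 * $1 ))
}

# Example: a 2-GB data volume.
rm_bitmap_kb 2
```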
The following command creates a Remote Mirror replicated set consisting of SVM volumes, where the primary is /dev/md/rdsk/d1 and the bitmap is /dev/md/rdsk/d2.
Since this is a synchronous replicated set with a -E (fast enable), there is an assumption that both the primary and secondary volumes are equal. If both the primary and secondary volumes are uninitialized, meaning that there is no file system, database, or application on the volumes, then both volumes are considered the same (uninitialized equals uninitialized). When the primary volume has a file system, database, or application data placed on it, the Remote Mirror software replicates these changes to the secondary, and, by virtue of replication, both volumes will be identical.
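As a sketch of what such a fast enable looks like for the SVM volumes named above, the helper below only prints the sndradm command rather than running it. It assumes the host names hostA and hostB from the earlier examples and assumes that the secondary uses the same volume paths as the primary. The helper name print_enable_cmd is hypothetical.

```shell
# Sketch: print (do not run) an sndradm enable command for a set whose
# primary and secondary use identical data and bitmap volume paths.
# $1 = -e or -E, $2 = primary host, $3 = secondary host,
# $4 = data volume, $5 = bitmap volume, $6 = sync or async
print_enable_cmd() {
    echo "sndradm $1 $2 $4 $5 $3 $4 $5 ip $6"
}

# Synchronous fast enable of the SVM set (hostA and hostB are assumed names).
print_enable_cmd -E hostA hostB /dev/md/rdsk/d1 /dev/md/rdsk/d2 sync
```

Printing the command first makes it easy to review the volume and host arguments, and to issue the identical line on both hosts, before anything is changed.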
Another way to accomplish this step is to enable the primary node as shown above, but leave the SNDR set in logging mode and then enable a Point-in-Time Copy set using the primary volume as the master volume, thereby creating an instant copy of the set. The primary volume can then be used by the system, applications, or a file system. A backup of the shadow volume needs to be taken; when the backup is complete, the Point-in-Time Copy set on the primary can be disabled. The backup of the shadow volume can be delivered to the site of the Remote Mirror secondary and restored to disk as specified above. Then a fast enable (-E) must be done on the secondary. When the Remote Mirror set is placed in replicating mode, any changes made since the Point-in-Time Copy set was created are replicated to the secondary, vastly reducing the amount of data that needs to be replicated over the network.
The following commands create a Remote Mirror set consisting of VxVM volumes, where the primary master volume is /dev/vx/rdsk/sndr-dg/d21 and the bitmap volume is /dev/vx/rdsk/sndr-dg/d22.
Since this is an asynchronous replicated set with an associated disk queue, the Remote Mirror software keeps the sets in synchronization with a disk queue, allowing for a much larger lag between primary and secondary hosts, limited by the size of the disk queue volume rather than by memory.
This section discusses how to diagnose performance issues for the Remote Mirror software.
The following Remote Mirror set variables should be considered:
Asynchronous modes give faster local write performance than synchronous. If you find that your performance suddenly changes, then there is likely to have been some event that moved the system into the other mode. Possible events include:
Blocking and non-blocking modes affect performance when the queue is full.
When enabled (sndradm -a on set), the Remote Mirror rdcsyncd daemon automates update resynchronization after a network link or machine failure. If a Point-in-Time Copy set was added as an ndr_ii entry (see ndr_ii), the daemon creates a dependent shadow volume of the Remote Mirror secondary, to ensure that there is always a valid replica on the secondary site. While a full or update sync is in progress, the Remote Mirror software replicates changed blocks, starting from block 1 to the end of the volume. This replication is block-order, not write-order, so the volume is inconsistent until the synchronization operation completes. Having an ndr_ii Point-in-Time Copy on the secondary ensures that there is always a consistent, write-ordered volume on the secondary host.
This affects how fast the queue fills up.
The maximum amount of data in the queue.
Affects how fast the queue is sent across the network. More threads may lead to better network utilization.
The following server commands should be considered:
The dsstat -m sndr command shows basic statistics on the remote replication network and bitmap volumes. Additional, more detailed statistics are available with the display option -d.
The iostat command can be used to monitor I/O rates to all Remote Mirror volumes on the local machine in a manner similar to the normal usage of iostat.
The following network commands should be considered:
The rate of remote I/O can be seen from the dsstat output.
Once you have determined that the rdc service is ready, you may want to check the integrity of the link. When configuring the Remote Mirror software, use the name associated with the IP address of the interface over which the Remote Mirror software will transfer data. This applies both to entries added to the /etc/hosts file and to the sndradm commands used to enable sets.
A simple test is to verify that you can telnet or rlogin through the interfaces the Remote Mirror software will use. You may also want to use the ifconfig command to make sure the interface is plumbed, up, and at the IP address you have configured in the /etc/hosts file. The names and IP addresses of the interfaces being used for the Remote Mirror software on both systems should be in each system's /etc/hosts file.
Network socket queue states can be monitored with netstat. The send and receive socket queues are displayed by the -a option's Swind, Send-Q, Rwind, and Recv-Q columns.
Another command that could be run to check the rdc service is as follows:
# netstat -a | grep rdc
*.rdc *.* 0 0 65535 0 LISTEN
*.rdc *.* 0 0 65535 0 LISTEN
*.rdc *.* 0 0 65535 0 LISTEN
In the above example the rdc service is available.
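The same check can be scripted so that a missing listener is caught automatically. This is a minimal sketch, assuming a POSIX shell and grep and assuming that the netstat output resolves the port to the service name rdc, as in the example above. The function name count_rdc_listeners is hypothetical.

```shell
# Sketch: count rdc listeners in `netstat -a` output supplied on stdin.
# Prints the number of matching lines; exit status is non-zero if there are none.
count_rdc_listeners() {
    grep -c '\.rdc[[:space:]].*LISTEN'
}

# Typical use on a live system:
#   netstat -a | count_rdc_listeners
```

A count of zero suggests that the rdc service is not available on that node.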
The ping command can be used to check that the interfaces can communicate and whether IPv4 or IPv6 addressing is being used.
In the above example, packets are successfully being sent and IPv4 addressing is being used. That is confirmed by looking at the IP address "(10.9.9.2)", which is written in dotted-decimal form with four values; an IPv6 address would instead be written as colon-separated hexadecimal groups. The ping should be run in both directions (from primary to secondary and from secondary to primary) to ensure connectivity in both directions. This is also a good way to verify that both systems are using the same protocol, IPv4 or IPv6.
ping also shows the latency within the network between the two SNDR nodes.
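When the check is run from a script on both nodes, the address that ping reports can be classified automatically. The sketch below assumes a POSIX shell. The function name addr_family is hypothetical, and it only inspects the textual form of the address without validating it.

```shell
# Sketch: classify an address string as IPv4 or IPv6 by its notation.
# The colon test comes first so that IPv4-mapped IPv6 addresses count as IPv6.
addr_family() {
    case "$1" in
        *:*)     echo IPv6 ;;
        *.*.*.*) echo IPv4 ;;
        *)       echo unknown ;;
    esac
}

# Example with the address from the ping discussion above.
addr_family 10.9.9.2
```

Comparing the result obtained on the primary with the result obtained on the secondary is a quick way to confirm that both nodes use the same protocol.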
The rpcinfo utility can be used to check a path to the remote Remote Mirror services, either primary or secondary. Two commands are used to check the rdc service:
In the prior example, the rdc service is clearly ready. In the next example, the system was booted with an incorrect entry for "services" in the /etc/nsswitch.conf file and is not ready. In both examples, node1 is the system name. The commands should be run from all systems in the Remote Mirror config.
The snoop utility can be used to see if SNDR is actually sending and receiving data during a copy or update command.
In the example above, the snoop utility is being run from the primary side of the Remote Mirror set. The interface being used is hme0 and the port to report on is the port used by rdc. The interface that is being used by the Remote Mirror software can be determined by relating the name used when enabling with the sndradm command to the IP address in the /etc/hosts file to the interface listed in the ifconfig -a output.
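The first step of that lookup, from the host name to its IP address, can be sketched as follows. The sketch assumes a POSIX shell and awk (on Solaris, nawk or /usr/xpg4/bin/awk). The function name lookup_ip is hypothetical, and hostB is the host name assumed in the earlier examples.

```shell
# Sketch: print the first IP address that a hosts-format file maps to a name.
# $1 = host name used with sndradm, $2 = hosts file (defaults to /etc/hosts)
lookup_ip() {
    awk -v name="$1" '
        /^[[:space:]]*#/ { next }   # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# On a live system, find the interface that carries this address in the
# ifconfig output, then watch rdc traffic on it (hme0 as in the text above):
#   lookup_ip hostB
#   ifconfig -a
#   snoop -d hme0 port rdc
```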
If you are using an ATM interface, a special snoop command called atmsnoop must be used:
Following is a summary of the SunSolve InfoDocs written to address common customer issues for Remote Mirror software. If you believe you are experiencing one of these issues, contact your Sun Service Representative for a swift resolution.
Copyright © 2006, Sun Microsystems, Inc. All Rights Reserved.