CHAPTER 4

Remote Mirror Software

This chapter discusses Remote Mirror software troubleshooting issues.

The following topics are included:

Common User Errors
Configuration
Performance Diagnosis
InfoDoc Summary


Common User Errors

This section describes some common errors that you may encounter when using Remote Mirror software.

Safeguarding the VTOC

For information on how to safeguard the VTOC, refer to Safeguarding the Solaris VTOC.

Forgetting to Enable the Remote Mirror Set on the Secondary

If the secondary Remote Mirror set has not been enabled, the application gives the following error:


sndradm: warning: SNDR: Could not open file host:/dev/rdsk/xxxxx on remote node

Misentering the Remote Volume or Host Names

If the remote volume and host names do not match, both instances of SNDR will start, but they will not communicate with each other and replication will be unable to begin. The same message as when the secondary has not been enabled will be seen, but sndradm on the remote node will apparently show the set enabled. It is only on careful inspection that a difference in volume names can be seen to explain the failure.
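Because such a mismatch is easy to overlook, it can help to compare the two set definitions mechanically. The following is a minimal POSIX shell sketch and is not part of the Remote Mirror software. The function names are hypothetical, and the sketch assumes that you have copied the arguments given to sndradm -e on each host into two strings.

```shell
# Hypothetical helper: compare the set definitions entered on two hosts.
# Each argument is the full argument string that was given to sndradm -e.

# Collapse runs of blanks so that spacing differences are ignored.
normalize_spec() {
    printf '%s\n' "$1" | awk '{ $1 = $1; print }'
}

# Succeed only if both definitions are identical after normalization.
same_set_spec() {
    [ "$(normalize_spec "$1")" = "$(normalize_spec "$2")" ]
}

# Example:
# same_set_spec "$spec_on_hostA" "$spec_on_hostB" || echo "set definitions differ"
```

A non-zero return indicates that a host name, volume name, or mode differs between the two definitions and should be inspected by hand.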

Accessibility Issues

The most common class of user errors when using the Remote Mirror software is accessibility issues in the specification of the primary host volume and bitmap, secondary host volume and bitmap, or the primary and secondary host names, configured using the sndradm utility. The best means to resolve these types of errors is to use standard Solaris utilities, specifically format(1M), prtvtoc(1M), dd(1M), and telnet(1M).

A typical sndradm enable command using Solaris RAW devices is as follows:


# sndradm -e hostA /dev/rdsk/c0t1d0s0 /dev/rdsk/c0t2d0s0 \
hostB /dev/rdsk/c0t1d0s0 /dev/rdsk/c0t2d0s0 ip sync

A failure of this command may be due to incorrect device specifications, incorrect partition sizing, failure to access the device from this Solaris node, or incorrect Solaris host names. Running the following seven commands on each host should be the first step toward resolving accessibility problems.


# telnet hostA
{login}
# format /dev/rdsk/c0t1d0s0
# format /dev/rdsk/c0t2d0s0
# prtvtoc /dev/rdsk/c0t1d0s0
# prtvtoc /dev/rdsk/c0t2d0s0
# dd if=/dev/rdsk/c0t1d0s0 of=/dev/null count=1
# dd if=/dev/rdsk/c0t2d0s0 of=/dev/null count=1
# dsbitmap -r /dev/rdsk/c0t1d0s0
# telnet hostB
{repeat sequence above}
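The dd read test in the sequence above can be wrapped in a small loop so that every volume and bitmap is checked the same way on each host. This is a minimal POSIX shell sketch. The function names are hypothetical, and the device names in the example comment are the ones used in the listing above.

```shell
# Hypothetical helper: report whether the first block of each path is readable.

can_read_first_block() {
    # dd exits non-zero if the device cannot be opened or read.
    dd if="$1" of=/dev/null count=1 2>/dev/null
}

check_volumes() {
    status=0
    for vol in "$@"; do
        if can_read_first_block "$vol"; then
            echo "readable: $vol"
        else
            echo "NOT readable: $vol"
            status=1
        fi
    done
    return $status
}

# Example, run on each host in turn:
# check_volumes /dev/rdsk/c0t1d0s0 /dev/rdsk/c0t2d0s0
```

A volume reported as not readable should be examined with format and prtvtoc before sndradm is tried again.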
 

There is no requirement that primary host volume names match the secondary host volume names, as long as the secondary volume is the same size or greater.
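The size requirement can be checked by comparing the sector counts that prtvtoc reports on each host. The following is a minimal POSIX shell sketch with hypothetical function names. It assumes the usual prtvtoc layout, in which comment lines begin with an asterisk and the fifth column of each data line is the sector count.

```shell
# Hypothetical helpers: compare slice sizes taken from prtvtoc output.
# Assumes POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris).

# Print the sector count of slice $1 from prtvtoc output on stdin.
# Data lines are: partition tag flags first-sector sector-count last-sector.
slice_sectors() {
    awk -v slice="$1" '$1 !~ /^\*/ && $1 == slice { print $5 }'
}

# Succeed if the secondary ($2) is at least as large as the primary ($1).
secondary_fits() {
    [ "$2" -ge "$1" ]
}

# Example, run from a host that can reach both nodes with rsh:
# p=$(rsh hostA prtvtoc /dev/rdsk/c0t1d0s0 | slice_sectors 0)
# s=$(rsh hostB prtvtoc /dev/rdsk/c0t1d0s0 | slice_sectors 0)
# secondary_fits "$p" "$s" || echo "secondary is smaller than primary"
```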

Functionality Issues

The next class of user errors when using the Remote Mirror software is perceived functionality issues. The function of the Remote Mirror software is to continuously copy all the data from the primary host's volume to the secondary host's volume, repeatedly, until either replication is stopped or the primary or secondary host is no longer available. The first command below and the six-line script that follows it are essentially equivalent for the setup of a Remote Mirror replication set, except that the script can take hours or days to complete, as it recopies already copied data.


# sndradm -e hostA /dev/rdsk/c0t1d0s0 /dev/rdsk/c0t2d0s0 \
hostB /dev/rdsk/c0t1d0s0 /dev/rdsk/c0t2d0s0 ip sync

If the replication started by the first command above does not work as expected, run the following set of commands against very small volumes to confirm that replication behaves as expected for the volumes and host names in your specific operating environment.


#!/bin/csh
# repeat:
# rsh hostA dd if=/dev/rdsk/c0t1d0s0 of=/tmp/hostA.tmp
# rsh hostA rcp /tmp/hostA.tmp hostB:/tmp/hostA.tmp
# rsh hostB dd of=/dev/rdsk/c0t1d0s0 if=/tmp/hostA.tmp
# goto repeat

Data Integrity Issues

When a Remote Mirror set is first enabled, the secondary volume may take hours or days to complete the initial synchronization, which is highly dependent on the volume size, network bandwidth and latency, and system resources on both the primary and secondary nodes. Review the Sun StorageTek Availability Suite 4.0 Remote Mirror Software Administration Guide for various methods which incorporate the use of sndradm -E for fast enable operations.
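To get a feel for why the initial synchronization can take hours or days, a rough lower bound can be computed from the volume size and the usable network bandwidth. The sketch below is only an estimate under stated assumptions: it ignores latency, protocol overhead, and load on either node, and the function name is hypothetical.

```shell
# Rough lower bound on full-synchronization time.
# $1 = volume size in gigabytes (10^9 bytes), $2 = usable bandwidth in Mbit/s.
# Assumes POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris).
full_sync_hours() {
    awk -v gb="$1" -v mbit="$2" 'BEGIN {
        seconds = (gb * 8000) / mbit    # 1 GB = 8000 Mbit
        printf "%.1f\n", seconds / 3600
    }'
}

full_sync_hours 100 100     # 100 GB over 100 Mbit/s: prints 2.2
full_sync_hours 1000 10     # 1000 GB over 10 Mbit/s: prints 222.2
```

The second case works out to more than nine days, which is why the fast enable methods mentioned above are worth reviewing for large volumes on slow links.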

Once the initial full synchronization has completed, the Remote Mirror secondary volume is kept in write-order consistency, an operation which may lag the Remote Mirror primary volume. If at any time the replication process stops, logging mode is enabled, the network link goes down, or there is a system failure, a replicated I/O operation may have been in progress. This state may result in a Remote Mirror secondary volume data set that appears inconsistent, meaning that utilities like fsck(1M), database recovery tools, or similar software may have to make indeterminate decisions about the validity of an incomplete I/O operation. The means by which the Remote Mirror software keeps a primary and secondary replicated set in write-order consistency results in the same I/O consistency issues as a Solaris node "panicking" while I/O is in progress.

If you are manually placing a Remote Mirror primary volume in logging mode to use the secondary volume, it is highly recommended that the primary volume be quiesced and all cached data blocks flushed to disk, so that the Remote Mirror software finishes replicating a consistent volume to the secondary host.


Configuration

This section discusses configuration issues for the Remote Mirror software.

Set Status

Set status can be checked with the sndradm -P command. The percentage of the primary that needs to be transmitted to the secondary to complete a sync operation can be seen with the dsstat -m sndr command.

Files

The file /var/adm/ds.log contains a record of Availability Suite activity, including which remote replication sets have been enabled, resumed, and stopped by the sndradm and sndrboot utilities.
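When the log is long, it can be filtered for the entries written on behalf of the two utilities named above. This is a minimal POSIX shell sketch with a hypothetical function name. It matches only the utility names, so it makes no assumption about the layout of each log line.

```shell
# Hypothetical helper: show log lines that mention sndradm or sndrboot.
# $1 = log file (defaults to /var/adm/ds.log).
sndr_log_entries() {
    awk '/sndradm|sndrboot/' "${1:-/var/adm/ds.log}"
}

# Example:
# sndr_log_entries /var/adm/ds.log
```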

Volume Configuration

Raw Partition

The following command creates a Remote Mirror replicated set consisting of raw partitions, where the primary is /dev/rdsk/c7t0d0s6 and the bitmap is /dev/rdsk/c7t1d0s6. Note the exact same command must be issued on both the primary and secondary host to complete a single Remote Mirror replicated set.


# sndradm -e hostA /dev/rdsk/c7t0d0s6 /dev/rdsk/c7t1d0s6 hostB \
/dev/rdsk/c7t0d0s6 /dev/rdsk/c7t1d0s6 ip async

Since this is an asynchronous replicated set, the Remote Mirror software keeps the sets in synchronization with a memory queue, allowing for a small, finite lag between primary and secondary hosts.

The bitmap volume must be sized according to the following command:


# dsbitmap -r /dev/rdsk/c7t0d0s6

Solaris Volume Manager

The following command creates a Remote Mirror replicated set consisting of SVM volumes, where the primary is /dev/md/rdsk/d1 and the bitmap is /dev/md/rdsk/d2.


# sndradm -E hostA /dev/md/rdsk/d1 /dev/md/rdsk/d2 hostB \
/dev/md/rdsk/d1 /dev/md/rdsk/d2 ip async

Since this is an asynchronous replicated set enabled with -E (fast enable), there is an assumption that both the primary and secondary volumes are equal. If both the primary and secondary volumes are uninitialized, meaning that there is no file system, database, or application on the volumes, then both volumes are considered the same (uninitialized equals uninitialized). When the primary volume has a file system, database, or application data placed on it, the Remote Mirror software replicates these changes to the secondary, and, by virtue of replication, both volumes will be identical.

Another means by which to accomplish this step is to enable the primary node as shown above, but leave the SNDR set in logging mode and then enable a Point-in-Time Copy using the primary volume as the master volume, thereby creating an instant copy of the set. The primary volume can then be used by the system, applications, or a file system. A backup of the shadow volume needs to be taken; when the backup is complete, the Point-In-Time Copy set on the primary can be disabled. The backup of the shadow volume can be delivered to the site of the remote mirror secondary and restored to disk as specified above. Then a fast enable (-E) must be done on the secondary. When placing the Remote Mirror set in replicating mode, any changes since the Point-in-Time Copy set was made are replicated to the secondary, vastly minimizing the amount of data that needs to be replicated over the network.

VERITAS Volume Manager

The following commands create a Remote Mirror set consisting of VxVM volumes, where the primary master volume is /dev/vx/rdsk/sndr-dg/d21 and the bitmap volume is /dev/vx/rdsk/sndr-dg/d22.


# sndradm -e hostA /dev/vx/rdsk/sndr-dg/d21 \
/dev/vx/rdsk/sndr-dg/d22 hostB /dev/vx/rdsk/sndr-dg/d23 \
/dev/vx/rdsk/sndr-dg/d24 ip async
# sndradm -q a /dev/vx/rdsk/sndr-dg/d30 \
hostB:/dev/vx/rdsk/sndr-dg/d30

Since this is an asynchronous replicated set with an associated disk queue, the Remote Mirror software keeps the sets in synchronization with a disk queue, allowing for a much larger lag between primary and secondary hosts, bounded only by the size of the disk queue.


Performance Diagnosis

This section discusses how to diagnose performance issues for the Remote Mirror software.

Remote Mirror Set Variables

The following Remote Mirror set variables should be considered:

sync and async

Asynchronous modes give faster local write performance than synchronous. If you find that your performance suddenly changes, then there is likely to have been some event that moved the system into the other mode. Possible events include:

queue modes

Blocking and non-blocking queue modes affect performance when the queue is full.

autosync

When enabled (sndradm -a on set), the Remote Mirror rdcsyncd daemon automates update resynchronization after a network link or machine failure. If a Point-in-Time Copy set was added as an ndr_ii entry (see ndr_ii), the daemon creates a dependent shadow volume of the Remote Mirror secondary, to assure that there is always a valid replica on the secondary site. While a full or update sync is in progress, the Remote Mirror software replicates changed blocks, starting from block 1 to the end of the volume. This replication is block-order, not write-order, so the volume is inconsistent until the synchronization operation completes. Having an ndr_ii Point-in-Time Copy on the secondary ensures that there is always a consistent, write-ordered volume on the secondary host.

max q writes

This affects how fast the queue fills up.

max q fbas

The maximum amount of data in the queue.

async Threads

Affects how fast the queue is sent across the network. More threads may lead to better network utilization.
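When tuning the max q fbas variable listed above, it can help to translate a limit expressed in FBAs into a more familiar unit. The sketch below assumes that one FBA is a 512-byte block. That assumption, the hypothetical function name, and the example value are not taken from this chapter, so verify them against the administration guide for your release.

```shell
# Convert a queue limit given in FBAs to megabytes (2^20 bytes).
# Assumption: one FBA is a 512-byte block.
# Assumes POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris).
fbas_to_mb() {
    awk -v fbas="$1" 'BEGIN { printf "%.1f\n", fbas * 512 / 1048576 }'
}

fbas_to_mb 16384    # arbitrary example value: prints 8.0
```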

Server Commands

The following server commands should be considered:

dsstat

The dsstat -m sndr command shows basic statistics on the remote replication network and bitmap volumes. Other and more detailed statistics are available with the display option -d.

iostat

The iostat command can be used to monitor I/O rates to all Remote Mirror volumes on the local machine in a manner similar to the normal usage of iostat.

Network Commands

The following network commands should be considered:

dsstat

The rate of remote I/O can be seen from the dsstat output.

ifconfig

Once you have determined that the rdc service is ready, you may want to check the integrity of the link. When configuring the Remote Mirror software, use the name associated with the IP address of the interface over which the Remote Mirror software will transfer data. This is true for entries added into the /etc/hosts file as well as when using sndradm commands to enable sets.

A simple test would verify that you can telnet or rlogin through the interfaces the Remote Mirror software will use. You may also want to use the ifconfig command to make sure the interface is plumbed, up, and at the IP address you have configured in the /etc/hosts file. The names and IP addresses of the interfaces being used for the Remote Mirror software on both systems should be in each system's /etc/hosts file.


# ifconfig -a
ba0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 9180 index 1
        inet 10.9.9.1 netmask ffffff00 broadcast 10.9.9.255
        ether 8:0:20:af:8e:d0 
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 2
        inet 127.0.0.1 netmask ff000000 
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 10.8.11.124 netmask ffffff00 broadcast 10.8.11.255
        ether 8:0:20:8d:f7:2c 
lo0: flags=2000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6> mtu 8252 index 2
        inet6 ::1/128 
hme0: flags=2000841<UP,RUNNING,MULTICAST,IPv6> mtu 1500 index 3
        ether 8:0:20:8d:f7:2c 
        inet6 fe80::a00:20ff:fe8d:f72c/10 
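To confirm that the name used for the Remote Mirror software resolves to the address you expect, the hosts file can be searched directly. This is a minimal POSIX shell sketch with a hypothetical function name. It reads only the hosts file and does not consult any other name service.

```shell
# Print the IP address that a hosts file maps to name $1.
# $2 = hosts file (defaults to /etc/hosts).
# Assumes POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris).
hosts_ip_for() {
    awk -v name="$1" '
        {
            sub(/#.*/, "")                  # drop comments
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }' "${2:-/etc/hosts}"
}

# Example:
# hosts_ip_for second.atm
```

The address printed can then be compared against the inet lines of the ifconfig -a output on the host that owns that name.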

netstat

Network socket queue states can be monitored with netstat. The send and receive socket queues are displayed by the -a option's Swind, Send-Q, Rwind, and Recv-Q columns.

Another command that could be run to check the rdc service is as follows:


# netstat -a|grep rdc
*.rdc                *.*                0      0 65535      0 LISTEN
*.rdc                *.*                0      0 65535      0 LISTEN
*.rdc                *.*                0      0 65535      0 LISTEN

In the above example the rdc service is available.
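For use in a script, the same check can be reduced to a count of listening rdc endpoints. This minimal POSIX shell sketch uses a hypothetical function name and assumes the netstat -a layout shown above, in which the first column is the local address and the last column is the state.

```shell
# Count rdc listeners in "netstat -a" output read from stdin.
rdc_listeners() {
    awk '$1 == "*.rdc" && $NF == "LISTEN" { n++ } END { print n + 0 }'
}

# Example:
# netstat -a | rdc_listeners
```

A result of 0 means that no rdc listener was found in the output.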

ping

The ping command can be used to check that the interfaces can communicate and whether IPv4 or IPv6 addressing is being used.


# ping -s second.atm
PING second.atm: 56 data bytes
64 bytes from second.atm (10.9.9.2): icmp_seq=0. time=1. ms
64 bytes from second.atm (10.9.9.2): icmp_seq=1. time=0. ms
64 bytes from second.atm (10.9.9.2): icmp_seq=2. time=0. ms
64 bytes from second.atm (10.9.9.2): icmp_seq=3. time=0. ms

In the above example, packets are successfully being sent and IPv4 addressing is being used. That is confirmed by looking at the IP address "(10.9.9.2)", which is in dotted-decimal notation with four values; an IPv6 address would instead be written as colon-separated hexadecimal groups. The ping should be run in both directions (from primary to secondary and secondary to primary) to ensure connectivity in both directions. This is also a good way to verify that both systems are using the same protocol, IPv4 or IPv6.

ping also shows the latency within the network between the two SNDR nodes.
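When the ping results from both directions are compared, the address family can be recognized from the notation of the address that ping prints. The following is a minimal POSIX shell sketch with a hypothetical function name. It is a loose check of notation only and does not validate the address.

```shell
# Classify an address string as ipv4, ipv6, or unknown by its notation.
addr_family() {
    case "$1" in
        *:*)            echo ipv6 ;;      # colon-separated hexadecimal
        ""|*[!0-9.]*)   echo unknown ;;   # empty, or not digits and dots
        *.*.*.*)        echo ipv4 ;;      # dotted decimal
        *)              echo unknown ;;
    esac
}

addr_family 10.9.9.2                    # prints ipv4
addr_family fe80::a00:20ff:fe8d:f72c    # prints ipv6
```

If the two directions report different families, the systems are not using the same protocol for the replication link.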

rpcinfo

The rpcinfo utility can be used to check a path to the remote Remote Mirror services, either primary or secondary. Two commands are used to check the rdc service:


# rpcinfo -T tcp node1 100143 7
program 100143 version 7 ready and waiting

In the prior example, the rdc service is clearly ready. In the next example, the system was booted with an incorrect entry for "services" in the /etc/nsswitch.conf file and is not ready. In both examples, node1 is the system name. The commands should be run from all systems in the Remote Mirror config.


# rpcinfo -T tcp node1 100143 7
 rpcinfo: RPC: Program not registered
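Because the check should be run from every system in the configuration, it can be convenient to loop over the node names. This is a minimal POSIX shell sketch. The function names are hypothetical and the node names in the example comment are placeholders. The program number 100143 and version 7 are the values used in the rpcinfo command above.

```shell
# Succeed if rpcinfo output on stdin reports the program as ready.
rdc_ready() {
    grep "ready and waiting" >/dev/null
}

# Report the state of the rdc service on each node named as an argument.
check_rdc_nodes() {
    for node in "$@"; do
        if rpcinfo -T tcp "$node" 100143 7 2>&1 | rdc_ready; then
            echo "$node: rdc ready"
        else
            echo "$node: rdc NOT ready"
        fi
    done
}

# Example:
# check_rdc_nodes node1 node2
```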

snoop

The snoop utility can be used to see if SNDR is actually sending and receiving data during a copy or update command.


# snoop -d hme0 port rdc
Using device /dev/hme (promiscuous mode)
 node2 -> node1 RPC C XID=3565514130 PROG=100143 (?) VERS=4 PROC=8
 node1 -> node2 RPC R (#1) XID=3565514130 Success
 node2 -> node1 TCP D=121 S=1018     Ack=1980057565 Seq=2524537885 
Len=0 Win=33304 Options=<nop,nop,tstamp 1057486 843038>
 node2 -> node1 RPC C XID=3565514131 PROG=100143 (?) VERS=4 PROC=8
 node1 -> node2 RPC R (#4) XID=3565514131 Success
 node2 -> node1 TCP D=121 S=1018     Ack=1980057597 Seq=2524538025 
Len=0 Win=33304 Options=<nop,nop,tstamp 1057586 843138>
 node2 -> node1 RPC C XID=3565514133 PROG=100143 (?) VERS=4 PROC=8
 node1 -> node2 RPC R (#7) XID=3565514133 Success
 node2 -> node1 TCP D=121 S=1018     Ack=1980057629 Seq=2524538165 
Len=0 Win=33304 Options=<nop,nop,tstamp 1057686 843238>
 node2 -> node1 RPC C XID=3565514134 PROG=100143 (?) VERS=4 PROC=8

In the example above, the snoop utility is being run from the primary side of the Remote Mirror set. The interface being used is hme0 and the port to report on is the port used by rdc. The interface that is being used by the Remote Mirror software can be determined by relating the name used when enabling with the sndradm command to the IP address in the /etc/hosts file to the interface listed in the ifconfig -a output.
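The last step of that lookup, from IP address to interface name, can be sketched as a filter over the ifconfig -a output. This is a minimal POSIX shell sketch with a hypothetical function name. It assumes the ifconfig layout in which an unindented "name: flags=..." line is followed by indented "inet address ..." lines.

```shell
# Print the interface whose IPv4 address is $1, reading "ifconfig -a"
# output from stdin.
# Assumes POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris).
iface_for_ip() {
    awk -v ip="$1" '
        /^[^ \t]/ { name = $1; sub(/:$/, "", name) }    # new interface line
        $1 == "inet" && $2 == ip { print name; exit }
    '
}

# Example:
# ifconfig -a | iface_for_ip 10.9.9.1
```

The name printed is the value to pass to the -d option of snoop.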

If you are using an ATM interface, a special snoop command called atmsnoop must be used:


# /etc/opt/SUNWconn/atm/bin/atmsnoop -d ba0 port rdc
device ba0
Using device /dev/ba (promiscuous mode)
TRANSMIT : VC=32
TCP D=121 S=1011 Syn Seq=2333980324 Len=0 Win=36560
________________________________________________________________
RECEIVE : VC=32
TCP D=1011 S=121 Syn Ack=2333980325 Seq=2878301021 Len=0 Win=36512
________________________________________________________________
TRANSMIT : VC=32
TCP D=121 S=1011     Ack=2878301022 Seq=2333980325 Len=0 Win=41076
________________________________________________________________
TRANSMIT : VC=32
RPC C XID=1930565346 PROG=100143 (?) VERS=4 PROC=11
________________________________________________________________
RECEIVE : VC=32
TCP D=1011 S=121     Ack=2333980449 Seq=2878301022 Len=0 Win=36450
________________________________________________________________
RECEIVE : VC=32
RPC R (#4) XID=1930565346 Success
________________________________________________________________
TRANSMIT : VC=32
TCP D=121 S=1011     Ack=2878301054 Seq=2333980449 Len=0 Win=41076


InfoDoc Summary

Following is a summary of the SunSolve InfoDocs written to address common customer issues for Remote Mirror software. If you believe you are experiencing one of these issues, contact your Sun Service Representative for a swift resolution.


TABLE 4-1 InfoDocs Addressing Remote Mirror Software Issues

InfoDoc ID    Issue

45485         SNDR wait command (sndradm -w or rdcadm -w) may return prematurely when run in a script
70015         Unable to grow a ufs filesystem under SNDR
71559         Cannot remove SVM, Veritas volumes, or DR LUNs under Availability Suite Software
73827         "SNDR: Recovery bitmaps not allocated"
77167         Booting either host causes entire sync in Remote Mirror or Point-in-Time Copy
80100         Warning Message: "bitmap reference count maxed out"
80732         Missing Remote Mirror Sets After a Host Boot