Sun Storage 7000 appliances support snapshot-based replication of projects and shares from a source appliance to any number of target appliances manually, on a schedule, or continuously. The replication includes both data and metadata. Remote replication (or just "replication") is a general-purpose feature optimized for the following use cases:
Disaster recovery. Replication can be used to mirror an appliance for disaster recovery. In the event of a disaster that impacts service of the primary appliance (or even an entire datacenter), administrators activate service at the disaster recovery site, which takes over using the most recently replicated data. When the primary site has been restored, data changed while the disaster recovery site was in service can be migrated back to the primary site and normal service restored. Such scenarios are fully testable before such a disaster occurs.
Data distribution. Replication can be used to distribute data (such as virtual machine images or media) to remote systems across the world in situations where clients of the target appliance wouldn't ordinarily be able to reach the source appliance directly, or such a setup would have prohibitively high latency. One example uses this scheme for local caching to improve latency of read-only data (like documents).
Disk-to-disk backup. Replication can be used as a backup solution for environments in which tape backups are not feasible. Tape backup might not be feasible, for example, because the available bandwidth is insufficient or because the latency for recovery is too high.
Data migration. Replication can be used to migrate data and configuration between 7000 series appliances when upgrading hardware or rebalancing storage. Shadow migration can also be used for this purpose.
The remote replication feature has several important properties:
Snapshot-based. The replication subsystem takes a snapshot as part of each update operation. For a full update, it sends the entire project contents up to that snapshot; for an incremental update, it sends only the changes since the last replication snapshot for the same action.
Block-level. Each update operation traverses the filesystem at the block level and sends the appropriate filesystem data and metadata to the target.
Asynchronous. Because replication takes snapshots and then sends them, data is necessarily committed to stable storage before replication even begins sending it. Continuous replication effectively sends continuous streams of filesystem changes, but it's still asynchronous with respect to NAS and SAN clients.
Includes metadata. The underlying replication stream serializes both user data and ZFS metadata, including most properties configured on the Shares screen. These properties can be modified on the target after the first replication update completes (for example, to allow sharing over NFS to a different set of hosts than on the source), though not all take effect until the replication connection is severed. See Managing Replication Packages for details.
Secure. The replication control protocol used among Sun Storage 7000 appliances is secured with SSL. Data can optionally be protected with SSL as well. Appliances can replicate to or from other appliances only after an initial manual authentication process; see Creating and Editing Targets below.
replication peer (or just peer, in this context): a Sun Storage 7000 appliance that has been configured as a replication source or target.
replication source (or just source): an appliance peer containing data to be replicated to another appliance peer (the target). Individual appliances can act as both a source and a target, but are only one of these in the context of a particular replication action.
replication target (or just target): an appliance peer that will receive and store data replicated from another appliance peer (the source). This term also refers to a configuration object on the appliance that enables it to replicate to another appliance.
replication group (or just group): the set of datasets (exactly one project and some number of shares) which are replicated as a unit. See Project-level vs. Share-level below.
replication action (or just action): a configuration object on a source appliance specifying a project or share, a target appliance, and policy options (including how often to send updates, whether to encrypt data on the wire, etc.).
package: the target-side analog of an action; the configuration object on the target appliance that manages the data replicated as part of a particular action from a particular source. Each action on a source appliance is associated with exactly one package on a target appliance and vice versa. Loss of either object will require creating a new action/package pair (and a full replication update).
full sync (or full update): a replication operation that sends the entire contents of a project and some of its shares.
incremental update: a replication operation that sends only the differences in a project and its shares since the previous update (whether that one was full or incremental).
Before a source appliance can replicate to a target, the two systems must set up a replication peer connection that enables the appliances to identify each other securely for future communications. Administrators set up this connection by creating a new replication target on the Configuration > Services > Remote Replication screen on the source appliance. To create a new target, administrators specify three fields:
a name (used only to identify the target in the source appliance's BUI and CLI)
a network address or hostname (to contact the target appliance)
the target appliance's root password (to authorize the administrator to set up the connection on the target appliance)
The appliances then exchange keys used to securely identify each other in subsequent communications. These keys are stored persistently as part of the appliance's configuration and persist across reboots and upgrades. They will be lost if the appliance is factory reset or reinstalled. The root password is never stored persistently, so changing the root password on either appliance does not require any changes to the replication configuration. Nor is the password ever transmitted in the clear, because this initial identity exchange (like all replication control operations) is protected with SSL.
By default, the replication target connection is not bidirectional. If an administrator configures replication from a source A to a target B, B cannot automatically use A as a target. However, the system supports reversing the direction of replication, which automatically creates a target for A on B (if it does not already exist) so that B can replicate back to A.
To configure replication targets, see Configuring Replication below.
Targets represent a connection between appliances that enables them to communicate securely for the purpose of replication, but targets do not specify what will be replicated, how often, or with what options. For this, administrators must define replication actions on the source appliance. Actions are the primary administrative control point for replication, each one specifying:
a replication group (a project and some number of shares)
a target appliance
a storage pool on the target appliance (used only during the initial setup)
a frequency (which may be manual, scheduled, or continuous)
additional options such as whether to encrypt the data stream on the wire
The group is specified implicitly by the project or share on which the action is configured (see Project-level versus Share-level Replication below). The target appliance and storage pool cannot be changed after the action is created, but the other options can be modified at any time. Generally, if a replication update is in progress when an option is changed, then the new value only takes effect when the next update begins.
Actions are the primary unit of replication configuration on the appliance. Each action corresponds to a package on the target appliance that contains an exact copy of the source projects and shares on which the action is configured as of the start time of the last replication update. Administrators configure the frequency and other options for replication updates by modifying properties of the corresponding action. Creating the action on the source appliance creates the package on the target appliance in the specified storage pool, so the source must be able to contact the target when the action is initially created.
The first update for each replication action sends a full sync (or full update): the entire contents of the action's project and shares are sent to the target appliance. Once this initial sync completes, subsequent replication updates are incremental: only the changes since the previous update are sent. The action (on the source) and package (on the target) keep track of which changes have been replicated to the target through named replication snapshots (see below). Generally, as long as at least one full sync has been sent for an action and the action/package connection has not been corrupted due to a software failure or administrative action, replication updates will be incremental.
The action and package are bound to each other. If the package is somehow corrupted or destroyed, the action will not be able to send replication updates, even if the target still has the data and snapshots associated with the action. Similarly, if the action is destroyed, the package will be unable to receive new replication updates (even if the source still has the same data and snapshots). The BUI and CLI warn administrators attempting to perform operations that would destroy the action-package connection. If an error or explicit administrative operation breaks the action-package connection such that an incremental update is no longer possible, administrators must sever or destroy the package and action and create a new action on the source.
One special case of this needs explicit mention. The appliance avoids destroying data on the target unless explicitly requested by the administrator. As a result, if the initial replication update for an action fails after having replicated some data (leaving incomplete data inside the package), subsequent replication updates using the same action will fail because the appliance will not overwrite the already-received data. To resolve this, administrators should destroy the existing action and package, create a new action (which creates a new package), and start replication again.
In software releases prior to 2010.Q1, action and replica configuration (like target configuration) was stored on the controller rather than as part of the project and share configuration in the storage pool. As a result, factory reset caused all such configuration to be destroyed. In 2010.Q1 and later releases, the action and package configuration is stored in the storage pool with the corresponding projects and shares and so will be available even after factory reset. However, target information will still be lost, and actions with missing targets currently cannot be configured to point to a new target.
When the action is initially configured, the administrator chooses which storage pool on the target should contain the replicated data. This pool cannot be changed once the action has been created. Creating the action creates the empty package on the target in the specified storage pool; after this operation, the source has no knowledge of the storage configuration on the target. It does not keep track of which pool the action is being replicated to, nor is it updated with storage configuration changes on the target.
When the target is a clustered system, the chosen storage pool must be one owned by the same head that owns the IP address used by the source for replication, because only those pools are always guaranteed to be accessible when the source contacts the target using that IP address. This is exactly analogous to the configuration of NAS clients (NFS and SMB), where the IP address and path requested in a mount operation must obey the same constraint. When performing operations that change the ownership of storage pools and IP addresses in a cluster, administrators must consider the impact on sources replicating to the cluster. There is currently no way to move packages between storage pools or to change the IP address associated with an action.
The appliance allows administrators to configure remote replication at either the project or share level. Like other properties configurable on the Shares screen, each share can either inherit or override the configuration of its parent project. Inheriting the configuration means not only that the share is replicated on the same schedule to the same target with the same options as its parent project, but also that the share will be replicated in the same stream using the same project-level snapshots as other shares inheriting the project's configuration. This may be important for applications which require consistency between data stored on multiple shares. Overriding the configuration means that the share will not be replicated with any project-level actions, though it may be replicated with its own share-level actions, which will also include the project. It is not possible to override part of the project's replication configuration and inherit the rest.
More precisely, the replication configuration of a project and its shares define some number of replication groups, each of which is replicated with a single stream using snapshots taken simultaneously. All groups contain the project itself (which essentially just includes its properties). One project-level group includes all shares inheriting the replication configuration of the parent project. Any shares which override the project's configuration form a new group consisting of only the project and share themselves.
For example, suppose we have the following:
a project home and shares bill, cindi, and dave.
home has replication configured with some number of actions
home/bill and home/cindi inherit the project's replication configuration
home/dave overrides the project's replication configuration, using its own configuration with some number of actions
This configuration defines the following replication groups, each of which is replicated as a single stream per action using snapshots taken simultaneously on the project and shares:
one project-level group including home, home/bill, and home/cindi.
one share-level group including home and home/dave.
It is strongly recommended that project- and share-level replication be avoided within the same project because it can lead to surprising results (particularly when reversing the direction of replication). See the documentation for Managing Replication Packages for more details.
Be sure to read and understand the above sections on replication targets, actions, and packages before configuring replication.
In the CLI, navigate to the targets node to set or unset the target hostname, root_password, and label.
knife:> configuration services replication targets
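From that node, creating a target might look like the following sketch. The hostname, password, and label values are placeholders, and the exact prompt text may vary by software release:

```
knife:configuration services replication targets> target
knife:configuration services replication target (uncommitted)> set hostname=192.168.1.200
knife:configuration services replication target (uncommitted)> set root_password=*******
knife:configuration services replication target (uncommitted)> set label=drsite
knife:configuration services replication target (uncommitted)> commit
```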
From this context, administrators can:
Add new targets
View the actions configured with each existing target
Edit the unique identifier (label) for a target
Destroy a target, if no actions are using it
Targets should not be destroyed while actions are using them; such actions would be permanently broken. The system makes a best effort to enforce this but cannot guarantee that no actions in exported storage pools are using a given target.
After at least one replication target has been configured, administrators can configure actions on a local project or share by navigating to it in the BUI and clicking the Replication tab or navigating to it in the CLI and selecting the "replication" node. These interfaces show the status of existing actions configured on the project or share and allow administrators to create new actions:
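For example, creating a project-level action in the CLI might look like the following sketch. The project name, target label, and pool name are placeholders, and the exact property names may vary by release:

```
loader:> shares select myproject replication
loader:shares myproject replication> action
loader:shares myproject action (uncommitted)> set target=drsite
loader:shares myproject action (uncommitted)> set pool=pool-0
loader:shares myproject action (uncommitted)> set continuous=false
loader:shares myproject action (uncommitted)> commit
```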
Replication actions have the following properties, which are presented slightly differently in the BUI and CLI:
Replication actions can be configured to send updates manually, on a schedule, or continuously. The replication update process itself is the same in all cases; this property controls only how often updates are sent.
Because continuous replication actions send updates as frequently as possible, they essentially result in sending a constant stream of all filesystem changes to the target system. For filesystems with a lot of churn (many files created and destroyed in short intervals), this can result in replicating much more data than actually necessary. However, as long as replication can keep up with data changes, this results in the minimum data lost in the event of a data-loss disaster on the source system.
Note that continuous replication is still asynchronous. Sun Storage appliances do not currently support synchronous replication, which does not consider data committed to stable storage until it's committed to stable storage on both the primary and secondary storage systems.
When the "Include Snapshots" property is true, replication updates include the non-replication snapshots created after the previous replication update (or since the share's creation, in the case of the first full update). This includes automatic snapshots and administrator-created snapshots.
This property can be disabled to skip these snapshots and send only the changes between replication snapshots with each update.
For actions configured with scheduled or manual replication, administrators can immediately send a replication update by clicking the button in the BUI or using the sendupdate command in the CLI. This is not available (and will not work) while an update is actively being sent. Before sending an update, make sure there is enough disk space on the target to replicate the entire project.
If an update is currently active, the BUI will display a barber-pole progress bar and the CLI will show a state of sending. To cancel the update, click the button or use the cancelupdate command. It may take several seconds before the cancellation completes.
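A sketch of both commands from an action's context in the CLI (the project and action names are placeholders; sendupdate starts a manual update, and cancelupdate cancels one that is in progress):

```
loader:shares myproject replication> select action-000
loader:shares myproject action-000> sendupdate
loader:shares myproject action-000> cancelupdate
```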
Packages are containers for replicated projects and shares. Each replication action on a source appliance corresponds to one package on the target appliance as described above. Both the BUI and CLI enable administrators to browse replicated projects, shares, snapshots, and properties much like local projects and shares. However, because replicated shares must exactly match their counterparts on the source appliance, many management operations are not allowed inside replication packages, including creating, renaming, and destroying projects and shares, creating and renaming snapshots, and modifying most properties of projects and shares. Snapshots other than those used as the basis for incremental replication can be destroyed in replication packages. This practice is not recommended but can be used when additional free space is necessary.
In 2009.Q3 and earlier software versions, properties could not be changed on replicated shares. The 2010.Q1 release (with associated deferred upgrades) adds limited support for modifying properties of replicated shares to implement differing policies on the source and target appliances. Such property modifications persist across replication updates. Only the following properties of replicated projects and shares may be modified:
Reservation, compression, copies, deduplication, and caching. These properties can be changed on the replication target to effect different cost, flexibility, performance, or reliability policies on the target appliance from the source.
Mountpoint and sharing properties (e.g., sharenfs, SMB resource name, etc.). These properties control how shares are exported to NAS clients and can be changed to effect different security or protection policies on the target appliance from the source.
Automatic snapshot policies. Automatic snapshot policies can be changed on the target system but these changes have no effect until the package is severed (see below). Automatic snapshots are not taken or destroyed on replicated projects and shares.
The BUI and CLI don't allow administrators to change immutable properties. For shares, a different icon is used to indicate that the property's inheritance cannot be changed:
Note that the deferred updates provided with the 2010.Q1 release must be applied on replication targets in order to modify properties on such targets. The system will not allow administrators to modify properties inside replication packages on systems which have not applied the 2010.Q1 deferred updates.
Note that the current release does not support configuration of "chained" replication (that is, replicating replicated shares to another appliance).
Replication packages are displayed in the BUI as projects under the "Replica" filter:
Selecting a replication package for editing brings the administrator to the Shares view for the package's project. From here, administrators can manage replicated shares much like local shares with the exceptions described above. Package properties (including status) can be modified under the Replication tab (see below):
The status icon on the left changes when replication has failed:
Packages are only displayed in the BUI after the first replication update has begun. They may not appear in the list until some time after the first update has completed.
Replication packages are organized in the CLI by source under shares replication sources. Administrators first select a source, then a package. Package-level operations can be performed on this node (see below), or the project can be selected to manage project properties and shares just like local projects and shares with the exceptions described above. For example:
loader:> shares replication sources
loader:shares replication sources> show
Sources:

source-000 ayu
                     PROJECT    STATE      LAST UPDATE
        package-000  oldproj    idle       unknown
        package-001  aproj1     receiving  Sun Feb 21 2010 22:04:35 GMT+0000 (UTC)

loader:shares replication sources> select source-000
loader:shares replication source-000> select package-001
loader:shares replication source-000 package-001> show
Properties:
                       enabled = true
                         state = receiving
             state_description = Receiving update
                     last_sync = Sun Feb 21 2010 22:04:40 GMT+0000 (UTC)
                      last_try = Sun Feb 21 2010 22:04:40 GMT+0000 (UTC)

Projects:
        aproj1

loader:shares replication source-000 package-001> select aproj1
loader:shares replication source-000 package-001 aproj1> get mountpoint
                    mountpoint = /export
loader:shares replication source-000 package-001 aproj1> get sharenfs
                      sharenfs = on
To cancel an in-progress replication update on the target using the BUI, navigate to the replication package (see above), then click the Replication tab. If an update is in progress, you will see a barber-pole progress bar with a cancel button next to it, as shown here:
Click this button to cancel the update.
To cancel in-progress replication updates on the target using the CLI, navigate to the replication package (see above) and use the cancelupdate command.
It is not possible to initiate updates from the target. Administrators must login to the source system to initiate a manual update.
Replication updates for a package can be disabled entirely, cancelling any ongoing update and causing new updates from the source appliance to fail.
To toggle whether a package is disabled from the BUI, navigate to the package (see above), then click the Replication tab, and then click the icon. The status icon on the left should change to indicate the package's status (enabled, disabled, or failed). The package remains disabled until explicitly enabled by an administrator using the same button or the CLI.
To toggle whether a package is disabled from the CLI, navigate to the package (see above), modify the enabled property, and commit your changes.
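Continuing the package context from the earlier transcript, a sketch of disabling a package from the CLI (re-enable it later by setting the property back to true):

```
loader:shares replication source-000 package-001> set enabled=false
                       enabled = false (uncommitted)
loader:shares replication source-000 package-001> commit
```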
A clone of a replicated package is a local, mutable project that can be managed like any other project on the system. The clone's shares are clones of the replicated shares at the most recently received snapshot. These clones share storage with their origin snapshots in the same way as clones of share snapshots do (see Cloning a Snapshot). This mechanism can be used to failover in the case of a catastrophic problem at the replication source, or simply to provide a local version of the data that can be modified.
Use the button in the BUI or the clone CLI command (in the package's context) to create a package clone based on the most recently received replication snapshot. Both the BUI and CLI interfaces require the administrator to specify a name for the new clone project, and both allow the administrator to override the mountpoint of the project or its shares to ensure that they don't conflict with those of other shares on the system.
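A sketch of cloning a package from the CLI, continuing the earlier package context. The clone project name and the target_project property name are illustrative, and the exact workflow for overriding mountpoints may vary by release:

```
loader:shares replication source-000 package-001> clone
loader:shares replication source-000 package-001 clone> set target_project=aproj1-clone
loader:shares replication source-000 package-001 clone> commit
```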
In 2009.Q3 and earlier, cloning a replicated project was the only way to access its data and thus the only way to implement disaster-recovery failover. In 2010.Q1 and later, individual filesystems can be exported read-only without creating a clone (see below). Additionally, replication packages can be directly converted into writable local projects as part of a failover operation. As a result, cloning a package is no longer necessary or recommended, as these alternatives provide similar functionality with simpler operations and without having to manage clones and their dependencies.
In particular, while a clone exists, its origin snapshot cannot be destroyed. When destroying the snapshot (possibly as a result of destroying the share, project, or replication package of which the snapshot is a member), the system warns administrators of any dependent clones which will be destroyed by the operation. Note that snapshots can also be destroyed on the source at any time and such snapshots are destroyed on the target as part of the subsequent replication update. If such a snapshot has clones, the snapshot will instead be renamed with a unique name (typically recv-XXX).
Administrators can also clone individual replicated share snapshots using the normal BUI and CLI interfaces.
Replicated filesystems can be exported read-only to NAS clients. This can be used to verify the replicated data or to perform backups or other intensive operations on the replicated data (offloading such work from the source appliance).
The filesystem's contents always matches the most recently received replication snapshot for that filesystem. This may be newer than the most recently received snapshot for the entire package, and it may not match the most recent snapshot for other shares in the same package. See "Snapshots and Data Consistency" below for details.
Replication updates are applied atomically at the filesystem level. Clients looking at replicated files will see replication updates as an instantaneous change in the underlying filesystem. Clients working with files deleted in the most recent update will see errors. Clients working with files changed in the most recent update will immediately see the updated contents.
Replicated filesystems are not exported by default. They are exported by modifying the "exported" property of the project or share using the BUI or CLI:
This property is inherited like other share properties. This property is not shown for local projects and shares because they are always exported. Additionally, severing replication (which converts the package into a local project) causes the package's shares to become exported.
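For example, exporting a replicated project from the CLI might look like this sketch, reusing the package and project names from the earlier transcript as placeholders:

```
loader:shares replication source-000 package-001> select aproj1
loader:shares replication source-000 package-001 aproj1> set exported=true
                      exported = true (uncommitted)
loader:shares replication source-000 package-001 aproj1> commit
```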
Replicated LUNs currently cannot be exported. They must be first cloned or the replication package severed in order to export their contents.
A replication package can be converted into a local, writable project that behaves just like other local projects (i.e. without the management restrictions applied to replication packages) by severing the replication connection. After this operation, replication updates can no longer be received into this package, so subsequent replication updates of the same project from the source will need to send a full update with a new action (into a new package). Subsequent replication updates using the same action will fail because the corresponding package no longer exists on the target.
This option is primarily useful when using replication to migrate data between appliances or in other scenarios that don't involve replicating the received data back to the source as part of a typical two-system disaster recovery plan.
Replication can be severed from the BUI by navigating to the replication package (see above), clicking the Replication tab, and clicking the button. The resulting dialog allows the administrator to specify the name of the new local project.
Replication can be severed from the CLI by navigating to the replication package (see above), and using the sever command. This command takes an optional argument specifying the name of the new local project. If no argument is specified, the original name is used.
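A sketch of the sever operation from a package's CLI context (newproj is a placeholder for the new local project's name):

```
loader:shares replication source-000 package-001> sever newproj
```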
Because all local shares are exported, all shares in a package are exported when the package is severed, whether or not they were previously exported (see above). If there are mountpoint conflicts between replicated filesystems and other filesystems on the system, the sever operation will fail. These conflicts must be resolved before severing by reconfiguring the mountpoints of the relevant shares.
The direction of the replication can be reversed to support typical two-system disaster recovery plans. This operation is similar to the sever operation described above, but additionally configures a replication action on the new local project for incremental replication back to the source system. No changes are made on the source system when this operation is completed, but the first update attempt using this action will convert the original project on the source system into a replication package and roll back any changes made since the last successful replication update from that system. This feature does not automatically redirect production workloads, failover IP addresses, or perform other activities related to the disaster-recovery failover besides modifying the read-write status of the primary and secondary data copies.
As part of the conversion of the original source project into a replication package on the original source system (now acting as the target), the shares that were replicated as part of the action/package currently being reversed are moved into a new replication package and unexported. The original project remains in the local collection but may end up empty if the action/package included all of its shares. When share-level replication is reversed, any other shares in the original project remain unchanged.
As mentioned above, this feature is typically used to implement a two-system disaster recovery configuration in which a primary system serves production data and replicates it to a secondary or DR system (often in another datacenter) standing by to take over the production traffic in the event of a disaster at the primary site. In the event of a disaster at the primary site, the secondary site's copy must be made "primary" by making it writable and redirecting production traffic to the secondary site. When the primary site is repaired, the changes accumulated at the secondary site can be replicated back to the primary site and that site can resume servicing the production workload.
A typical sequence of events under such a plan is as follows:
The primary system is serving the production workload and replicating to the secondary system.
A disaster occurs, possibly representing a total system failure at the primary site. Administrators reverse the direction of replication on the secondary site, exporting the replicated shares under a new project configured for replication back to the primary site for when primary service is restored. In the meantime, the production workload is redirected to the secondary site.
When the primary site is brought back online, an administrator initiates a replication update from the secondary site to the primary site. This converts the primary's copy into a replication package, rolling back any changes made since the last successful update to the target (before the failure). When the primary site's copy is up-to-date, the direction of replication is reversed again, making the copy at the primary site writable. Production traffic is redirected back to the primary site. Replication is resumed from the primary to the secondary, restoring the initial relationship between the primary and secondary copies.
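The sequence above can be modeled as a small state machine tracking which copy is writable and which way replication flows. The following is an illustrative Python sketch, not appliance code; the names Site and reverse are made up for the example:

```python
# Illustrative sketch, not appliance code: the failover/failback sequence
# above, modeled as which copy is writable and which way replication flows.

class Site:
    def __init__(self, name):
        self.name = name
        self.writable = False
        self.replicates_to = None   # name of the peer receiving updates

def reverse(new_source, old_source):
    """Reverse replication: make new_source writable, replicating back."""
    new_source.writable = True
    new_source.replicates_to = old_source.name
    old_source.writable = False     # old copy becomes a replication package
    old_source.replicates_to = None

primary, secondary = Site("primary"), Site("secondary")
primary.writable, primary.replicates_to = True, "secondary"  # normal service

reverse(secondary, primary)   # disaster: fail over to the secondary site
reverse(primary, secondary)   # repaired: fail back to the primary site
```

Note that two reversals restore the initial relationship: the primary copy is writable and replicates to the secondary.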
When reversing the direction of replication for a package, it is strongly recommended that administrators first stop replication of that project from the source. If a replication update is in progress when an administrator reverses the direction of replication for a project, administrators cannot know which consistent replication snapshot was used to create the resulting project on the former target appliance (now source appliance).
Replication can be reversed from the BUI by navigating to the replication package (see above), clicking the Replication tab, and clicking the button. The resulting dialog allows the administrator to specify the name of the new local project.
Replication can be reversed from the CLI by navigating to the replication package (see above) and using the reverse command. This command takes an optional argument specifying the name of the new local project. If no argument is specified, the original name is used.
Because all local shares are exported, all shares in a package are exported when the package is reversed, whether or not they were previously exported (see above). If there are mountpoint conflicts between replicated filesystems and other filesystems on the system, the reverse operation will fail. These conflicts must be resolved before reversing by reconfiguring the mountpoints of the relevant shares. Because this operation is typically part of the critical path of restoring production service, it is strongly recommended to resolve these mountpoint conflicts when the systems are first set up rather than at the time of DR failover.
The project and shares within a package cannot be destroyed without destroying the entire package. The entire package can be destroyed from the BUI by destroying the corresponding project. A package can be destroyed from the CLI using the destroy command at the shares replication sources node.
When a package is destroyed, subsequent replication updates from the corresponding action will fail. To resume replication, the action will need to be recreated on the source to create a new package on the target into which to receive a new copy of the data.
Below is an example of cloning a received replication project, overriding both the project's and one share's mountpoint:
perch:> shares
perch:shares> replication
perch:shares replication> sources
perch:shares replication sources> select source-000
perch:shares replication source-000> select package-000
perch:shares replication source-000 package-000> clone
perch:shares replication source-000 package-000 clone> set target_project=my_clone
                       target_project = my_clone
perch:shares replication source-000 package-000 clone> list
CLONE PARAMETERS
                       target_project = my_clone
                  original_mountpoint = /export
                  override_mountpoint = false
                           mountpoint =

SHARE                        MOUNTPOINT
bob                          (inherited)
myfs1                        (inherited)
perch:shares replication source-000 package-000 clone> set override_mountpoint=true
                  override_mountpoint = true
perch:shares replication source-000 package-000 clone> set mountpoint=/export/my_clone
                           mountpoint = /export/my_clone
perch:shares replication source-000 package-000 clone> select bob
perch:shares replication source-000 package-000 clone bob> set override_mountpoint=true
                  override_mountpoint = true
perch:shares replication source-000 package-000 clone bob> set mountpoint=/export/bob
                           mountpoint = /export/bob
perch:shares replication source-000 package-000 clone bob> done
perch:shares replication source-000 package-000 clone> commit
CLONE PARAMETERS
                       target_project = my_clone
                  original_mountpoint = /export
                  override_mountpoint = true
                           mountpoint = /export/my_clone

SHARE                        MOUNTPOINT
bob                          /export/bob (overridden)
myfs1                        (inherited)
Are you sure you want to clone this project?
There are no conflicts.
perch:shares replication source-000 package-000 clone>
In addition to the Remote Replication filter under the Services scope that allows administrators to stop, start, and restart the replication service, the replication subsystem provides two authorizations under the "Projects and Shares" scope:
rrsource. Allows an administrator to create, edit, and destroy replication targets and actions.
rrtarget. Allows an administrator to manage replicated packages, including cloning packages and severing or reversing replication.
Note that the rrsource authorization is required to configure replication targets on an appliance, even though this is configured under the Remote Replication service screen.
For help with authorizations, see the Authorizations documentation.
The system posts alerts when any of the following events occur:
Manual or scheduled replication update starts or finishes successfully (both source and target).
Any replication update fails, including as a result of explicit cancellation by an administrator (both source and target).
A scheduled replication update is skipped because another update for the same action is already in progress (see above).
Replication can be configured from any Sun Storage 7000 appliance to any other Sun Storage 7000 appliance regardless of whether each is part of a cluster and whether the appliance's cluster peer has replication configured in either direction, except for the following constraints:
Configuring replication from an appliance to itself or its cluster peer is unsupported. Shadow migration can be used to copy data between storage pools (e.g., to rebalance storage) on a single appliance or in a cluster.
Configuring replication from both peers of a cluster to the same replication target is unsupported, but a similar configuration can be achieved using two different IP addresses for the same target appliance. Administrators can use the multiple IP addresses of the target appliance to create one replication target on each cluster head for use by that head.
The following rules govern the behavior of replication in clustered configurations:
Replication updates for projects and shares are sent from whichever cluster peer has imported the containing storage pool.
Replication updates are received by whichever peer has imported the IP address configured in the replication action on the source. Administrators must ensure that the head using this IP address will always have the storage pool containing the replica imported. This is ensured by assigning the pool and IP address resources to the same head during cluster configuration.
Replication updates (both to and from an appliance) that are in progress when an appliance exports the corresponding storage pool or IP address (as part of a takeover or failback) will fail. Replication updates using storage pools and IP addresses unaffected by a takeover or failback operation will be unaffected by the operation.
For details on clustering and cluster terminology, review the Clustering documentation.
The appliance replicates snapshots, and each snapshot is received atomically on the target, so the contents of a share's replica on the target always match the share's contents on the source at the time the snapshot was taken. Because the snapshots for all shares sent in a particular group are taken at the same time (see above), the entire package contents after a successful replication update exactly match the group's contents when the snapshot was created on the source (when the replication update began).
However, each share's snapshots are replicated separately (and serially), so it's possible for some shares within a package to have been updated with a snapshot more recent than those of other shares in the same package. This is true during a replication update (after some shares have been updated but before others have) and after a failed replication update (after which some shares may have been updated but others may not have been).
Each share is always point-in-time consistent on the target (self-consistent).
When no replication update is in progress and the previous replication update succeeded, each package's shares are also point-in-time consistent with each other (package-consistent).
When a replication update is in progress or the previous update failed, package shares may be inconsistent with each other, but each one will still be self-consistent. If package consistency is important for an application, one must clone the replication package, which always clones the most recent successfully received snapshot of each share.
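The consistency properties above can be illustrated with a short sketch. This is illustrative Python, not appliance code; replicate_package and the snapshot names are made up for the example:

```python
# Illustrative sketch, not appliance code: shares replicate serially, so a
# failed update can leave shares at different snapshots (self-consistent
# but not package-consistent).

def replicate_package(shares, fail_after=None):
    """Serially update each share's replica; optionally fail partway."""
    received = {}
    for i, (name, snap) in enumerate(shares):
        if fail_after is not None and i >= fail_after:
            break                      # update failed before this share
        received[name] = snap          # each snapshot lands atomically
    return received

# Update 1 succeeds: every share is at snapshot "t1" (package-consistent).
replica = {"bob": "t1", "myfs1": "t1"}
# Update 2 fails after one share: "bob" is at "t2", "myfs1" still at "t1".
replica.update(replicate_package([("bob", "t2"), ("myfs1", "t2")],
                                 fail_after=1))

# A clone takes the most recent successfully received snapshot of each
# share, so here it would see {"bob": "t2", "myfs1": "t1"}.
clone = dict(replica)
```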
Snapshots are the basis for incremental replication. The source and target must always share a common snapshot in order to continue replicating incrementally, and the source must know which is the most recent snapshot that the target has. To facilitate this, the replication subsystem creates and manages its own snapshots. Administrators generally need not be concerned with them, but the details are described here since snapshots can have significant effects on storage utilization.
Each replication update for a particular action consists of the following steps:
Determine whether this is an incremental or full update, based on whether this action has been replicated before and whether the target still has the snapshot needed for an incremental update.
Take a new project-level snapshot.
Send the update. For a full update, send the entire group's contents up to the new snapshot. For an incremental update, send the difference between the previous (base) snapshot and the new snapshot.
Record the new snapshot as the base snapshot for the next update and destroy the previous base snapshot (for incremental updates).
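The steps above can be sketched as follows. This is illustrative Python, not appliance code; the Action class, run_update, and the ".rr-snap-N" naming scheme are made up for the example:

```python
# Illustrative sketch, not appliance code: the per-update snapshot
# bookkeeping described in the steps above.

class Action:
    def __init__(self):
        self.base_snapshot = None   # last snapshot known to be on the target
        self.counter = 0

def run_update(action, source_snapshots, target_snapshots):
    """Perform one replication update; return what would be sent."""
    # Step 1: incremental only if the target still has our base snapshot.
    incremental = (action.base_snapshot is not None
                   and action.base_snapshot in target_snapshots)
    # Step 2: take a new project-level snapshot (hypothetical name).
    action.counter += 1
    new_snap = f".rr-snap-{action.counter}"
    source_snapshots.add(new_snap)
    # Step 3: send the full contents, or just the delta from the base.
    send = (("delta", action.base_snapshot, new_snap) if incremental
            else ("full", new_snap))
    target_snapshots.add(new_snap)            # assume the send succeeds
    # Step 4: record the new base; destroy the old base on the source.
    if incremental:
        source_snapshots.discard(action.base_snapshot)
    action.base_snapshot = new_snap
    return send

src, tgt = set(), set()
a = Action()
first = run_update(a, src, tgt)    # ("full", ".rr-snap-1")
second = run_update(a, src, tgt)   # ("delta", ".rr-snap-1", ".rr-snap-2")
```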
This has several consequences for snapshot management:
During the first replication update, and whenever no update is in progress thereafter, there is exactly one project-level snapshot for each action configured on the project or any share in the group. Note that these snapshots are created even on shares in the same project that are not being sent as part of the update.
During subsequent replication updates of a particular action, there may be two project-level snapshots associated with the action. Both snapshots may remain after the update completes if a failure left the source unable to determine whether the target successfully received the new snapshot (as when a network outage causes the update to fail).
None of the snapshots associated with a replication action can be destroyed by the administrator without breaking incremental replication. The system will not allow administrators to destroy snapshots on either the source or target that are necessary for incremental replication. To destroy such snapshots on the source, one must destroy the action (which destroys the snapshots associated with the action). To destroy such snapshots on the target, one must first sever the package (which destroys the ability to receive incremental updates to that package).
Relatedly, administrators must not rollback to snapshots created prior to any replication snapshots. Doing so will destroy the later replication snapshots and break incremental replication for any actions using those snapshots.
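The rollback hazard can be illustrated with a short sketch. This is illustrative Python, not appliance code; the snapshot names and the rollback helper are made up for the example:

```python
# Illustrative sketch, not appliance code: rolling back past a replication
# snapshot destroys it, so the next update cannot be incremental.

def rollback(snapshots, to):
    """Roll a share back to `to`, destroying all later snapshots."""
    idx = snapshots.index(to)
    return snapshots[:idx + 1]

# Hypothetical snapshot timeline: an administrator snapshot, then the
# replication base snapshot, then another administrator snapshot.
snaps = ["user@monday", ".rr-base", "user@tuesday"]
snaps = rollback(snaps, "user@monday")
# The replication base snapshot is gone; the next update must be full.
```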
As described above, replication updates include most of the configuration specified on the Shares screen for a project and its shares. This includes any target groups and initiator groups associated with replicated LUNs. When using non-default target groups and initiator groups, administrators must ensure that the target groups and initiator groups used by LUNs within the project also exist on the replication target. It is only required that groups exist with the same name, not that they define the same configuration. Failure to ensure this can result in failure to clone and export replicated LUNs.
The SCSI GUID associated with a LUN is replicated with the LUN. As a result, the LUN on the target appliance will have the same SCSI GUID as the LUN on the source appliance. Clones of replicated LUNs, however, will have different GUIDs (just as clones of local LUNs have different GUIDs than their origins).
Replication in 2009.Q3 and earlier was project-level only and explicitly disallowed replicating projects containing clones whose origin snapshots resided outside the project. With share-level replication in 2010.Q1 and later, this restriction has been relaxed, but administrators must still consider the origin snapshots of clones being replicated. In particular, the initial replication of a clone requires that the origin snapshot have already been replicated to the target or is being replicated as part of the same update. This restriction is not enforced by the appliance management software, but attempting to replicate a clone when the origin snapshot does not exist on the target will fail.
In practice, there are several ways to ensure that replication of a clone will succeed:
If the clone's origin snapshot is in the same project, just use project-level replication.
If the clone's origin snapshot is not in the same project or project-level replication that includes the origin is undesirable for other reasons, use share-level replication to replicate the origin share first and then use project-level or share-level replication to replicate the clone.
Do not destroy the clone's origin on the target system unless you intend to also destroy the clone itself.
In all cases, the "include snapshots" property should be true on the origin's action to ensure that the origin snapshot is actually sent to the target.
While replication-specific analytics are not currently available, administrators can use the advanced TCP analytics to observe traffic by local port. Replication typically uses port 216 on the source appliance.
The status of individual replication actions and packages can be monitored using the BUI and CLI. See "Configuring Replication" above.
Individual replication updates can fail for a number of reasons. Where possible, the appliance reports the reason for the failure in alerts posted on the source appliance or target appliance, or on the Replication screen for the action that failed. You may be able to get details on the failure by clicking the orange alert icon representing the action's status. The following are the most common types of failures:
A replication update fails if any part of the update fails. The current implementation replicates the shares inside a project serially and does not rollback changes from failed updates. As a result, when an update fails, some shares on the target may be up-to-date while others are not. See "Snapshots and Data Consistency" above for details.
Although some data may have been successfully replicated as part of a failed update, the current implementation resends all data that was sent as part of the previous (failed) update. That is, failed updates will not pick up where they left off, but rather will start where the failed update started.
When manual or scheduled updates fail, the system does not automatically try again until the next scheduled update (if any). When continuous replication fails, the system waits several minutes and tries again. The system will continue retrying failed continuous replications indefinitely.
When a replication update is in progress and another update is scheduled to occur, the later update is skipped entirely rather than started immediately after the current update completes; it will be sent only at the next scheduled time. The system posts an alert when an update is skipped for this reason.
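The skip and retry behavior described in the two paragraphs above can be sketched as follows. This is illustrative Python, not appliance code; the function names and the bounded retry count are made up for the example:

```python
# Illustrative sketch, not appliance code: colliding scheduled updates are
# skipped (with an alert), while continuous replication retries failures.

def schedule_tick(update_in_progress, alerts):
    """Handle one scheduled update time for an action."""
    if update_in_progress:
        alerts.append("update skipped: previous update still in progress")
        return "skipped"          # not queued; next attempt at next tick
    return "started"

def continuous_retry(attempt_update, max_tries):
    """Retry a continuous update until it succeeds (bounded here for
    the demo; the appliance retries indefinitely, waiting several
    minutes between attempts)."""
    for _ in range(max_tries):
        if attempt_update():
            return True
    return False

alerts = []
assert schedule_tick(True, alerts) == "skipped"     # collision: skip + alert
results = iter([False, False, True])                # fail, fail, succeed
assert continuous_retry(lambda: next(results), max_tries=5)
```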
The replication implementation has changed significantly between the 2009.Q3 and 2010.Q1 releases. It remains highly recommended to suspend replication to and from an appliance before initiating an upgrade from 2009.Q3 or earlier. This is mandatory in clusters using rolling upgrade.
There are three important user-visible changes related to upgrade to 2010.Q1 or later:
The network protocol used for replication has been enhanced. 2009.Q3 systems can replicate to systems running any release (including 2010.Q1 and later), while systems running 2010.Q1 or later can only replicate to other systems running 2010.Q1 or later. In practice, this means that replication targets must be upgraded before or at the same time as their replication sources to avoid failures resulting from incompatible protocol versions.
Replication action configuration is now stored in the storage pool itself rather than on the head system. As a result, after upgrading from 2009.Q3 or earlier to 2010.Q1, administrators must apply the deferred updates to migrate their replication configuration.
* Until these updates are applied, incoming replication updates for existing replicas will fail, and replication updates will not be sent for actions configured under 2009.Q3 or earlier. Additionally, space will be used in the storage pool for unmigrated replicas that are not manageable from the BUI or CLI.
* Once these updates are applied, as with all deferred updates, rolling back the system software will have undefined results. It should be expected that under the older release, replicated data will be inaccessible, all replication actions will be unconfigured, and incoming replication updates will be full updates.
Replication authorizations have been moved from their own scope into the Projects and Shares scope. Any replication authorizations configured on 2009.Q3 or earlier will no longer exist under 2010.Q1. Administrators using fine-grained access control for replication should delegate the new replication authorizations to the appropriate administrators after upgrading.