Note:
- This tutorial requires access to Oracle Cloud. To sign up for a free account, see Get started with Oracle Cloud Infrastructure Free Tier.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Move Data into OCI Cloud Storage Services using Fpsync and Rsync
Introduction
This is tutorial 4 of a four-tutorial series that shows you various ways to migrate data into Oracle Cloud Infrastructure (OCI) cloud storage services. The series is set up so you can review Tutorial 1: Use Migration Tools to Move Data into OCI Cloud Storage Services to get a broad understanding of the various tools, and then proceed to the related tutorial(s) or documents relevant to your migration needs. This tutorial focuses on using fpsync and rsync to migrate file system data into OCI File Storage.
OCI provides customers with high-performance computing and low-cost cloud storage options. Through on-demand local, object, file, block, and archive storage, Oracle addresses key storage workload requirements and use cases.
OCI cloud storage services offer fast, secure, and durable cloud storage options for all your enterprise needs. They range from high-performance options, such as OCI File Storage with Lustre and the OCI Block Volumes service, and fully managed exabyte-scale file systems from the OCI File Storage service with high-performance mount targets, to highly durable and scalable OCI Object Storage. Our solutions can meet your demands, ranging from performance-intensive applications such as AI/ML workloads to exabyte-scale data lakes.
- The fpsync tool is a parallel wrapper script that by default uses rsync. It can also use Rclone (as covered in Tutorial 2: Move Data into OCI Cloud Storage Services using Rclone), tar, tarify, and cpio.
- Rsync is a versatile utility for transferring and synchronizing files for both remote and local file systems.
Determine the amount of data that needs to be migrated and the downtime available to cut over to the new OCI storage platform. Batch migrations are a good choice to break down the migration into manageable increments. Batch migrations enable you to schedule downtime for specific applications across different windows. Some customers have the flexibility to do a one-time migration over a scheduled maintenance window of 2-4 days. OCI FastConnect can be used to create a dedicated, private connection between OCI and your environment, with port speeds from 1G to 400G to speed up the data transfer process. OCI FastConnect can be integrated with partner solutions such as Megaport and ConsoleConnect to create a private connection to your data center or a cloud-to-cloud interconnection to move data more directly from another cloud vendor into an OCI cloud storage service. For more information, see FastConnect integration with Megaport Cloud Router.
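For planning batches and downtime windows, it helps to first measure the size and file count of the source dataset. The following is a minimal sketch; /src/path/ is a placeholder for your source file system.
# /src/path/ is a placeholder for your source file system.
# Total size of the source dataset (human readable).
du -sh /src/path/
# Number of files and directories, which influences tool choice and parallelism.
find /src/path/ -type f | wc -l
find /src/path/ -type d | wc -l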
Difference Between Rsync and Fpsync
- Rsync is a traditional Linux Operating System (OS) utility used to do a one-time copy or periodic sync of data from one machine to another, in the same or different geographic locations. A single rsync process may not be enough to transfer a large dataset in the desired amount of time. You can spin up multiple rsync processes, each running on a specific subset of the dataset, to complete the transfer more quickly than a single process. However, finding the right balance between the number of processes and the subset each one handles can be challenging, depending on the complexity of the dataset hierarchy. Fpsync simplifies this process. Fpsync is an orchestrator that splits the entire dataset into smaller chunks and spawns multiple rsync processes based on the parallelism you set. Internally, fpsync uses rsync to do the actual transfer. Cpio and tar are also available as underlying tool options for fpsync, but rsync is the default.
- Fpsync has a worker nodes option that lets you distribute the transfer across multiple nodes instead of a single node. In addition to increasing the number of parallel rsync processes on the same node (scaling up), you can also increase the number of nodes (scaling out) to run more rsync processes.
Audience
DevOps engineers, developers, OCI cloud storage administrators and users, IT managers, OCI power users, and application administrators.
Objective
Learn how to use rsync and fpsync to copy and synchronize data into OCI cloud storage services:
- Learn how to use rsync and fpsync together.
- Understand the performance benefits of using fpsync.
Prerequisites
- An OCI account.
- Oracle Cloud Infrastructure Command Line Interface (OCI CLI) installed with a working config file in your home directory in a subdirectory called .oci. For more information, see Setting up the Configuration File.
- User permission to create, export, and mount OCI File Storage, or access to an OCI File Storage mount target that is already mounted on a VM. For more information, see Manage File Storage Policy.
- Knowledge of how to create, manage, and mount network attached storage (NAS) and OCI File Storage. For more information, see OCI Configuring File System Storage and Overview of File Storage.
- Access to create and launch OCI Compute instances, or access to three systems to run fpsync. For more information, see Creating an Instance.
- Familiarity with:
  - Working with SSH, generating SSH keys, and working with SSH configuration files. For more information, see Creating an SSH Key Pair on the Linux Command Line for OCI Access.
  - Basic networking tools and commands to check connectivity between two sites or systems.
  - Using a terminal or shell interface on Mac OS, Linux, Berkeley Software Distribution (BSD), and on Windows PowerShell, command prompt, or bash.
  - Installing software on a Linux system.
- To learn about the migration tools you can use, see Tutorial 1: Use Migration Tools to Move Data into OCI Cloud Storage Services.
Migrate Data into OCI File Storage
Fpsync and rsync can be used to migrate file system data (OCI File Storage service, OCI Block Volumes service, OCI File Storage with Lustre, on-premises file system, and on-premises network file system (NFS)) to other file system storage types (including OCI File Storage).
Use Rsync to Migrate Data
- Use rsync with instance-to-instance streaming.
For small datasets of up to a few tens of GB and a few thousand files, rsync instance-to-instance streaming can be used. With this approach, NFS stays local within each network and SSH is used between the source and destination networks, which helps reduce the latency of NFS between the two networks. Use the following command.
rsync --archive --perms --owner --group --xattrs --acls --recursive --delete --compress --ignore-errors --progress --log-file=$HOME/rsync/logs/test.log1 --quiet -e ssh /src/path/ root@<destination_instance>:/destination/path/
- Run multiple rsync processes in parallel.
  - You can use the find and xargs commands to run multiple rsync processes.
find /src/path/ -maxdepth 1 | xargs -P 24 -I {} rsync --archive --perms --owner --group --xattrs --acls --recursive --delete --compress --log-file=<logfile_path> --quiet -e ssh {} root@<destination_instance>:/destination/path/
  - You can also use GNU parallel.
find /src/path/ -maxdepth 1 | parallel -P24 rsync --archive --perms --owner --group --xattrs --acls --recursive --delete --compress --exclude=.snapshot --ignore-errors --progress --log-file=$HOME/rsync/logs/test.log1 --quiet -e ssh {} root@<destination_instance>:/destination/path/
Note: In both examples, 24 processes run at a time; the value was chosen based on the CPU capacity of the instance used.
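To match the parallelism to the instance you run on, and to preview a transfer before committing to a long run, you can combine nproc with an rsync dry run. The following is a minimal sketch using the same placeholder paths and destination as the examples above.
# Placeholder paths and destination host, as in the examples above.
# Preview what would be transferred without copying any data.
rsync --archive --dry-run --itemize-changes -e ssh /src/path/ root@<destination_instance>:/destination/path/
# Use the number of CPU cores on this instance as the parallel process count.
find /src/path/ -maxdepth 1 | xargs -P "$(nproc)" -I {} rsync --archive --compress --log-file=<logfile_path> --quiet -e ssh {} root@<destination_instance>:/destination/path/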
Use Fpsync to Migrate Data
The fpsync tool is a parallel wrapper of rsync. It can also use tar, tarify, and cpio, but the default is rsync.
- Install fpsync on your Linux machine.
  - Run the following commands for Linux 8.
sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
sudo yum install fpart -y
  - Run the following commands for Linux 9.
sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
sudo yum install fpart -y
- Run fpsync using the following command.
For example:
fpsync -v -n `nproc` -o "-lptgoD -v --numeric-ids --log-file=/tmp/fpsync.log1" /src/path/ root@<destination_instance>:/destination/path/
Note: For more fpsync options and parameter details, refer to man fpsync.
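To confirm that the installation succeeded and to watch a transfer while it runs, you can use the following minimal sketch; the log path assumes the /tmp/fpsync.log1 value passed to rsync in the example above.
# Confirm that the fpart package and the fpsync script are installed.
rpm -q fpart
command -v fpsync
# Follow rsync progress during a run (log path from the example above).
tail -f /tmp/fpsync.log1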
Run Fpsync using Three Worker Nodes to Migrate Data from On-Premises File Shares to OCI File Storage Service
Follow these high-level steps to run fpsync using three worker nodes to migrate data from on-premises file shares (local disk, SAN, or NAS) to OCI File Storage service.
The following image shows a component architecture diagram.
Follow the steps:
- Identify three worker nodes and a destination node.
Identify three local systems you have access to for mounting your source file system. Alternatively, you can create and launch three OCI Compute VM instances for testing purposes.
Identify an existing OCI VM instance, or create and launch a new one, to serve as the destination node.
- Mount the source NAS share on the three nodes.
Use the nordirplus and nconnect=16 mount options; do not specify other nfs mount options.
For example, run the following mount command on a Linux system.
sudo mount -t nfs -o nordirplus,nconnect=16 10.x.x.x:/<EXPORT_PATH_NAME> /mnt/nfs-data
Run the following command to verify the mount.
mount | grep /mnt/nfs-data
Note: For testing purposes, you can use OCI File Storage to create, export, and mount a file system. Test data can be created in the mount to try fpsync, as shown in the sketch below.
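The following is a minimal sketch for generating sample test data in the mounted file system; the directory name, file count, and file size are arbitrary placeholders.
# Placeholder directory, file count, and file size; adjust as needed.
# Create 1,000 files of 1 MiB each in the mounted file system.
mkdir -p /mnt/nfs-data/fpsync_src
for i in $(seq 1 1000); do
  dd if=/dev/urandom of=/mnt/nfs-data/fpsync_src/file_${i} bs=1M count=1 status=none
done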
- Select a node to run fpsync and update the /etc/hosts file.
You can either choose one of the three nodes or a different node to run the fpsync command. The node where the fpsync command is run is called the executor node.
On the node where fpsync will be run, use your preferred text editor to update the /etc/hosts file with the three worker nodes as worker-src-1, worker-src-2, and worker-src-3.
For example:
vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.0.4 worker-src-1.subnet.vcn.oraclevcn.com worker-src-1
10.0.0.5 worker-src-2
10.0.0.6 worker-src-3
- Mount the destination file system on the destination node.
Identify the destination file system and mount it on the destination node. Alternatively, create an OCI File Storage file system and mount it on the destination node, as shown in the example below.
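The following is a minimal sketch of mounting an OCI File Storage file system on the destination node; the mount target IP address, export path, and the /fpsync-fs mount point are placeholders chosen to match the destination path used in the later examples.
# 10.y.y.y and <EXPORT_PATH_NAME> are placeholders for your mount target IP and export path.
sudo mkdir -p /fpsync-fs
sudo mount -t nfs -o nordirplus,nconnect=16 10.y.y.y:/<EXPORT_PATH_NAME> /fpsync-fs
mount | grep /fpsync-fs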
- Ensure on-premises to OCI connectivity has been established.
Use common networking tools such as ping, traceroute, ssh, and so on, to verify connectivity between systems, on-premises, and OCI.
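The following is a minimal sketch of such checks run from the executor node; the host names and user name are placeholders.
# Host names and user name are placeholders.
ping -c 3 worker-src-1
traceroute worker-dest
ssh -o ConnectTimeout=10 username@worker-dest hostname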
- Enable passwordless SSH mutually among all the source and destination nodes, and verify the SSH connection between the source and destination node pairs.
Identify the public SSH key for the user who will execute the fpsync command on the executor node. This key is typically located in the user's home directory under the .ssh directory and is usually named id_rsa.pub. Propagate this key to all the worker nodes by using the cat command to display its contents, copying the key, and pasting it into the $HOME/.ssh/authorized_keys file on the worker nodes. Alternatively, if password-based SSH is enabled, you can use the ssh-copy-id command to distribute the key to each worker node and the destination node. For example:
[worker-src-1 ~]$ ssh-copy-id username@worker-src-2
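Once the keys are distributed, a short loop can confirm that passwordless SSH works from the executor node to every worker node and to the destination node; the node names and user name are the placeholders used above.
# Node names and user name are placeholders; BatchMode fails instead of prompting for a password.
for node in worker-src-1 worker-src-2 worker-src-3 worker-dest; do
  ssh -o BatchMode=yes username@${node} hostname || echo "Passwordless SSH to ${node} failed"
done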
- Run the fpsync command on the executor node.
Note: fpsync needs to be installed only on the executor node, which runs the rsync commands to the destination node over SSH via the worker nodes.
For example:
fpsync -v -n X -f Y -o "-lptgoD -v --numeric-ids -e ssh -C" \
-w username@worker-src-1 -w username@worker-src-2 -w username@worker-src-3 \
-d /nfs-data/fpsync_wrkr /nfs-data/fpsync_src/ opc@worker-dest:/fpsync-fs/
Note: Replace X and Y with values for the fpsync options -n and -f.
  - Determine the value for concurrent sync jobs, which is -n.
    - Select the -n value to be equal to the total number of cpu_cores of all worker nodes in the source, and keep as many destination worker nodes of the same CPU and memory.
    - If you have 3 worker nodes with 16 CPU cores each, that is 3 worker nodes times 16 cpu_cores = 48.
  - Determine the value for the number of files to transfer per sync job, which is -f.
    - For example, of a total of 1.1 million files, two large folders contain ~700K files with an average file size of 160 KB.
    - Each worker node has 64 GB = 64000000 KB of memory and 8 OCPU = 16 cpu_cores, so the memory per cpu_core is 64000000/16 = 4000000 KB/cpu_core.
    - Calculate the value of -f as 4000000 KB / 160 KB = 25000 (see the sketch after this list).
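The following is a minimal sketch of these calculations in shell, assuming three identical worker nodes and the 160 KB average file size from the example; the variable names are illustrative only.
# Illustrative variables; adjust to your worker node shape and dataset.
WORKERS=3
CORES_PER_WORKER=16
N=$((WORKERS * CORES_PER_WORKER))                        # fpsync -n: 3 x 16 = 48 concurrent sync jobs
MEM_PER_NODE_KB=64000000
MEM_PER_CORE_KB=$((MEM_PER_NODE_KB / CORES_PER_WORKER))  # 4000000 KB per core
AVG_FILE_KB=160
F=$((MEM_PER_CORE_KB / AVG_FILE_KB))                     # fpsync -f: 4000000 / 160 = 25000 files per job
echo "-n ${N} -f ${F}"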
(Optional) Test Environment
To simulate the on-premises to OCI migration, an OCI File Storage file system in Ashburn with the following dataset is used as the on-premises NAS share, and an OCI File Storage file system in Phoenix is used as the destination.
Both regions are remotely peered using a Dynamic Routing Gateway.
Data Set Directory | Size | File count | Size of Each File |
---|---|---|---|
Directory 1 | 107.658 GiB | 110,242 | 1 MiB |
Directory 2 | 1.687 GiB | 110,569 | 15 MiB |
Directory 3 | 222 GiB | 111 | 2 GiB |
Directory 4 | 1.265 TiB | 1,295 | 1 GiB |
Directory 5 | 26.359 GiB | 1,687 | 16 MiB |
Directory 6 | 105.281 MiB | 26,952 | 4 KiB |
Directory 7 | 29.697 MiB | 30,410 | 1 KiB |
Directory 8 | 83.124 GiB | 340,488 | 256 KiB |
Directory 9 | 21.662 GiB | 354,909 | 64 KiB |
Directory 10 | 142.629 GiB | 36,514 | 4 MiB |
Directory 11 | 452.328 MiB | 57,898 | 8 MiB |
Directory 12 | 144 GiB | 72 | 2 GiB |
Directory 13 | 208.500 GiB | 834 | 256 MiB |
Directory 14 | 54.688 GiB | 875 | 64 MiB |
VM Instances: In both the Ashburn and Phoenix regions, three 16 cpu_core, 64 GB memory, 8 Gbps bandwidth Linux 9 VMs are used as worker nodes, and an 8 cpu_core VM is used as the executor node.
The following are the TCP settings that all the instances have:
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 87380 16777216
net.ipv4.tcp_window_scaling = 1
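One way to apply these settings is with sysctl. The following is a minimal sketch; persisting the values in a file under /etc/sysctl.d, and the file name used here, are assumptions about how you manage your instances.
# Apply the TCP tuning at runtime on each instance.
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sudo sysctl -w net.ipv4.tcp_wmem="4096 87380 16777216"
sudo sysctl -w net.ipv4.tcp_window_scaling=1
# Optionally persist the settings across reboots (file name is an arbitrary choice).
sudo tee /etc/sysctl.d/99-fpsync-tcp.conf <<'EOF'
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 87380 16777216
net.ipv4.tcp_window_scaling = 1
EOF
sudo sysctl --system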
Both regional instances have the respective OCI File Storage file system mounted as described in the Run Fpsync using Three Worker Nodes to Migrate Data from On-Premises File Shares to OCI File Storage Service section.
Run the following fpsync command. X and Y are fpsync options.
fpsync -v -n X -f Y -o "-lptgoD -v --numeric-ids -e ssh -C" \
-w opc@worker-src-1 -w opc@worker-src-2 -w opc@worker-src-3 \
-d /fpsync-fs/fpsync_wrkr /fpsync-fs/x_region_fpsync_src/ opc@worker-dest:/fpsync-fs/
- Determine the value for -n.
Select the -n value to be equal to the total number of cpu_cores of all worker nodes in the source, and keep as many destination worker nodes of the same CPU and memory. In this example, it is 3 worker nodes times 16 cpu_cores = 48.
- Determine the value for -f.
In this example, two folders are large directories. Of the total 1.1 million files, the two folders contain ~700K files with an average file size of 160 KB. Memory in each worker node = 64 GB = 64000000 KB. Processes in each worker node = 8 OCPU = 16 cpu_cores. Memory per process = 64000000/16 = 4000000 KB/process. Now, an appropriate value for -f can be calculated as 4000000 KB / 160 KB = 25000.
The following table shows the time taken by fpsync to complete the transfer of 2.25 TB and 1 million files for different combinations of X and Y and of rsync SSH and compression options.
fpsync option | nfs mount option on source and destination worker nodes | File Storage Mount target performance type | Time taken |
---|---|---|---|
-n 30 -f 2000 -e ssh | nconnect=16,nordirplus | 3 standard mount targets, 1:1 mapped to worker nodes | 237m28s |
-n 48 -f 5000 -e ssh -C | nconnect=16,nordirplus | source and dest with 1 HPMT 40 each | 163m38.887s |
-n 60 -f 20000 | nconnect=16,nordirplus | 3 standard mount targets, 1:1 mapped to worker nodes | 124m25.435s |
-n 48 -f 400000 -e ssh -C | nconnect=16,nordirplus | 3 standard mount targets, 1:1 mapped to worker nodes | 122m55.458s |
-n 100 -f 200000 -e ssh | nconnect=16,nordirplus | 3 standard mount targets, 1:1 mapped to worker nodes | 120m44s |
-n 60 -f 200000 -e ssh | nordirplus only, NO nconnect | 3 standard mount targets, 1:1 mapped to worker nodes | 118m41.393s |
-n 60 -f 200000 -e ssh | nconnect=16,nordirplus | 3 standard mount targets, 1:1 mapped to worker nodes | 118m3.845s |
-n 48 -f 20000 -e ssh | nconnect=16,nordirplus | source and dest with 1 HPMT 40 each | 113m34.011s |
-n 48 -f 200000 | nconnect=16,nordirplus | source and dest with 1 HPMT 40 each | 110m15.555s |
-n 48 -f 200000 | nconnect=16,nordirplus | source and dest with 1 HPMT 40 each | 109m3.472s |
We can see that any combination of -n above 48 and -f above 20000 gave similar performance, around 2 hours of transfer time across regions. Even with a 40 GBps high performance mount target, there is no significant reduction in the time taken.
The result means that, depending on the size of the actual dataset to be transferred, you can select either multiple standard mount targets or a high performance mount target for the file system. If the source dataset is made up mostly of large files (file size >= 1M) and the total dataset size is 20 TB and above, a high performance mount target is a good option. Otherwise, standard mount targets in a scale-out configuration can give the desired performance while being more cost effective.
Next Steps
Proceed to the related tutorial(s) relevant to your migration needs. To move data into OCI cloud storage services:
- Using Rclone, see Tutorial 2: Move Data into OCI Cloud Storage Services using Rclone.
- Using OCI Object Storage Sync and S5cmd, see Tutorial 3: Move Data into OCI Cloud Storage Services using OCI Object Storage Sync and S5cmd.
Related Links
- Tutorial 1: Use Migration Tools to Move Data into OCI Cloud Storage Services
- Tutorial 2: Move Data into OCI Cloud Storage Services using Rclone
- Tutorial 3: Move Data into OCI Cloud Storage Services using OCI Object Storage Sync and S5cmd
Acknowledgments
- Author - Vinoth Krishnamurthy (Principal Member of Technical Staff, OCI File Storage)
More Learning Resources
Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.
For product documentation, visit Oracle Help Center.
Move Data into OCI Cloud Storage Services using Fpsync and Rsync
G25616-01
January 2025
Copyright ©2025, Oracle and/or its affiliates.