Move Data into OCI Cloud Storage Services using Rclone

Introduction

This is tutorial 2 of a four-tutorial series that shows you various ways to migrate data into Oracle Cloud Infrastructure (OCI) cloud storage services. The series is set up so you can review Tutorial 1: Use Migration Tools to Move Data into OCI Cloud Storage Services to get a broad understanding of the various tools, and then proceed to the related tutorial(s) or documents relevant to your migration needs. This tutorial focuses on using Rclone to migrate data into OCI cloud storage services.

OCI provides customers with high-performance computing and low-cost cloud storage options. Through on-demand local, object, file, block, and archive storage, Oracle addresses key storage workload requirements and use cases.

OCI cloud storage services offer fast, secure, and durable cloud storage options for all your enterprise needs. They range from high-performance options, such as OCI File Storage with Lustre and the OCI Block Volumes service, to fully managed exabyte-scale file systems from the OCI File Storage service with high performance mount targets, to highly durable and scalable OCI Object Storage. Our solutions can meet your demands, from performance-intensive applications such as AI/ML workloads to exabyte-scale data lakes.

Rclone is an open source, command-line utility for migrating data to the cloud, or between cloud storage vendors. Rclone can be used for one-time migrations as well as periodic synchronization between source and destination storage. Rclone can migrate data to and from object storage, file storage, and mounted drives, and between its 70 supported storage types. OCI Object Storage is natively supported as an Rclone backend provider. Rclone processes can be scaled up and scaled out using parameter options to increase transfer performance.
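
For example, an OCI Object Storage remote can be defined in the Rclone configuration file (typically ~/.config/rclone/rclone.conf). The following is a minimal sketch using the oracleobjectstorage backend; the remote name, namespace, compartment OCID, and region are placeholders, and the option names should be verified against the Rclone documentation for your version.

    [remote]
    type = oracleobjectstorage
    provider = user_principal_auth
    namespace = <your-object-storage-namespace>
    compartment = <your-compartment-ocid>
    region = us-ashburn-1
    config_file = ~/.oci/config
    config_profile = DEFAULT

Once configured, a command such as rclone lsd remote: lists the buckets visible to the remote and is a quick way to confirm connectivity.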

Determine the amount of data that needs to be migrated and the downtime available to cut over to the new OCI storage platform. Batch migrations are a good choice to break down the migration into manageable increments. Batch migrations enable you to schedule downtime for specific applications across different windows. Some customers have the flexibility to do a one-time migration during a scheduled maintenance window of 2-4 days. OCI FastConnect can be used to create a dedicated, private connection between OCI and your environment, with port speeds from 1 Gbps to 400 Gbps, to speed up the data transfer process. OCI FastConnect can be integrated with partner solutions such as Megaport and ConsoleConnect to create a private connection to your data center or a cloud-to-cloud interconnection to move data more directly from another cloud vendor into OCI cloud storage services. For more information, see FastConnect integration with Megaport Cloud Router.

Audience

DevOps engineers, developers, OCI cloud storage administrators and users, IT managers, OCI power users, and application administrators.

Objective

Learn how to use Rclone to copy and synchronize data into OCI cloud storage services.

Prerequisites

Overview of Rclone and Basic Terms

Rclone is a helpful migration tool because of the many protocols and cloud providers it supports and its ease of configuration. It is a good general-purpose migration tool for any type of data set. Rclone works particularly well for data sets that can be split up into batches to scale out across nodes for faster data transfer.

Rclone can be used to migrate:

Rclone Commands and Flags:
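
As a brief, non-exhaustive illustration of the commands and flags used later in this tutorial, the following sketch shows typical invocations; the remote and bucket names are placeholders.

    # List the buckets available on a configured remote.
    rclone lsd remote:

    # Recursively list only the files in a bucket.
    rclone lsf --files-only -R remote:bucket

    # Copy new or changed files from a local path to a bucket (never deletes on the destination).
    rclone copy /data/src remote:bucket --transfers 50 --progress

    # Make the destination identical to the source (can delete files on the destination).
    rclone sync /data/src remote:bucket --transfers 50 --progress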

Rclone Usage Examples

Migrate a Large Number of Files using Rclone

Rclone syncs on a directory-by-directory basis. If you are migrating tens of millions of files/objects, it is important to make sure the directories/prefixes are divided up into around 10,000 files/objects or fewer per directory. This prevents Rclone from using too much memory and then crashing. Many customers with a high count (hundreds of millions or more) of small files run into this issue. If all your files are in a single directory, divide them up first.

  1. Run the following command to get a list of files in the source.

    rclone lsf --files-only -R src:bucket | sort > src
    
  2. Break up the file into chunks of 1,000 to 10,000 lines using the split utility. The following split command will divide the file into chunks of 1,000 lines and put them in files named src_##, such as src_00.

    split -l 1000 --numeric-suffixes src src_
    
  3. Distribute the files to multiple VM instances to scale out the data transfer. Each Rclone command should look like:

    rclone --progress --oos-no-check-bucket --no-traverse --transfers 500 copy remote1:source-bucket remote2:dest-bucket --files-from src_00
    

    Alternatively, a simple for loop can be used to iterate through the file lists generated by the split command, as shown in the sketch after this list. During testing with ~270,000 files in a single bucket, we saw copy times improve 40x; your mileage may vary.

    Note: Splitting up the files by directory structure or using the split utility is an important way to optimize transfers.
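
The following is a minimal sketch of the loop mentioned in step 3, assuming the chunk files src_00, src_01, and so on are in the current directory and the remotes are named remote1 and remote2. It processes the chunks one after another on a single host; to scale out, distribute the src_## files across multiple VM instances instead.

    # Iterate over the chunk files produced by split and copy each batch.
    for f in src_*; do
      rclone copy remote1:source-bucket remote2:dest-bucket \
        --progress --oos-no-check-bucket --no-traverse \
        --transfers 500 --files-from "$f"
    done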

Use Rclone, OKE, and fpart Together to Move Data from File Systems to OCI Object Storage

Multiple Kubernetes pods can be used to scale out data transfer between file systems and object storage. Parallelization speeds up data transfers to storage systems that have relatively high latency but high throughput. The approach combines Rclone, OKE, and fpart to partition directory structures into multiple chunks and run the data transfer in parallel in containers, either on the same compute node or across multiple nodes. Running across multiple nodes aggregates the network throughput and compute power of each node.

Follow these steps:

  1. Identify a host to act as your fpsync operator host. It must have access to the migration source data and have Rclone installed.

  2. Run the following commands to install kubectl.

    # curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
    # chmod 755 kubectl
    # cp -a kubectl /usr/bin
    
  3. Create an OCI IAM policy for the fpsync operator host to manage the OKE cluster.

    The following policy can be used for this purpose. More granular permissions can be configured to meet the bare minimum requirement to control the pods.

    Allow dynamic-group fpsync-host to manage cluster-family in compartment storage
    
  4. Set up the kubeconfig file to have access to the OKE cluster. For more information, see Setting Up Local Access to Clusters.

  5. Install and patch fpart and fpsync. The fpsync patch is required to run Rclone or rsync in parallel to scale out the data transfer. The fpsync that comes with the fpart package does not support Rclone or Kubernetes pods, so a patch is needed to support these tools.

    Run the following commands to install them on Ubuntu.

    # apt-get install fpart
    # git clone https://github.com/aboovv1976/fpsync-k8s-rclone.git
    # cd fpsync-k8s-rclone/
    # cp -p /usr/bin/fpsync /usr/bin/k-fpsync
    # patch /usr/bin/k-fpsync fpsync.patch
    
  6. Build the container image.

    The docker image build specification available in rclone-rsync-image can be used to build the container image. Once the image is built, it should be uploaded to a registry that can be accessed from the OKE cluster.

    # cd rclone-rsync-image
    # docker build -t rclone-rsync .
    # docker login
    # docker tag rclone-rsync:latest <registry-url>/rclone-rsync:latest
    # docker push <registry-url>/rclone-rsync:latest
    

    A copy of the image is maintained in fra.ocir.io/fsssolutions/rclone-rsync:latest. The sample directory contains some example output files.

  7. Run k-fpsync. The patched fpsync (k-fpsync) can partition the source file system and scale out the transfer using multiple Kubernetes pods. The Kubernetes pod anti-affinity rule is configured to prefer nodes that do not have any running transfer worker pods. This helps to utilize the bandwidth on the nodes effectively to optimize performance. For more information, see Assigning Pods to Nodes.

    Mount the source file system on the fpsync operator host and create a shared directory that will be accessed by all the pods. This is the directory where all the log files and partition files are kept.

    The following command transfers data from the file system /data/src to the OCI Object Storage bucket rclone-2. It starts 2 pods at a time to transfer the file system partitions created by fpart.

    # mkdir /data/fpsync
    # PART_SIZE=512 && ./k-fpsync -v -k fra.ocir.io/fsssolutions/rclone-rsync:latest,lustre-pvc  -m rclone -d /data/fpsync  -f $PART_SIZE -n 2 -o "--oos-no-check-bucket --oos-upload-cutoff 10Mi --multi-thread-cutoff 10Mi --no-check-dest --multi-thread-streams 64 --transfers $PART_SIZE  --oos-upload-concurrency 8 --oos-disable-checksum  --oos-leave-parts-on-error" /data/src/ rclone:rclone-2
    

    Note: The logs for the run are kept in the run-ID directory; in the following example, they are in the /data/fpsync/{Run-Id}/log directory. The sample outputs are provided in the sample directory. A sketch of how to monitor the worker pods follows this procedure.
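
While k-fpsync is running, the worker pods it launches can be watched with standard kubectl commands; the pod name below is a hypothetical example.

    # Watch the transfer worker pods as they are scheduled, run, and complete.
    kubectl get pods -o wide -w

    # Inspect the output of a specific worker pod (replace with an actual pod name).
    kubectl logs <worker-pod-name>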

(Optional) Test Environments

Recommendations are made based on testing and customer interactions.

Note: Results from the bulk copy script, os sync, and s5cmd runs are included to provide more information on performance. To learn about using the bulk copy script, see Use Oracle Cloud Infrastructure Object Storage Python Utilities for Bulk Operations. For more information about using os sync and s5cmd, see Tutorial 3: Move Data into OCI Cloud Storage Services using OCI Object Storage Sync and S5cmd.

Test Environment 1:

1 VM.Standard.E4.Flex VM instance with 1 OCPU, 1 Gbps network bandwidth, and 16 GB of memory. To simulate an on-premises to OCI migration, data was copied from NFS in PHX to OCI Object Storage in IAD.

Data Sets

Test Environment 2:

VM Instances: 2 VM instances were used for each test. Each was a VM.Standard.E4.Flex with 24 OCPUs, 24 Gbps network bandwidth, and 384 GB of memory. Oracle Linux 8 was used for Linux testing.

Data sets used in testing: 14 main directories with the following file counts and sizes.

| Data Set Directory | Size | File Count | Size of Each File |
| --- | --- | --- | --- |
| Directory 1 | 107.658 GiB | 110,242 | 1 MiB |
| Directory 2 | 1.687 GiB | 110,569 | 15 MiB |
| Directory 3 | 222 GiB | 111 | 2 GiB |
| Directory 4 | 1.265 TiB | 1,295 | 1 GiB |
| Directory 5 | 26.359 GiB | 1,687 | 16 MiB |
| Directory 6 | 105.281 MiB | 26,952 | 4 KiB |
| Directory 7 | 29.697 MiB | 30,410 | 1 KiB |
| Directory 8 | 83.124 GiB | 340,488 | 256 KiB |
| Directory 9 | 21.662 GiB | 354,909 | 64 KiB |
| Directory 10 | 142.629 GiB | 36,514 | 4 MiB |
| Directory 11 | 452.328 MiB | 57,898 | 8 MiB |
| Directory 12 | 144 GiB | 72 | 2 GiB |
| Directory 13 | 208.500 GiB | 834 | 256 MiB |
| Directory 14 | 54.688 GiB | 875 | 64 MiB |

Note:

| Method | To-From | Time | Command | Flags/Notes |
| --- | --- | --- | --- | --- |
| s5cmd | NFS/File PHX to Object IAD | 54m41.814s | copy | --numworkers 74 |
| os sync | NFS/File PHX to Object IAD | 65m43.200s | NA | --parallel-operations-count 50 |
| rclone | NFS/File PHX to Object IAD | 111m59.704s | copy | --oos-no-check-bucket --no-check-dest --ignore-checksum --oos-disable-checksum --transfers 50 |
| rclone | Object PHX to Object IAD | 28m55.663s | copy | --oos-no-check-bucket --no-check-dest --ignore-checksum --oos-disable-checksum --transfers 400; same command run across 2 VMs for a concurrency of 800 transfers |
| python bulk copy script | Object PHX to Object IAD | 25m43.715s | Default | 1 VM, 50 workers, 100,000 files queued at a time |

The s5cmd and os sync commands perform well for file system/NFS to object storage transfers. The bulk copy script only does bucket-to-bucket transfers and was not tested for NFS migration.

Only rclone and the Python bulk copy script are capable of doing bucket-to-bucket transfers across regions, so the other tools were not tested for that scenario. The Python bulk copy script performs better on the cross-region bucket-to-bucket data, but it is only compatible with OCI Object Storage, while rclone supports many backends and cloud providers.

Small test runs were conducted using rclone to transfer data from Microsoft Azure Blob Storage, Amazon Simple Storage Service (Amazon S3), and Google Cloud Platform Cloud Storage to OCI Object Storage to verify the tool works for these types of transfers. For more information, see Move data to object storage in the cloud by using Rclone.
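
As an illustration of such a cross-cloud transfer, the following sketch assumes remotes named s3remote (Amazon S3) and ociremote (OCI Object Storage) have already been configured in Rclone; the remote and bucket names are placeholders.

    # Copy a bucket from Amazon S3 to OCI Object Storage using a configured pair of remotes.
    rclone copy s3remote:source-bucket ociremote:destination-bucket --transfers 50 --progress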

Test Environment 3:

VM Instances: 1-2 VM instances were used for each test. Each was a VM.Standard.E4.Flex with 24 OCPUs, 24 Gbps network bandwidth, and 384 GB of memory. Oracle Linux 8 was used for Linux testing. All tests were bucket-to-bucket.

| Total Size | File Count | File Size Range |
| --- | --- | --- |
| 7.74 TiB | 1,000,000 | 30 MiB |

| Method | To-From | Time | Command | Flags | Notes |
| --- | --- | --- | --- | --- | --- |
| rclone | Object-to-Object IAD -> IAD | 18h39m11.4s | copy | --oos-no-check-bucket --fast-list --no-traverse --transfers 500 --oos-chunk-size 10Mi | 1 VM, very slow due to the high file count and listing calls to the source |
| rclone | Object-to-Object IAD -> IAD | 55m8.431s | copy | --oos-no-check-bucket --no-traverse --transfers 500 --oos-chunk-size 10Mi --files-from <file> | 2 VMs, 500 transfers per VM, object/file list fed 1,000 files at a time; prevents listing on source and destination and improves performance |
| python bulk copy script | Object-to-Object IAD -> IAD | 28m21.013s | NA | Default | 1 VM, 50 workers, 100,000 files queued at a time |
| python bulk copy script | Object-to-Object IAD -> IAD | NA | NA | Default | 2 VMs, 50 workers per VM, 100,000 files queued at a time; received 429 errors, script hung and could not complete |
| s5cmd | Object-to-Object IAD -> IAD | 14m10.864s | copy | Defaults (256 workers) | 1 VM |
| s5cmd | Object-to-Object IAD -> IAD | 7m50.013s | copy | Defaults | 2 VMs, 256 workers each VM; ran in about half the time as 1 VM |
| s5cmd | Object-to-Object IAD -> IAD | 3m23.382s | copy | --numworkers 1000 | 1 VM, 1,000 workers; across multiple tests we found this was the optimal run for this data set with s5cmd |
| rclone | Object-to-Object IAD -> PHX | 184m36.536s | copy | --oos-no-check-bucket --no-traverse --transfers 500 --oos-chunk-size 10Mi --files-from <file> | 2 VMs, 500 transfers per VM, object/file list fed 1,000 files at a time |
| python bulk copy script | Object-to-Object IAD -> PHX | 35m31.633s | NA | Default | 1 VM, 50 workers, 100,000 files queued at a time |

The s5cmd command consistently performed best for the high file count and small files. s5cmd is limited because it can only do bucket-to-bucket copies within the same tenancy and the same region.

Notice the large improvements for rclone once file lists are fed to the command and the transfer is scaled out to another VM. Although rclone may run slower than other tools, it is the most versatile in the platforms it supports and the types of migrations it can perform.

The Python bulk copy script can only use the OCI native CopyObject API and can only reach a concurrency of 50 workers before being throttled.

Tests for IAD to PHX were only run with what worked best for IAD to IAD, and problematic tests were not re-run. s5cmd was not run for IAD to PHX because it can only do bucket-to-bucket copies within the same region.

Next Steps

Proceed to the related tutorial(s) relevant to your migration needs. To move data into OCI cloud storage services:

Acknowledgments

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.