Use Migration Tools to Move Data into OCI Cloud Storage Services

Introduction

This is tutorial 1 of a four-tutorial series that shows you various ways to migrate data into Oracle Cloud Infrastructure (OCI) cloud storage services. The series is set up so that you can review this first tutorial to get a broad understanding of the various tools and then proceed to the related tutorial(s) or documents relevant to your migration needs.

OCI provides customers with high-performance computing and low-cost cloud storage options. Through on-demand local, object, file, block, and archive storage, Oracle addresses key storage workload requirements and use cases.

OCI cloud storage services offer fast, secure, and durable cloud storage options for all your enterprise needs. The options range from high-performance services such as OCI File Storage with Lustre and the OCI Block Volumes service, through fully managed exabyte-scale file systems from the OCI File Storage service with high-performance mount targets, to highly durable and scalable OCI Object Storage. Our solutions can meet your demands, ranging from performance-intensive applications such as AI/ML workloads to exabyte-scale data lakes.

Many customers need to transfer data into OCI cloud storage services from on-premises, from another cloud provider, or between OCI cloud storage services. The best method for the migration varies with the origin, the destination, and the direction of the data transfer. Once you have identified the basics of the data source and the destination in OCI, you must decide on a migration path and the tools you will need. Let our hands-on experience guide you toward the right migration tool and how to use it. This is tutorial 1 in a series that introduces you to various tools and where they might best fit into the migration process.

Determine the amount of data that needs to be migrated and the downtime available to cut over to the new OCI storage platform. Batch migrations are a good way to break the migration down into manageable increments, and they let you schedule downtime for specific applications across different windows. Some customers have the flexibility to do a one-time migration during a scheduled maintenance window of 2-4 days. OCI FastConnect can be used to create a dedicated, private connection between OCI and your environment, with port speeds from 1 Gbps to 400 Gbps, to speed up the data transfer process. OCI FastConnect can be integrated with partner solutions such as Megaport and ConsoleConnect to create a private connection to your data center, or a cloud-to-cloud interconnection that moves data more directly from another cloud vendor into OCI cloud storage services. For more information, see FastConnect integration with Megaport Cloud Router.
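As a rough planning aid, the relationship between data size, link speed, and the migration window can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate, not a measurement: the function name `estimate_transfer_hours` and the default `efficiency` of 0.7 (the assumed fraction of the port speed that a real transfer sustains) are illustrative assumptions, not values from this tutorial.

```python
TB = 10**12  # decimal terabyte, in bytes


def estimate_transfer_hours(data_bytes: float, link_gbps: float,
                            efficiency: float = 0.7) -> float:
    """Estimate the hours needed to move data_bytes over a link.

    efficiency is an ASSUMED fraction of the nominal port speed that the
    transfer actually achieves; measure your own value in a proof of concept.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    seconds = data_bytes * 8 / usable_bits_per_second
    return seconds / 3600


# Compare a few port speeds for a hypothetical 3 TB migration.
for gbps in (1, 10, 100):
    hours = estimate_transfer_hours(3 * TB, gbps)
    print(f"{gbps:>3} Gbps: about {hours:.2f} hours")
```

If the estimate does not fit the available downtime, that is a signal to split the work into batches or to review a faster connection.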

Audience

DevOps engineers, developers, OCI cloud storage administrators and users, IT managers, OCI power users, and application administrators.

Objectives

Learn how to use various tools to copy and synchronize data into OCI cloud storage services:

Prerequisites

Migration Tools

Our customer experience and testing can help guide you to which migration tool will best fit your scenario. We encourage you to do a proof of concept and tests on sample data sets to verify the best migration method for your data set. We will explore common migration tools available to migrate data from on-premises and other cloud providers to OCI or within OCI cloud storage services. Test results are shared from sample datasets so you can extrapolate how the tools may perform on your data set.

The Right Tool for the Job

There are many tools to choose from when doing a migration, and it can be overwhelming to research them all at once. The best migration tool varies with the origin, destination, and direction of the migration, as well as with your experience and environment.

The following table provides recommendations for common migration scenarios, including the migration of on-premises data, migration of data from another cloud vendor into OCI, copying OCI cloud storage data from one region to another, copying OCI cloud storage data within a region, and copying OCI File Storage data to OCI Object Storage.

| Migrate Data From | Migrate Data To | Recommended Tool(s) | Notes | Documentation/Tutorial Links |
|---|---|---|---|---|
| On-premises file system | OCI Object Storage | 1. s5cmd (small/mixed files); 2. OCI Object Storage Sync (few large files); 3. Rclone (mixed); 4. Resilio Active Anywhere | Use the tool that best fits your data structure and that you are comfortable using. Review FastConnect to improve transfer time. | S5cmd and Object Storage Sync Tutorial, Rclone Tutorial, Resilio Active Anywhere |
| Another cloud vendor object or blob storage | OCI Object Storage | 1. Flexify IO; 2. Rclone | Use Flexify IO for S3-compatible vendors and when a supported GUI is desired. Use Rclone when you are comfortable experimenting with various settings and for the broadest compatibility, with support for over 70 different cloud vendors (S3-compatible and non-S3-compatible object storage). | Flexify IO migrate between clouds, Rclone Tutorial |
| OCI Object Storage | OCI Object Storage in another region | 1. Object Replication; 2. OCI Object Storage Bulk Copy Python API; 3. Flexify IO; 4. Rclone | Native Object Replication is good for an exact replica of a new bucket that is currently empty. Use OCI Object Storage Bulk Copy Python API, Flexify IO, or Rclone to initialize copies of a source bucket that already has objects or when you want to preserve objects on the destination. | Object Storage Replication documentation, Use Oracle Cloud Infrastructure Object Storage Python Utilities for Bulk Operations, Flexify IO, Rclone Tutorial |
| OCI Object Storage | OCI Object Storage in another tenancy (same or different region) | 1. OCI Object Storage Bulk Copy Python API; 2. Flexify IO; 3. Rclone | Prerequisite for OCI Object Storage Bulk Copy Python API: use cross-tenancy IAM policies to enable cross-tenancy copies. For more information, see Accessing Object Storage Resources Across Tenancies. | Use Oracle Cloud Infrastructure Object Storage Python Utilities for Bulk Operations, Flexify IO, Rclone Tutorial |
| OCI File Storage | OCI Object Storage | 1. s5cmd (small/mixed files); 2. Object Storage Sync (few large files); 3. Rclone (mixed); 4. Resilio Active Anywhere | Use the tool that best fits your data structure and that you are comfortable using. | S5cmd and Object Storage Sync Tutorial, Rclone Tutorial, Resilio Active Anywhere |
| On-premises file system | OCI File Storage | 1. fpsync (Linux) and CIFS + fpsync (Windows); 2. Resilio Active Anywhere | Ensure that network connectivity is established between source and destination instances. | Fpsync documentation, fpsync tutorial, Resilio Active Anywhere |
| Another cloud vendor local disk or file storage | OCI File Storage | 1. fpsync (Linux) and CIFS + fpsync (Windows); 2. Resilio Active Anywhere | Ensure that network connectivity is established between source and destination instances. Review FastConnect to improve transfer time. | Fpsync documentation, fpsync tutorial, Resilio Active Anywhere |
| OCI File Storage | OCI File Storage in another region | 1. File System Replication; 2. fpsync with instance-to-instance streaming; 3. Resilio Active Anywhere | If you use replication, see replication's Limitations and Considerations. If you use instance-to-instance streaming, ensure that network connectivity is established between source and destination instances. | Fpsync documentation, fpsync tutorial, File System Replication documentation, Resilio Active Anywhere |
| OCI File Storage | OCI File Storage within the same availability domain | 1. File System Replication; 2. Using File Storage Parallel Tools: parcp; 3. Resilio Active Anywhere | If you use replication, see replication's Limitations and Considerations. If you use parcp, ensure that both source and destination file systems are mounted in the instance. | File System Replication, Using File Storage Parallel Tools: parcp, Resilio Active Anywhere |
| On-premises, another cloud vendor | OCI Object Storage or OCI File Storage | Resilio Active Anywhere Platform | Use the Resilio Active Anywhere platform when you need multi-way synchronization of data, white glove service with support, and a GUI. Resilio has been verified by the OCI cloud storage service product team and is available in the Oracle Cloud Marketplace. For more information on using their platform, reach out to the Resilio team. | |
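The recommendations in the table above can also be kept as a small lookup structure, which may be handy when planning many migrations at once. The following Python sketch encodes a subset of the rows. The scenario labels used as dictionary keys and the function name `recommend` are illustrative choices made here, not identifiers from any OCI tool, and the tool order simply mirrors the order in the table.

```python
# Keys are (source, destination) labels chosen for this sketch only.
RECOMMENDED_TOOLS = {
    ("on-premises file system", "oci object storage"): [
        "s5cmd", "OCI Object Storage Sync", "Rclone", "Resilio Active Anywhere",
    ],
    ("other cloud object storage", "oci object storage"): [
        "Flexify IO", "Rclone",
    ],
    ("oci object storage", "oci object storage in another region"): [
        "Object Replication", "OCI Object Storage Bulk Copy Python API",
        "Flexify IO", "Rclone",
    ],
    ("oci object storage", "oci object storage in another tenancy"): [
        "OCI Object Storage Bulk Copy Python API", "Flexify IO", "Rclone",
    ],
    ("oci file storage", "oci object storage"): [
        "s5cmd", "OCI Object Storage Sync", "Rclone", "Resilio Active Anywhere",
    ],
    ("on-premises file system", "oci file storage"): [
        "fpsync", "Resilio Active Anywhere",
    ],
    ("oci file storage", "oci file storage in another region"): [
        "File System Replication", "fpsync", "Resilio Active Anywhere",
    ],
}


def recommend(source: str, destination: str) -> list:
    """Return the tools listed for a scenario, or an empty list if unknown."""
    key = (source.strip().lower(), destination.strip().lower())
    return list(RECOMMENDED_TOOLS.get(key, []))


print(recommend("On-premises file system", "OCI Object Storage"))
```

A lookup like this only records the starting recommendation. The notes column still applies, for example choosing between s5cmd and Object Storage Sync based on file sizes.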

Note: The migration tool series does not cover OCI Object Storage replication, OCI File Storage file system replication, the OCI Object Storage Bulk Copy Python API, Flexify IO, or Resilio. See Related Links for more information.

Next Steps

Proceed to the related tutorial(s) relevant to your migration needs. To move data into OCI cloud storage services:

(Optional) Test Environments

Recommendations are made based on testing and customer interactions.

Test Environment 1:

1 VM instance: VM.Standard.E4.Flex, 1 OCPU, 1 Gbps network bandwidth, 16 GB of memory. To simulate an on-premises to OCI migration, data was copied from NFS in PHX to IAD.

Data Sets

Total data set size: 3TB, with 3 files, each file 1TB.

| Method | To-From | Time | Command | Flags |
|---|---|---|---|---|
| os sync | NFS/File PHX to Object IAD | 123m17.102s | NA | --parallel-operations-count 100 |
| s5cmd | NFS/File PHX to Object IAD | 239m20.625s | copy | run commands.txt, default run --numworkers 256 |
| rclone | NFS/File PHX to Object IAD | 178m27.101s | copy | --transfers=100 --oos-no-check-bucket --fast-list --checkers 64 --retries 2 --no-check-dest |

Note: Our tests showed os sync running the fastest for this data set.

Total data set size: 9.787GB, with 20,000 files, each file 20MB

| Method | To-From | Time | Command | Flags |
|---|---|---|---|---|
| s5cmd | NFS/File PHX to Object IAD | 1m12.746s | copy | default run --numworkers 256 |
| os sync | NFS/File PHX to Object IAD | 2m48.742s | NA | --parallel-operations-count 1000 |
| rclone | NFS/File PHX to Object IAD | 1m52.886s | copy | --transfers=500 --oos-no-check-bucket --no-check-dest |

Note: Our tests showed s5cmd performing the best for this data set.
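To compare runs like the ones above on your own data sets, it can help to turn the wall-clock strings reported by `time` into seconds and rank the tools. The following Python sketch does that for the three timings of the 20,000-file data set. The helper names `parse_duration` and `rank_by_time` are illustrative, and the sketch assumes durations written in the `XhYmZs` style used in these tables.

```python
import re

_DURATION = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?")


def parse_duration(text: str) -> float:
    """Convert a duration such as '2m48.742s' or '18h39m11.4s' to seconds."""
    match = _DURATION.fullmatch(text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def rank_by_time(results: dict) -> list:
    """Return (tool, seconds, slowdown versus fastest), fastest first."""
    timed = sorted((parse_duration(t), name) for name, t in results.items())
    fastest = timed[0][0]
    return [(name, seconds, seconds / fastest) for seconds, name in timed]


# Timings copied from the 20,000-file test above.
small_files = {"s5cmd": "1m12.746s", "os sync": "2m48.742s", "rclone": "1m52.886s"}

for name, seconds, ratio in rank_by_time(small_files):
    print(f"{name:<8} {seconds:8.1f} s  ({ratio:.2f}x the fastest run)")
```

Ranking by relative time keeps the comparison independent of how the data set size is measured, which makes it easier to reuse on a proof-of-concept sample.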

Test Environment 2:

VM Instances: 2 VM instances were used for each test. Each was a VM.Standard.E4.Flex with 24 OCPUs, 24 Gbps network bandwidth, and 384 GB of memory. Oracle Linux 8 was used for Linux testing. Flexify IO does not use VM instances.

Data sets used in testing: 14 main directories with the following file count and sizes, totaling 2.25TiB.

| Data Set Directory | Size | File Count | Size of Each File |
|---|---|---|---|
| Directory 1 | 107.658 GiB | 110,242 | 1 MiB |
| Directory 2 | 1.687 GiB | 110,569 | 15 MiB |
| Directory 3 | 222 GiB | 111 | 2 GiB |
| Directory 4 | 1.265 TiB | 1,295 | 1 GiB |
| Directory 5 | 26.359 GiB | 1,687 | 16 MiB |
| Directory 6 | 105.281 MiB | 26,952 | 4 KiB |
| Directory 7 | 29.697 MiB | 30,410 | 1 KiB |
| Directory 8 | 83.124 GiB | 340,488 | 256 KiB |
| Directory 9 | 21.662 GiB | 354,909 | 64 KiB |
| Directory 10 | 142.629 GiB | 36,514 | 4 MiB |
| Directory 11 | 452.328 MiB | 57,898 | 8 MiB |
| Directory 12 | 144 GiB | 72 | 2 GiB |
| Directory 13 | 208.500 GiB | 834 | 256 MiB |
| Directory 14 | 54.688 GiB | 875 | 64 MiB |

Note:

| Method | To-From | Time | Command | Flags/Notes |
|---|---|---|---|---|
| s5cmd | NFS/File PHX to Object IAD | 54m41.814s | copy | --numworkers 74 |
| os sync | NFS/File PHX to Object IAD | 65m43.200s | NA | --parallel-operations-count 50 |
| rclone | NFS/File PHX to Object IAD | 111m59.704s | copy | --oos-no-check-bucket --no-check-dest --ignore-checksum --oos-disable-checksum --transfers 50 |
| rclone | Object PHX to Object IAD | 28m55.663s | copy | --oos-no-check-bucket --no-check-dest --ignore-checksum --oos-disable-checksum --transfers 400, same command run across 2 VMs for a concurrency of 800 transfers |
| python bulk copy script | Object PHX to Object IAD | 25m43.715s | Default | 1 VM, 50 workers, 100,000 files queued at a time |
| Flexify IO | Object PHX to Object IAD | 20m27s | copy | Defaults to 10 engines/slots |
| Flexify IO | Object PHX to Object IAD | 16m12s | copy | 20 engines/slots, this can be raised via "Advanced Settings" |

The s5cmd and os sync commands do well for file system/NFS to object storage transfers. Flexify IO and the bulk copy script focus only on object storage (bucket-to-bucket) transfers and were not tested for NFS migration.

Only Flexify IO, rclone, and the Python bulk copy script can do bucket-to-bucket transfers across regions, so the other tools were not tested for this case. Flexify IO performed the best for object storage migrations across regions, with the Python bulk copy script performing better than rclone. Note that Flexify IO works with S3-compatible object storage, the Python bulk copy script works only with OCI Object Storage, and rclone supports many backends and cloud providers.

Small test runs were conducted using rclone to transfer data from Microsoft Azure Blob Storage, Amazon Simple Storage Service (Amazon S3), and Google Cloud Platform Cloud Storage to OCI Object Storage to verify the tool works for these types of transfers. For more information, see Move data to object storage in the cloud by using Rclone.

Flexify IO was used to migrate this data set from AWS us-east-2 to the OCI Ashburn region and took only 23m51s for the 2.25 TiB using the default of 10 engines/slots. Additional engines/slots could be added for faster performance.

Test Environment 3:

VM Instances: 1-2 VM instances were used for each test. Each was a VM.Standard.E4.Flex with 24 OCPUs, 24 Gbps network bandwidth, and 384 GB of memory. Oracle Linux 8 was used for Linux testing. All tests were bucket-to-bucket. Flexify IO does not use VM instances.

| Total Size | File Count | File Size Range |
|---|---|---|
| 7.74 TiB | 1,000,000 | 30 MiB |

| Method | To-From | Time | Command | Flags | Notes |
|---|---|---|---|---|---|
| rclone | Object-to-Object IAD -> IAD | 18h39m11.4s | copy | --oos-no-check-bucket --fast-list --no-traverse --transfers 500 --oos-chunk-size 10Mi | 1 VM, very slow due to the high file count and listing calls to source |
| rclone | Object-to-Object IAD -> IAD | 55m8.431s | copy | --oos-no-check-bucket --no-traverse --transfers 500 --oos-chunk-size 10Mi --files-from <file> | 2 VMs, 500 transfers per VM, object/file list fed 1,000 files at a time, prevents listing on source and destination and improves performance |
| python bulk copy script | Object-to-Object IAD -> IAD | 28m21.013s | NA | Default | 1 VM, 50 workers, 100,000 files queued at a time |
| python bulk copy script | Object-to-Object IAD -> IAD | NA | NA | Default | 2 VMs, 50 workers per VM, 100,000 files queued at a time. Received 429 errors, script hung and could not complete |
| Flexify IO | Object-to-Object IAD -> IAD | 39m19s | copy | Default | Defaults to 10 engines/slots |
| Flexify IO | Object-to-Object IAD -> IAD | 21m37s | copy | 20 engines/slots | Set to 20 engines/slots, this can be raised via "Advanced Settings" |
| s5cmd | Object-to-Object IAD -> IAD | 14m10.864s | copy | Defaults (256 workers) | 1 VM |
| s5cmd | Object-to-Object IAD -> IAD | 7m50.013s | copy | Defaults | 2 VMs, 256 workers each VM. Ran in about half the time of 1 VM |
| s5cmd | Object-to-Object IAD -> IAD | 3m23.382s | copy | --numworkers 1000 | 1 VM, 1000 workers. Across multiple tests we found this was the optimal run for this data set with s5cmd |
| rclone | Object-to-Object IAD -> PHX | 184m36.536s | copy | --oos-no-check-bucket --no-traverse --transfers 500 --oos-chunk-size 10Mi --files-from <file> | 2 VMs, 500 transfers per VM, object/file list fed 1,000 files at a time |
| python bulk copy script | Object-to-Object IAD -> PHX | 35m31.633s | NA | Default | 1 VM, 50 workers, 100,000 files queued at a time |
| Flexify IO | Object-to-Object IAD -> PHX | 21m17s | copy | 20 engines/slots | Set to 20 engines/slots, this can be raised via "Advanced Settings" |

s5cmd consistently ran best for this data set, with its large file count and small files. However, s5cmd is limited because it can only do bucket-to-bucket copies within the same tenancy and the same region.

Flexify IO would be the recommended tool for this migration data set since it performs well and supports various S3-compatible object storage types. Migration time went down after raising the engine/slot count for Flexify IO.

Notice that rclone improved significantly once the file list was fed to the command and when the work was scaled out to a second VM. Rclone may run slower than other tools; however, it is the most versatile in the platforms it supports and the types of migrations it can perform.
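Feeding rclone a prepared list of objects, 1,000 at a time, can be scripted. The following Python sketch splits a list of object names into list files and builds one `rclone copy` command per file, using only flags that appear in the test table (`--files-from`, `--no-traverse`, `--oos-no-check-bucket`, `--transfers`). The helper names, the `files-00000.txt` naming pattern, and the `src:bucket` and `dst:bucket` remotes are hypothetical placeholders. The sketch only prints the commands and does not run rclone.

```python
import tempfile
from pathlib import Path


def write_file_lists(object_names, out_dir, chunk_size=1000):
    """Write object names into list files of at most chunk_size lines each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    names = list(object_names)
    paths = []
    for index, start in enumerate(range(0, len(names), chunk_size)):
        path = out_dir / f"files-{index:05d}.txt"  # hypothetical naming pattern
        path.write_text("\n".join(names[start:start + chunk_size]) + "\n")
        paths.append(path)
    return paths


def rclone_command(list_file, source, destination, transfers=500):
    """Build (but do not run) an rclone copy command for one list file."""
    return [
        "rclone", "copy",
        "--files-from", str(list_file),
        "--no-traverse",
        "--oos-no-check-bucket",
        "--transfers", str(transfers),
        source, destination,
    ]


# Demo with made-up object names; real names would come from a bucket listing.
with tempfile.TemporaryDirectory() as tmp:
    lists = write_file_lists((f"data/object-{i}.bin" for i in range(2500)), tmp)
    for list_file in lists:
        print(" ".join(rclone_command(list_file, "src:bucket", "dst:bucket")))
```

Because each list file is independent, the resulting commands can be distributed across several VMs, which matches how the two-VM runs in the table were scaled out.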

The OCI Object Storage Bulk Copy Python API can only use the OCI native CopyObject API and can only reach a concurrency of 50 workers before being throttled. It generally performs well for this data set.
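The general pattern behind a worker limit and throttling can be illustrated with a bounded thread pool that retries throttled requests with exponential backoff. The following Python sketch is not the Oracle bulk copy script and does not call any OCI API. `Throttled` is a hypothetical stand-in for an HTTP 429 response, `copy_one` is a caller-supplied placeholder for a single copy request, and the retry limits are assumed values.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class Throttled(Exception):
    """Hypothetical stand-in for an HTTP 429 (too many requests) response."""


def with_backoff(task, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run task(), retrying on Throttled with delays of 1x, 2x, 4x, ..."""
    for attempt in range(max_attempts):
        try:
            return task()
        except Throttled:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** attempt)


def copy_all(names, copy_one, max_workers=50, sleep=time.sleep):
    """Copy every name with at most max_workers requests in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(with_backoff, lambda n=n: copy_one(n), sleep=sleep)
            for n in names
        ]
        # Results are returned in the same order as the input names.
        return [future.result() for future in futures]
```

Keeping `max_workers` at or below the level where throttling starts, and backing off instead of retrying immediately, is one way to avoid the kind of stall seen in the two-VM run above.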

Tests for IAD to PHX were run only for the configurations that worked best in IAD to IAD, and problematic tests were not re-run. s5cmd was not run for IAD to PHX because it can only do bucket-to-bucket copies within the same region.

Acknowledgments

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.