Use Migration Tools to Move Data into OCI Cloud Storage Services
Introduction
This is tutorial 1 of a four-tutorial series that shows you various ways to migrate data into Oracle Cloud Infrastructure (OCI) cloud storage services. The series is set up so you can review this first tutorial to get a broad understanding of the various tools and then proceed to the related tutorial(s) or documents relevant to your migration needs.
OCI provides customers with high-performance computing and low-cost cloud storage options. Through on-demand local, object, file, block, and archive storage, Oracle addresses key storage workload requirements and use cases.
OCI cloud storage services offer fast, secure, and durable cloud storage options for all your enterprise needs. These range from high performance options such as OCI File Storage with Lustre and the OCI Block Volumes service, through fully managed exabyte scale file systems from the OCI File Storage service with high performance mount targets, to highly durable and scalable OCI Object Storage. Our solutions can meet your demands, ranging from performance intensive applications such as AI/ML workloads to exabyte-scale data lakes.
Many customers need to transfer data into OCI cloud storage services from on-premises, from another cloud provider, or between OCI cloud storage services. The best migration method varies with the origin, the destination, and the direction of the data transfer. Once you have identified the basics of the data source and the destination in OCI, you must decide on a migration path and which tools to use. Let our hands-on experience guide you towards the right migration tool and how to use it. This is tutorial 1 in a series that introduces you to various tools and where they might best fit into the migration process.
Determine the amount of data that needs to be migrated and the downtime available to cut over to the new OCI storage platform. Batch migrations are a good choice to break down the migration into manageable increments, and they enable you to schedule downtime for specific applications across different windows. Some customers have the flexibility to do a one-time migration during a scheduled maintenance window of 2-4 days. OCI FastConnect can be used to create a dedicated, private connection between OCI and your environment, with port speeds from 1G to 400G, to speed up the data transfer process. OCI FastConnect can be integrated with partner solutions such as Megaport and ConsoleConnect to create a private connection to your data center, or a cloud-to-cloud interconnection to move data more directly from another cloud vendor into OCI cloud storage services. For more information, see FastConnect integration with Megaport Cloud Router.
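As a rough planning aid, the relationship between data size, port speed, and the transfer window can be sketched in a few lines of Python. This is a minimal estimate only: the `efficiency` factor (the share of the nominal port speed a transfer actually sustains) and the 100 TB sample size are assumptions for illustration, not measured values, and should be replaced with numbers from your own proof of concept.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate the hours needed to move data_tb terabytes over a link_gbps link.

    efficiency is an assumed fraction of the nominal port speed that the
    transfer sustains; it is a placeholder, not a measured value.
    """
    if data_tb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    bits = data_tb * 1e12 * 8                      # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical 100 TB data set at several port speeds.
    for gbps in (1, 10, 100):
        print(f"{gbps:>3} Gbps: {transfer_hours(100, gbps):8.1f} hours")
```

Comparing the result with the downtime you can schedule gives a first indication of whether a one-time migration is feasible or whether batches are the safer choice.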
Audience
DevOps engineers, developers, OCI cloud storage administrators and users, IT managers, OCI power users, and application administrators.
Objectives
Learn how to use various tools to copy and synchronize data into OCI cloud storage services:
- Identify common migration tools.
- Learn about the various tools for migrating filesystem data (local, NAS, cloud hosted) into OCI cloud storage services.
- Learn which tool is best suited for various use cases.
Prerequisites
- An understanding of your data migration set: its overall size and what the files or objects look like (few large files, many small files, and so on).
- Where in OCI cloud storage services (Object Storage, File Storage, or Block Volumes) the data should land.
- Your timelines and whether you will be doing a bulk or incremental (batch) migration.
Migration Tools
Our customer experience and testing can help guide you to which migration tool will best fit your scenario. We encourage you to do a proof of concept and tests on sample data sets to verify the best migration method for your data set. We will explore common migration tools available to migrate data from on-premises and other cloud providers to OCI or within OCI cloud storage services. Test results are shared from sample datasets so you can extrapolate how the tools may perform on your data set.
- Rclone: Rclone is an open source, command-line utility to migrate data to the cloud, or between cloud storage vendors. Rclone can be used for a one-time migration as well as periodic synchronization between source and destination storage. Rclone can migrate data to and from object storage, file storage, and mounted drives, and between over 70 supported storage types. OCI Object Storage is natively supported as an Rclone backend provider. Rclone processes can be scaled up and scaled out to increase transfer performance using parameter options. The copy and sync commands handle one-time transfers and periodic synchronization, respectively. For more information, see Install Rclone.
- Flexify IO: Flexify IO is a third-party, easy-to-use migration tool focused on object storage. Flexify works with S3 compatible cloud (OCI, AWS, GCP, Azure, and more) and on-premises (MinIO, Dell EMC ECS, and others) object storage. Simply log in to Flexify management, configure credentials/access keys, and set up the migration through a graphical user interface (GUI). Flexify IO’s horizontal scaling algorithm automatically scales as fast as connectivity and the storage allow. Flexify is also integrated with Network as a Service (NaaS) providers such as the OCI partner Megaport and can provision/deprovision fast connections on demand. For more information, see migrate data between clouds and Megaport and Flexify IO solution.
- Resilio Active Anywhere: Resilio is a third-party, agent-based data transfer application. It has a rich graphical user interface (GUI) that gives fine control over transfer jobs and visualizes performance metrics. The software is available in Oracle Cloud Marketplace and is licensed from Resilio for installation. Resilio can synchronize files in fixed timeframes in any direction in a one-to-many, many-to-one, or many-to-many mesh, enabling a global presence for a dataset. For more information, see Data migration to and between OCI storage services using Resilio Connect and Resilio Active Anywhere.
- Open Source Linux Sync Utilities: rsync and fpsync.
  - Rsync: Common Unix-based tool for a one-time copy or periodic sync between source and destination paths.
  - Fpsync: Open source tool for parallel sync. It runs a parallel wrapper on top of rsync, tar, or rclone, and you can choose any of the three as the underlying sync tool.
- Using File Storage Parallel Tools: OCI File Storage delivers a parallel tools package that works optimally with the file system. It is available in the Linux developer repository and can be installed directly from the yum repository. The package contains three tools, parcp, partar, and parrm, which are parallel equivalents of the standard Linux utilities cp, tar, and rm, respectively.
- OCI Command Line Interface Object Storage Sync: OCI Object Storage Sync (os sync) is part of the OCI Command Line Interface (CLI) and synchronizes a filesystem directory with objects in a bucket. The command traverses sub-directories, copying new and modified files or objects from the source to the destination and optionally deleting those that are not present in the source. It can run up to 1,000 parallel operations, depending on host machine resources.
- OCI Object Storage Bulk Copy Python API: A sample bulk copy script is available that uses the Python API for OCI. The API can be used to write scripts, such as the sample script, that use parallel threads to copy objects from one bucket into another. For more information on using the sample script, see Use Oracle Cloud Infrastructure Object Storage Python Utilities for Bulk Operations.
- S3 Compatible tool (s5cmd): s5cmd is an open source tool that can be used to migrate or synchronize local filesystems and NAS storage (on-premises and in OCI) into OCI Object Storage. It can also be used to migrate data bucket-to-bucket within the same region.
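The one-way sync behavior described above for os sync (copy new and modified items, optionally delete items missing from the source) can be illustrated with a small planning function. This is a conceptual sketch, not the OCI CLI's implementation: the comparison rule (different size, or a newer modification time at the source) and the `Listing` structure are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# name -> (size_in_bytes, modified_time) for each file or object
Listing = Dict[str, Tuple[int, float]]


def plan_sync(source: Listing, dest: Listing,
              delete: bool = False) -> Tuple[List[str], List[str]]:
    """Return (to_copy, to_delete) for a one-way sync from source to dest."""
    to_copy = []
    for name, (size, mtime) in source.items():
        if name not in dest:
            to_copy.append(name)              # new at the source
            continue
        dest_size, dest_mtime = dest[name]
        if size != dest_size or mtime > dest_mtime:
            to_copy.append(name)              # modified at the source
    # Deletion is opt-in: only items absent from the source are removed.
    to_delete = [name for name in dest if name not in source] if delete else []
    return sorted(to_copy), sorted(to_delete)


if __name__ == "__main__":
    source = {"a.txt": (10, 100.0), "b.txt": (20, 200.0), "dir/c.txt": (5, 50.0)}
    dest = {"a.txt": (10, 100.0), "b.txt": (20, 150.0), "old.txt": (1, 1.0)}
    print(plan_sync(source, dest))
    print(plan_sync(source, dest, delete=True))
```

Keeping deletion opt-in mirrors the "optionally deleting" wording above and makes a first run safe to repeat.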
The Right Tool for the Job
There are many tools to choose from when doing a migration, and it can be overwhelming to research them all at once. The best migration tool varies with the origin, destination, and direction of the migration, as well as user experience and user environment.
The following table provides recommendations for common migration scenarios, including the migration of on-premises data, migration of data from another cloud vendor into OCI, copying OCI cloud storage data from one region to another, copying OCI cloud storage data within a region, and copying OCI File Storage data to OCI Object Storage.
| Migrate Data From | Migrate Data To | Recommended Tool(s) | Notes | Documentation/ Tutorial Links |
|---|---|---|---|---|
| On-prem filesystem | OCI Object Storage | 1. s5cmd (small/mixed files) 2. OCI Object Storage Sync (few large files) 3. Rclone (mixed) 4. Resilio Active Anywhere | Use the tool that best fits your data structure and that you are comfortable using. Review FastConnect to improve transfer time. | S5cmd and Object Storage Sync Tutorial, Rclone Tutorial, Resilio Active Anywhere |
| Another Cloud Vendor Object or Blob Storage | OCI Object Storage | 1. Flexify IO 2. Rclone | Use Flexify IO for S3 compatible vendors and when a supported GUI is desired. Use Rclone when you are comfortable experimenting with and toggling various settings, and for the broadest compatibility: it supports over 70 different cloud vendors (S3 compatible and non-S3 compatible object storage). | Flexify IO migrate between clouds, Rclone Tutorial |
| OCI Object Storage | OCI Object Storage in another region | 1. Object Replication 2. OCI Object Storage Bulk Copy Python API 3. Flexify IO 4. Rclone | Native Object Replication is a good fit for an exact replica in a new bucket that is currently empty. Use OCI Object Storage Bulk Copy Python API, Flexify IO, or Rclone to initialize copies of a source bucket that already has objects or when you want to preserve objects on the destination. | Object Storage Replication documentation, Use Oracle Cloud Infrastructure Object Storage Python Utilities for Bulk Operations, Flexify IO, Rclone Tutorial |
| OCI Object Storage | OCI Object Storage in another tenancy (same or different region) | 1. OCI Object Storage Bulk Copy Python API 2. Flexify IO 3. Rclone | Prerequisite for OCI Object Storage Bulk Copy Python API: Use cross-tenancy IAM policies to enable cross-tenancy copies. For more information, see Accessing Object Storage Resources Across Tenancies. | Use Oracle Cloud Infrastructure Object Storage Python Utilities for Bulk Operations, Flexify IO, Rclone Tutorial |
| OCI File Storage | OCI Object Storage | 1. s5cmd (small/mixed files) 2. Object Storage Sync (few large files) 3. Rclone (mixed) 4. Resilio Active Anywhere | Use the tool that best fits your data structure and that you are comfortable using. | S5cmd and Object Storage Sync Tutorial, Rclone Tutorial, Resilio Active Anywhere |
| On-prem filesystem | OCI File Storage | 1. fpsync (Linux) and CIFS + fpsync (Windows) 2. Resilio Active Anywhere | Ensure that network connectivity is established between source and destination instances. | Fpsync documentation, fpsync tutorial, Resilio Active Anywhere |
| Another cloud vendor local disk or file storage | OCI File Storage | 1. fpsync (Linux) and CIFS + fpsync (Windows) 2. Resilio Active Anywhere | Ensure that network connectivity is established between source and destination instances. Review FastConnect to improve transfer time. | Fpsync documentation, fpsync tutorial, Resilio Active Anywhere |
| OCI File Storage | OCI File Storage in another region | 1. File System Replication 2. fpsync with instance-to-instance streaming 3. Resilio Active Anywhere | If you use replication, see replication’s Limitations and Considerations. If you use instance-to-instance streaming, ensure that network connectivity is established between source and destination instances. | Fpsync documentation, fpsync tutorial, File System Replication documentation, Resilio Active Anywhere |
| OCI File Storage | OCI File Storage within the same availability domain | 1. File System Replication 2. Using File Storage Parallel Tools: parcp 3. Resilio Active Anywhere | If you use replication, see replication’s Limitations and Considerations. If you use parcp, ensure that both source and destination file systems are mounted in the instance. | File System Replication, Using File Storage Parallel Tools: parcp, Resilio Active Anywhere |
| On-premises, Another cloud vendor | OCI Object Storage or OCI File Storage | Resilio Active Anywhere Platform | Use Resilio Active Anywhere platform when you need multi-way synchronization of data, white glove service with support, and GUI interface. Resilio has been verified by the OCI cloud storage service product team and is available in the Oracle Cloud Marketplace. | For more information on using their platform reach out to the Resilio team. |
Note: The migration tool series will not cover OCI Object Storage or OCI File System Replication, OCI Object Storage Bulk Copy Python API, Flexify, and Resilio. See Related Links for more information.
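For scripted planning, the recommendations in the table can be captured as a simple lookup. The sketch below is illustrative only: the key labels (such as `"on-prem filesystem"`) are shorthand chosen for this example, not official names, and the tool lists simply repeat the table in its original order.

```python
from typing import Dict, List, Tuple

# Keys are illustrative (source, destination) labels chosen for this sketch.
RECOMMENDED_TOOLS: Dict[Tuple[str, str], List[str]] = {
    ("on-prem filesystem", "object storage"):
        ["s5cmd", "OCI Object Storage Sync", "Rclone", "Resilio Active Anywhere"],
    ("other cloud object storage", "object storage"):
        ["Flexify IO", "Rclone"],
    ("object storage", "object storage (another region)"):
        ["Object Replication", "OCI Object Storage Bulk Copy Python API",
         "Flexify IO", "Rclone"],
    ("object storage", "object storage (another tenancy)"):
        ["OCI Object Storage Bulk Copy Python API", "Flexify IO", "Rclone"],
    ("file storage", "object storage"):
        ["s5cmd", "OCI Object Storage Sync", "Rclone", "Resilio Active Anywhere"],
    ("on-prem filesystem", "file storage"):
        ["fpsync", "Resilio Active Anywhere"],
    ("other cloud file storage", "file storage"):
        ["fpsync", "Resilio Active Anywhere"],
    ("file storage", "file storage (another region)"):
        ["File System Replication", "fpsync", "Resilio Active Anywhere"],
    ("file storage", "file storage (same availability domain)"):
        ["File System Replication", "parcp", "Resilio Active Anywhere"],
}


def recommend(source: str, destination: str) -> List[str]:
    """Return the tools listed in the table for a scenario, in table order."""
    key = (source.strip().lower(), destination.strip().lower())
    return RECOMMENDED_TOOLS.get(key, [])


if __name__ == "__main__":
    print(recommend("On-prem filesystem", "Object Storage"))
```

An empty list means the scenario is not covered by the table, which is a prompt to run your own proof of concept rather than a recommendation.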
Next Steps
Proceed to the related tutorial(s) relevant to your migration needs. To move data into OCI cloud storage services:
- Using Rclone, see Tutorial 2: Move Data into OCI Cloud Storage Services using Rclone.
- Using OCI Object Storage Sync and S5cmd, see Tutorial 3: Move Data into OCI Cloud Storage Services using OCI Object Storage Sync and S5cmd.
- Using Fpsync and Rsync for file system data migrations, see Tutorial 4: Move Data into OCI Cloud Storage Services using Fpsync and Rsync for File System Data Migrations.
(Optional) Test Environments
Recommendations are made based on testing and customer interactions.
Test Environment 1:
1 VM instance (VM.Standard.E4.Flex) with 1 OCPU, 1Gbps network bandwidth, and 16GB of memory. To simulate an on-premises to OCI migration, data was copied from NFS in PHX to IAD.
Data Sets
- Data Set 1: Migrating data from NFS mounted filesystem to OCI Object Storage.
Total data set size: 3TB, with 3 files, each file 1TB.
| Method | From-To | Time | Command | Flags |
|---|---|---|---|---|
| os sync | NFS/File PHX to Object IAD | 123m17.102s | NA | --parallel-operations-count 100 |
| s5cmd | NFS/File PHX to Object IAD | 239m20.625s | copy | run commands.txt, default run --numworkers 256 |
| rclone | NFS/File PHX to Object IAD | 178m27.101s | copy | --transfers=100 --oos-no-check-bucket --fast-list --checkers 64 --retries 2 --no-check-dest |
Note: Our tests showed os sync running the fastest for this data set.
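The s5cmd row above ran its copies from a command file (`run commands.txt`). A minimal sketch of generating such a file is shown below. It assumes each line is an s5cmd `cp` command with a local source and an `s3://` destination, that paths contain no whitespace, and that the mount point and bucket name (`/mnt/nfs/data`, `my-bucket`) are placeholders; the actual file used in the tests is not shown in this tutorial.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def build_s5cmd_commands(files: Iterable[str], source_root: str,
                         bucket: str) -> List[str]:
    """Return one 'cp' line per file, keeping keys relative to source_root.

    Assumes paths without whitespace; quoting is out of scope for this sketch.
    """
    lines = []
    for path in files:
        key = PurePosixPath(path).relative_to(source_root).as_posix()
        lines.append(f"cp {path} s3://{bucket}/{key}")
    return lines


if __name__ == "__main__":
    # Hypothetical file names under a hypothetical NFS mount point.
    files = ["/mnt/nfs/data/file1.bin", "/mnt/nfs/data/sub/file2.bin"]
    with open("commands.txt", "w", encoding="utf-8") as handle:
        handle.write("\n".join(build_s5cmd_commands(files, "/mnt/nfs/data", "my-bucket")) + "\n")
```

The resulting file is what `run commands.txt` refers to in the Flags column; endpoint and credential settings for OCI are covered in the s5cmd tutorial of this series.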
- Data Set 2: Migrating data from NFS mounted filesystem to OCI Object Storage.
Total data set size: 9.787GB, with 20,000 files, each file 20MB
| Method | From-To | Time | Command | Flags |
|---|---|---|---|---|
| s5cmd | NFS/File PHX to Object IAD | 1m12.746s | copy | default run --numworkers 256 |
| os sync | NFS/File PHX to Object IAD | 2m48.742s | NA | --parallel-operations-count 1000 |
| rclone | NFS/File PHX to Object IAD | 1m52.886s | copy | --transfers=500 --oos-no-check-bucket --no-check-dest |
Note: Our tests showed s5cmd performing the best for this data set.
Test Environment 2:
VM Instances: 2 VM instances were used for each test. Each was a VM.Standard.E4.Flex with 24 OCPU, 24Gbps network bandwidth, and 384GB of memory. Oracle Linux 8 was used for Linux testing. Flexify IO does not use VM instances.
Data sets used in testing: 14 main directories with the following file count and sizes, totaling 2.25TiB.
| Data Set Directory | Size | File count | Size of Each File |
|---|---|---|---|
| Directory 1 | 107.658 GiB | 110,242 | 1 MiB |
| Directory 2 | 1.687 GiB | 110,569 | 16 KiB |
| Directory 3 | 222 GiB | 111 | 2 GiB |
| Directory 4 | 1.265 TiB | 1,295 | 1 GiB |
| Directory 5 | 26.359 GiB | 1,687 | 16 MiB |
| Directory 6 | 105.281 MiB | 26,952 | 4 KiB |
| Directory 7 | 29.697 MiB | 30,410 | 1 KiB |
| Directory 8 | 83.124 GiB | 340,488 | 256 KiB |
| Directory 9 | 21.662 GiB | 354,909 | 64 KiB |
| Directory 10 | 142.629 GiB | 36,514 | 4 MiB |
| Directory 11 | 452.328 MiB | 57,898 | 8 KiB |
| Directory 12 | 144 GiB | 72 | 2 GiB |
| Directory 13 | 208.500 GiB | 834 | 256 MiB |
| Directory 14 | 54.688 GiB | 875 | 64 MiB |
Note:
- The 14 directories were split between the 2 VM instances where applicable.
- Each VM ran 7 commands/processes, 1 for each directory unless otherwise noted.
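As a quick consistency check, the Size column of the directory table can be summed to confirm the stated 2.25TiB total. The sketch below uses only the sizes listed in the table; the helper names are illustrative.

```python
from typing import Iterable

UNIT_BYTES = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# Size column of the directory table, Directory 1 through Directory 14.
DIRECTORY_SIZES = [
    "107.658 GiB", "1.687 GiB", "222 GiB", "1.265 TiB", "26.359 GiB",
    "105.281 MiB", "29.697 MiB", "83.124 GiB", "21.662 GiB", "142.629 GiB",
    "452.328 MiB", "144 GiB", "208.500 GiB", "54.688 GiB",
]


def to_bytes(size: str) -> float:
    """Convert a string such as '26.359 GiB' to bytes."""
    value, unit = size.split()
    return float(value) * UNIT_BYTES[unit]


def total_tib(sizes: Iterable[str]) -> float:
    """Sum a list of size strings and return the total in TiB."""
    return sum(to_bytes(size) for size in sizes) / UNIT_BYTES["TiB"]


if __name__ == "__main__":
    print(f"{total_tib(DIRECTORY_SIZES):.2f} TiB")  # prints "2.25 TiB"
```

The same helper can be pointed at your own directory listing to size each batch before assigning directories to VMs.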
| Method | From-To | Time | Command | Flags/ Notes |
|---|---|---|---|---|
| s5cmd | NFS/File PHX to Object IAD | 54m41.814s | copy | --numworkers 74 |
| os sync | NFS/File PHX to Object IAD | 65m43.200s | NA | --parallel-operations-count 50 |
| rclone | NFS/File PHX to Object IAD | 111m59.704s | copy | --oos-no-check-bucket --no-check-dest --ignore-checksum --oos-disable-checksum --transfers 50 |
| rclone | Object PHX to Object IAD | 28m55.663s | copy | --oos-no-check-bucket --no-check-dest --ignore-checksum --oos-disable-checksum --transfers 400, same command run across 2 VMs for a concurrency of 800 transfers |
| python bulk copy script | Object PHX to Object IAD | 25m43.715s | Default | 1 VM, 50 workers, 100,000 files queued at a time |
| Flexify IO | Object PHX to Object IAD | 20m27s | copy | Defaults to 10 engines/ slots |
| Flexify IO | Object PHX to Object IAD | 16m12s | copy | 20 engines/ slots, this can be raised via “Advanced Settings” |
The s5cmd and os sync commands do well for filesystem/NFS to object storage transfers. Flexify IO and the bulk copy script focus only on object storage (bucket-to-bucket) transfers and were not tested for NFS migration.
Only Flexify IO, rclone, and the python bulk copy script are capable of doing bucket-to-bucket transfers across regions, so the other tools were not tested for it. Flexify IO performs the best for object storage migrations across regions, with the python bulk copy script performing better than rclone. It is important to note that Flexify IO works for S3 compatible object storage, the python bulk copy script only works with OCI Object Storage, and rclone supports many backends and cloud providers.
Small test runs were conducted using rclone to transfer data from Microsoft Azure Blob Storage, Amazon Simple Storage Service (Amazon S3), and Google Cloud Platform Cloud Storage to OCI Object Storage to verify the tool works for these types of transfers. For more information, see Move data to object storage in the cloud by using Rclone.
Flexify IO was used to migrate this data set from AWS us-east-2 to the OCI Ashburn region and took only 23m51s for the 2.25TiB using the default of 10 engines/slots. Additional engines/slots could be added for faster performance.
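To extrapolate from these results to your own data set, it can help to convert the reported times into effective throughput. The sketch below parses the time format used in the tables and divides the 2.25TiB data set size by the elapsed time. The helper names are illustrative, and the result is an average over the whole run, not a sustained network rate.

```python
import re


def parse_duration(text: str) -> float:
    """Convert strings such as '54m41.814s' or '18h39m11.4s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?", text)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def throughput_gib_per_s(size_tib: float, duration: str) -> float:
    """Average throughput in GiB/s for size_tib moved in the given time."""
    return size_tib * 1024 / parse_duration(duration)


if __name__ == "__main__":
    # Times taken from the Test Environment 2 results table.
    results = {
        "s5cmd (NFS to Object)": "54m41.814s",
        "os sync (NFS to Object)": "65m43.200s",
        "Flexify IO, 20 engines (Object to Object)": "16m12s",
    }
    for method, elapsed in results.items():
        print(f"{method}: {throughput_gib_per_s(2.25, elapsed):.2f} GiB/s")
```

Because file count and file size strongly affect these numbers, treat the throughput as a starting point and confirm it with a test on a sample of your own data.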
Test Environment 3:
VM Instances: 1-2 VM instances were used for each test. Each was a VM.Standard.E4.Flex with 24 OCPU, 24Gbps network bandwidth, and 384GB of memory. Oracle Linux 8 was used for Linux testing. All tests were bucket-to-bucket. Flexify IO does not use VM instances.
| Total Size | File Count | File Size Range |
|---|---|---|
| 7.74 TiB | 1,000,000 | 30 MiB |
| Method | From-To | Time | Command | Flags | Notes | Additional Notes |
|---|---|---|---|---|---|---|
| rclone | Object-to-Object IAD -> IAD | 18h39m11.4s | copy | --oos-no-check-bucket --fast-list --no-traverse --transfers 500 --oos-chunk-size 10Mi | 1 VM, very slow due to the high file count and listing calls to source | |
| rclone | Object-to-Object IAD -> IAD | 55m8.431s | copy | --oos-no-check-bucket --no-traverse --transfers 500 --oos-chunk-size 10Mi --files-from <file> | 2 VMs, 500 transfers per VM, object/file list fed 1,000 files at a time, prevents listing on source and destination and improves performance | |
| python bulk copy script | Object-to-Object IAD -> IAD | 28m21.013s | NA | Default | 1 VM, 50 workers, 100,000 files queued at a time | |
| python bulk copy script | Object-to-Object IAD -> IAD | NA | NA | Default | 2 VMs, 50 workers per VM, 100,000 files queued at a time. Received 429 errors, script hung and could not complete | |
| Flexify IO | Object-to-Object IAD -> IAD | 39m19s | copy | Default | Defaults to 10 engines/slots | |
| Flexify IO | Object-to-Object IAD -> IAD | 21m37s | copy | 20 engines/slots | Set to 20 engines/slots, this can be raised via “Advanced Settings” | |
| s5cmd | Object-to-Object IAD -> IAD | 14m10.864s | copy | Defaults (256 workers) | 1 VM | NA |
| s5cmd | Object-to-Object IAD -> IAD | 7m50.013s | copy | Defaults | 2 VMs, 256 workers each VM | Ran in about half the time of 1 VM |
| s5cmd | Object-to-Object IAD -> IAD | 3m23.382s | copy | --numworkers 1000 | 1 VM, 1000 workers | Across multiple tests we found this was the optimal run for this data set with s5cmd |
| rclone | Object-to-Object IAD -> PHX | 184m36.536s | copy | --oos-no-check-bucket --no-traverse --transfers 500 --oos-chunk-size 10Mi --files-from <file> | 2 VMs, 500 transfers per VM, object/file list fed 1,000 files at a time | |
| python bulk copy script | Object-to-Object IAD -> PHX | 35m31.633s | NA | Default | 1 VM, 50 workers, 100,000 files queued at a time | |
| Flexify IO | Object-to-Object IAD -> PHX | 21m17s | copy | 20 engines/slots | Set to 20 engines/slots, this can be raised via “Advanced Settings” | |
The s5cmd command consistently ran best for the large file count and small files. However, s5cmd is limited because it can only do bucket-to-bucket copies within the same tenancy and same region.
Flexify IO would be the recommended tool for this migration data set since it performs well and supports various S3 compatible object storage types. Migration time went down after raising engine/ slot count for Flexify IO.
Notice the large improvement in rclone performance once the file list is fed to the command and when scaling out to another VM. Rclone may run slower than other tools; however, it is the most versatile in the platforms it supports and the types of migrations it can perform.
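The approach of feeding rclone a file list 1,000 names at a time can be sketched as follows. The flags are the ones listed in the results table; the remote names (`src:bucket-a`, `dst:bucket-b`), the list file naming, and the helper functions are assumptions for illustration. The sketch only builds the list files and command lines; it does not run rclone.

```python
import os
import tempfile
from typing import Iterable, Iterator, List


def batched(names: Iterable[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` object names."""
    batch: List[str] = []
    for name in names:
        batch.append(name)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def write_list_files(names: Iterable[str], directory: str,
                     size: int = 1000) -> List[str]:
    """Write one list file per batch and return the file paths."""
    paths = []
    for index, batch in enumerate(batched(names, size)):
        path = os.path.join(directory, f"batch_{index:05d}.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(batch) + "\n")
        paths.append(path)
    return paths


def rclone_copy_command(source: str, dest: str, list_file: str,
                        transfers: int = 500) -> List[str]:
    """Build an rclone copy command that reads object names from list_file."""
    return ["rclone", "copy", source, dest,
            "--oos-no-check-bucket", "--no-traverse",
            "--transfers", str(transfers),
            "--oos-chunk-size", "10Mi",
            "--files-from", list_file]


if __name__ == "__main__":
    names = [f"object-{i}" for i in range(2500)]      # hypothetical object names
    with tempfile.TemporaryDirectory() as workdir:
        for list_file in write_list_files(names, workdir):
            print(" ".join(rclone_copy_command("src:bucket-a", "dst:bucket-b", list_file)))
```

Splitting the list files across VMs is one way to scale out, in the spirit of the 2 VM test above.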
The OCI Object Storage Bulk Copy Python API can only use the OCI native CopyObject API and can only reach a concurrency of 50 workers before being throttled. Even so, it generally performs well for this data set.
Tests for IAD to PHX were only run with the configurations that worked best in IAD to IAD, and problematic tests were not re-run. The s5cmd command was not run for IAD to PHX because it can only do bucket-to-bucket copies within the same region.
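The pattern of a bounded pool of worker threads, combined with backing off when the service throttles requests, can be sketched as below. This is not the sample bulk copy script: `copy_one` is a caller-supplied function (for example, a wrapper around a CopyObject request), `ThrottledError` is a hypothetical stand-in for an HTTP 429 response, and the retry settings are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 (Too Many Requests) response."""


def copy_with_retry(copy_one: Callable[[str], None], name: str,
                    retries: int = 5, base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Copy one object, backing off exponentially when throttled."""
    for attempt in range(retries):
        try:
            copy_one(name)
            return True
        except ThrottledError:
            sleep(base_delay * 2 ** attempt)   # wait longer after each 429
    return False                               # gave up after `retries` attempts


def bulk_copy(names: Iterable[str], copy_one: Callable[[str], None],
              workers: int = 50, **retry_options) -> Dict[str, bool]:
    """Copy all names with at most `workers` threads; map name -> success."""
    names = list(names)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda name: copy_with_retry(copy_one, name, **retry_options), names)
        return dict(zip(names, results))
```

The default of 50 workers matches the concurrency noted above. Objects that still fail after the retries are reported as `False` so they can be re-queued instead of hanging the run.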
Related Links
- Tutorial 2: Move Data into OCI Cloud Storage Services using Rclone
- Tutorial 3: Move Data into OCI Cloud Storage Services using OCI Object Storage Sync and S5cmd
- Announcing native OCI Object Storage provider backend support in rclone
- Data migration to and between OCI storage services using Resilio Connect
- Use Oracle Cloud Infrastructure Object Storage Python Utilities for Bulk Operations
Acknowledgments
- Authors - Vinoth Krishnamurthy (Principal Member of Technical Staff, OCI File Storage), Melinda Centeno (Senior Principal Product Manager, OCI Object Storage)
- Contributors - Aboo Valappil (Consulting Member of Technical Staff, OCI File and Block Storage), Ashutosh Mate (Senior Principal Product Manager, OCI Object Storage)
More Learning Resources
Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.
For product documentation, visit Oracle Help Center.
G25415-02