Note:
- This tutorial requires access to Oracle Cloud. To sign up for a free account, see Get started with Oracle Cloud Infrastructure Free Tier.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Move Data into OCI Cloud Storage Services using OCI Object Storage Sync and the s5cmd Tool
Introduction
This is Tutorial 3 of a four-tutorial series that shows you various ways to migrate data into Oracle Cloud Infrastructure (OCI) cloud storage services. The series is set up so you can review Tutorial 1: Use Migration Tools to Move Data into OCI Cloud Storage Services to get a broad understanding of the various tools and then proceed to the related tutorial(s) or documents relevant to your migration needs. This tutorial focuses on using OCI Object Storage Sync (os sync) and the s5cmd tool to migrate file system data (including OCI File Storage) to and from OCI Object Storage.
OCI provides customers with high-performance computing and low-cost cloud storage options. Through on-demand local, object, file, block, and archive storage, Oracle addresses key storage workload requirements and use cases.
OCI cloud storage services offer fast, secure, and durable cloud storage options for all your enterprise needs. The options range from high performance services such as OCI File Storage with Lustre and OCI Block Volumes, through fully managed exabyte-scale file systems from OCI File Storage with high performance mount targets, to highly durable and scalable OCI Object Storage. These solutions can meet your demands, from performance-intensive applications such as AI/ML workloads to exabyte-scale data lakes.
-
OCI Object Storage Sync (os sync) is part of the Oracle Cloud Infrastructure Command Line Interface (OCI CLI) which synchronizes a filesystem directory with objects in a bucket. The command traverses sub-directories copying new and modified files or objects from the source to the destination and optionally deleting those that are not present in the source.
-
The s5cmd tool is a free, open source project. s5cmd enables browsing and transferring data to/from S3 compatible object stores (including OCI Object Storage) to/from file system data (including OCI File Storage). It is written in the Go language.
Determine the amount of data that needs to be migrated and the downtime available to cut over to the new OCI storage platform. Batch migrations are a good choice for breaking the migration down into manageable increments, and they enable you to schedule downtime for specific applications across different windows. Some customers have the flexibility to do a one-time migration during a scheduled maintenance window of 2-4 days. OCI FastConnect can be used to create a dedicated, private connection between OCI and your environment, with port speeds from 1G to 400G, to speed up the data transfer process. OCI FastConnect can be integrated with partner solutions such as Megaport and ConsoleConnect to create a private connection to your data center, or a cloud-to-cloud interconnection to move data more directly from another cloud vendor into OCI cloud storage services. For more information, see FastConnect integration with Megaport Cloud Router.
Audience
DevOps engineers, developers, OCI cloud storage administrators and users, IT managers, OCI power users, and application administrators.
Objective
Learn how to copy and/or synchronize file system data to/from OCI Object Storage using the OCI CLI with os sync and the s5cmd tool.
-
Use the os sync command with various parameters and options.
-
Run the s5cmd tool in various ways for data migration and synchronization.
Prerequisites
-
An OCI account.
-
VM instance on OCI to deploy the migration tools or a system where you can deploy and use migration tools.
-
OCI CLI installed with a working config file in your home directory in a subdirectory called .oci. For more information, see Setting up the Configuration File.
-
Access to an OCI Object Storage bucket.
-
User permissions in OCI to use OCI Object Storage, with access to manage buckets and objects, or to manage object-family, for at least one bucket or compartment. For more information, see Common Policies and Policy Reference.
-
User permission to create, export, and mount OCI File Storage, or access to an OCI File Storage mount target that is already mounted on a VM, or another Network File System (NFS) mount or local file system to use to copy data to and from. For more information, see Manage File Storage Policy.
-
Familiarity with using a terminal or shell interface on Mac OS, Linux, Berkeley Software Distribution (BSD), or on Windows PowerShell, command prompt, or bash.
-
Review Migration Essentials for Moving Data into OCI Cloud Storage to install the OCI CLI with os sync and the s5cmd tool.
-
To know the migration tools we can use, see Tutorial 1: Use Migration Tools to Move Data into OCI Cloud Storage Services.
Synchronize Network File System and Local File System Data to/from OCI Object Storage
OCI Object Storage Sync is part of the OCI Command Line Interface (CLI) and synchronizes a file system directory with objects in a bucket. The command traverses subdirectories, copying new and modified files or objects from the source to the destination and optionally deleting those that are not present in the source. It is a convenient tool to keep file system data and OCI Object Storage buckets synchronized. In our test environment, OCI Object Storage Sync performed well for mixed data sets and better than the other tools for large files (1TB or more).
OCI Object Storage supports an Amazon S3 Compatibility API. Customers who are already familiar with the variety of Amazon S3 tools can continue to use them. The s5cmd tool is a free, open source project. It enables browsing and transferring data to/from an S3 compatible object store. It is written in the Go language and is optimized for parallel throughput. During our testing, we found that the s5cmd tool works best for small files (from under 1MB up to 30MB) and outperformed all other tools for moving small files. The s5cmd tool also works well for mixed data sets when moving data from file systems to OCI Object Storage.
If your data can be organized into subsets by directories or prefixes, you can also scale out your os sync and s5cmd runs across multiple VMs to improve transfer times.
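One way to prepare such a scale-out is to split the top-level subdirectories of the source into one list per VM. The sketch below is only an illustration: the function name split_dirs, the round-robin assignment, and the vm_<n>.list file names are assumptions made for this example and are not part of os sync or s5cmd.

```shell
# Hypothetical helper: assign each top-level subdirectory of a source
# directory to one of several per-VM lists in round-robin order.
# Usage: split_dirs <source dir> <number of VMs> <output dir>
split_dirs() {
  src=$1
  vm_count=$2
  out_dir=$3
  mkdir -p "$out_dir"

  # Start from empty lists so a rerun does not append to old output.
  n=0
  while [ "$n" -lt "$vm_count" ]; do
    : > "$out_dir/vm_$n.list"
    n=$((n + 1))
  done

  # The glob expands in sorted order, so the assignment is repeatable.
  i=0
  for dir in "$src"/*/; do
    [ -d "$dir" ] || continue
    printf '%s\n' "${dir%/}" >> "$out_dir/vm_$((i % vm_count)).list"
    i=$((i + 1))
  done
}
```

For example, split_dirs /mnt/fss/data 2 ./plan (hypothetical paths) would write ./plan/vm_0.list and ./plan/vm_1.list, and each VM would then run its transfers only for the directories in its own list. Round-robin assignment ignores directory sizes, so balancing the lists by size may give more even transfer times.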
Use OS Sync to Synchronize
-
Synchronize local file system data into OCI Object Storage.
Run the following basic os sync command to synchronize the files from a source directory into a destination bucket.
oci os object sync --src-dir <path to migration-files> --bucket-name <bucket name>
Note: This same command can be used for on-premises local file systems, on-premises NFS file systems, and on an OCI Compute instance with OCI File Storage NFS mounts to move data from OCI File Storage into an OCI Object Storage bucket.
-
Migrate data from OCI Object Storage to a local file system.
Run the following basic os sync command to synchronize files/objects from a source bucket into a destination file system.
oci os object sync --dest-dir <path to migration-target directory> --bucket-name <bucket name>
Note: This same command can be used for on-premises local file systems, on-premises NFS file systems, and on an OCI Compute instance with an OCI File Storage NFS mount to move data from an OCI Object Storage bucket into OCI File Storage.
-
Increase parallel operations.
By default, only 10 operations run in parallel. Increasing parallel operations improves data transfer speeds but also consumes more system resources and bandwidth. In our testing environment, we used the VM.Standard.E4.Flex shape with 24 OCPUs, 24Gbps network bandwidth, and 384GB of memory, and found 100 parallel operations to work best for transfer speeds. On larger VMs, increase this number by 10 operations at a time until an optimal transfer speed is achieved or until the maximum of 1,000 parallel operations is reached. On smaller VM shapes, start at 10 operations and increase in increments of 5-10 until good throughput is achieved. Increase or decrease parallel operations with the following flag.
--parallel-operations-count <integer range>
Note: If errors start occurring after increasing the parallel operations, especially 429 “TooManyRequests” errors, lower the parallel operations by 2 at a time until the errors stop.
-
Filter the matched files using pattern flags.
Patterns can be used to include or exclude matched files. The pattern flags can be used multiple times on the command line to match multiple patterns.
-
To include files that match a pattern, use the following flag.
--include
-
To exclude files that match a pattern, use the following flag.
--exclude
Note: The following wildcards are supported in patterns.
*: Matches everything
?: Matches any single character
[sequence]: Matches any character in sequence
[!sequence]: Matches any character not in sequence
-
-
Use the --prefix flag.
When the --prefix flag is used with --src-dir to upload files to OCI Object Storage, the specified prefix is added to the object names. When it is used with --dest-dir to download objects from OCI Object Storage, only objects with the specified prefix are downloaded, and the prefix does not show up as part of the file name.
-
Verify a transfer before a run.
Before starting a transfer, you can determine which files will be uploaded to or downloaded from OCI Object Storage by having os sync only print the list of files. Add the following flag to the command line.
--dry-run
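The flags above can be combined in a single command. The following is a minimal sketch, assuming a hypothetical source directory, bucket name, and exclude pattern; the run wrapper is an illustrative helper, not part of the OCI CLI. By default it only prints each command so it can be reviewed first.

```shell
# Hypothetical values; substitute your own directory and bucket.
SRC_DIR="/mnt/fss/migration-files"
BUCKET="my-bucket"

# Illustrative wrapper: print each command by default, and execute it
# only when EXECUTE=1 is set in the environment.
run() {
  if [ "${EXECUTE:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Step 1: list what would be transferred without copying anything.
run oci os object sync --src-dir "$SRC_DIR" --bucket-name "$BUCKET" \
  --exclude '*.tmp' --dry-run

# Step 2: run the transfer with 50 parallel operations.
run oci os object sync --src-dir "$SRC_DIR" --bucket-name "$BUCKET" \
  --exclude '*.tmp' --parallel-operations-count 50
```

Reviewing the dry run output before the real transfer helps confirm that the pattern matches what you expect. The value of 50 parallel operations is only a starting point and should be tuned as described above.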
Use s5cmd to Synchronize
-
Sync and copy command.
-
The sync command for s5cmd makes a one-way synchronization from source to destination without modifying any of the source files. By default, it does not delete files on the destination that do not exist on the source. Add the --delete flag to remove files on the destination that do not exist on the source.
-
The copy command (cp) simply copies objects from the source to the destination.
-
-
Pattern matching.
The s5cmd tool supports multiple-level wildcards for the sync and copy operations. This is achieved by listing all objects with the prefix up to the first wildcard, then filtering the results in-memory.
Wildcards also apply when the source is a file system. Because the shell may interpret the * character as a globbing wildcard before s5cmd sees it, wrap the pattern in single quotes to avoid unexpected results.
For example, to copy all gzipped files in a directory into a bucket, run the following command.
s5cmd cp '*.gz' s3://<bucket name>
-
Basic sync command.
-
Synchronize local file system data into OCI Object Storage. Run the following basic s5cmd command to synchronize the files from a source directory into a destination bucket.
s5cmd sync /<path to migration-files> s3://<bucket name>
-
Synchronize data from OCI Object Storage to a local file system. Run the following basic s5cmd command to synchronize files/objects from a source bucket into a destination file system.
s5cmd sync "s3://<bucket name>/*" /<path to migration-target directory>
Note: The same commands can be used for on-premises local file systems, on-premises NFS file systems, and on an OCI Compute instance with OCI File Storage NFS mounts to move data between OCI File Storage and an OCI Object Storage bucket.
-
-
Basic copy command.
Run the following basic s5cmd command to copy the files from a local file system source directory into an OCI Object Storage destination bucket.
s5cmd cp /<path to migration-files> s3://<bucket name>
Copy data from OCI Object Storage to a local file system.
s5cmd cp "s3://<bucket name>/*" /<path to migration-target directory>
Note: These commands can be used for on-premises local file systems, on-premises NFS file systems, and on an OCI Compute instance with OCI File Storage NFS mounts to move data between OCI File Storage and an OCI Object Storage bucket. Our testing primarily used the copy command with s5cmd.
-
Increase parallelism.
The s5cmd tool runs 256 workers in parallel by default. Depending on the size of your VM, you may want to increase or decrease the parallelism. In our testing environment, we used the VM.Standard.E4.Flex shape with 24 OCPUs, 24Gbps network bandwidth, and 384GB of memory, and found 1,000 parallel operations to work best for transfer speeds. On larger VMs, increase this number by 10 operations at a time until an optimal transfer speed is achieved or until the maximum of 1,000 parallel transfers is reached. On smaller VM shapes, start at 10 operations and increase in increments of 5-10 until good throughput is achieved. Increase or decrease parallel operations with the following flag.
--numworkers <integer>
Note: If errors start occurring after increasing the number of workers, especially 429 “TooManyRequests” errors, lower the number of workers by 2 at a time until the errors stop occurring.
For example, copy all objects in a bucket into a local file system directory.
s5cmd --numworkers 1000 cp "s3://MyBucket/*" /my/directory
Note: We tested with up to 1,500 workers and did not see any significant improvement. Since our testing found that the s5cmd tool ran best for small files, we did not find any benefit in using the --concurrency flag for files needing multipart uploads; using a high number of workers showed the best overall performance.
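The s5cmd tool can also read a batch of commands from a file with its run command. The sketch below generates such a file with one cp line per top-level subdirectory. The function name write_commands, the example paths, and the choice to use each directory name as the object prefix are assumptions made for illustration, and the sketch does not handle paths that contain spaces.

```shell
# Hypothetical helper: write one s5cmd cp line per top-level subdirectory.
# Usage: write_commands <source dir> <bucket name> <output file>
write_commands() {
  src=$1
  bucket=$2
  out=$3
  : > "$out"
  for dir in "$src"/*/; do
    [ -d "$dir" ] || continue
    name=$(basename "$dir")
    # s5cmd reads this file itself, so the wildcard is not expanded by
    # the shell and needs no quoting here.
    printf 'cp %s* s3://%s/%s/\n' "$dir" "$bucket" "$name" >> "$out"
  done
}

# Example with hypothetical values:
#   write_commands /mnt/fss/migration-files my-bucket commands.txt
```

The generated file could then be passed to s5cmd, for example with s5cmd --endpoint-url https://<namespace>.compat.objectstorage.<region>.oraclecloud.com --numworkers 256 run commands.txt. This assumes the Amazon S3 Compatibility API endpoint and access keys are set up as described in Migration Essentials for Moving Data into OCI Cloud Storage. Adding the global --dry-run flag first is a cautious way to check the resulting object names before any data moves.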
(Optional) Test Environments
Recommendations are made based on testing and customer interactions.
Note: Rclone results are included to give more information. For details on using Rclone, see Tutorial 2: Move Data into OCI Cloud Storage Services using Rclone.
Test Environment 1:
1 VM instance: VM.Standard.E4.Flex with 1 OCPU, 1Gbps network bandwidth, and 16GB of memory. To simulate an on-premises to OCI migration, we copied data from NFS in PHX to IAD.
Data Sets
-
Data Set 1:
| Total Size | File Count | File Size Range |
|---|---|---|
| 3TB | 3 | 1TB |

| Method | To-From | Time | Command | Flags |
|---|---|---|---|---|
| os sync | NFS/File PHX to Object IAD | 123m17.102s | NA | --parallel-operations-count 100 |
| s5cmd | NFS/File PHX to Object IAD | 239m20.625s | copy | run commands.txt, default run --numworkers 256 |
| rclone | NFS/File PHX to Object IAD | 178m27.101s | copy | --transfers=100 --oos-no-check-bucket --fast-list --checkers 64 --retries 2 --no-check-dest |

Note: Our tests showed os sync running the fastest for this data set.
-
Data Set 2:
| Total Size | File Count | File Size Range |
|---|---|---|
| 9.787GB | 20,000 | 1MB |

| Method | To-From | Time | Command | Flags |
|---|---|---|---|---|
| s5cmd | NFS/File PHX to Object IAD | 1m12.746s | copy | default run --numworkers 256 |
| os sync | NFS/File PHX to Object IAD | 2m48.742s | NA | --parallel-operations-count 1000 |
| rclone | NFS/File PHX to Object IAD | 1m52.886s | copy | --transfers=500 --oos-no-check-bucket --no-check-dest |

Note: Our tests showed s5cmd performing the best for this data set.
Test Environment 2:
VM instances: 1-2 VM instances were used for each test. We used the VM.Standard.E4.Flex shape with 24 OCPUs, 24Gbps network bandwidth, and 384GB of memory. Oracle Linux 8 was used for Linux testing.
Data Sets
-
Data Set 1:
14 main directories with the following file count and sizes.
| Data Set Directory | Size | File Count | Size of Each File |
|---|---|---|---|
| Directory 1 | 107.658 GiB | 110,242 | 1 MiB |
| Directory 2 | 1.687 GiB | 110,569 | 15 MiB |
| Directory 3 | 222 GiB | 111 | 2 GiB |
| Directory 4 | 1.265 TiB | 1,295 | 1 GiB |
| Directory 5 | 26.359 GiB | 1,687 | 16 MiB |
| Directory 6 | 105.281 MiB | 26,952 | 4 KiB |
| Directory 7 | 29.697 MiB | 30,410 | 1 KiB |
| Directory 8 | 83.124 GiB | 340,488 | 256 KiB |
| Directory 9 | 21.662 GiB | 354,909 | 64 KiB |
| Directory 10 | 142.629 GiB | 36,514 | 4 MiB |
| Directory 11 | 452.328 MiB | 57,898 | 8 MiB |
| Directory 12 | 144 GiB | 72 | 2 GiB |
| Directory 13 | 208.500 GiB | 834 | 256 MiB |
| Directory 14 | 54.688 GiB | 875 | 64 MiB |

Note:
- The 14 directories were split between the 2 VM instances.
- Each VM ran 7 commands/processes, 1 for each directory unless otherwise noted.
| Method | To-From | Time | Command | Flags/Notes |
|---|---|---|---|---|
| s5cmd | NFS/File PHX to Object IAD | 54m41.814s | copy | --numworkers 74 |
| os sync | NFS/File PHX to Object IAD | 65m43.200s | NA | --parallel-operations-count 50 |
| rclone | NFS/File PHX to Object IAD | 111m59.704s | copy | --oos-no-check-bucket --no-check-dest --ignore-checksum --oos-disable-checksum --transfers 50 |

Note: Our tests showed s5cmd running the fastest, with os sync doing pretty well compared to Rclone.
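Running one process per directory on a VM, as described in the note for this data set, can be sketched with shell background jobs. This is not the script used in our tests. The bucket name, the directory paths, and the function names transfer and run_parallel are hypothetical, and using each directory name as the --prefix value is an assumption about the desired object layout.

```shell
BUCKET="my-bucket"   # hypothetical bucket name

# Stand-in for one transfer. It only prints the os sync command for a
# directory; drop the leading "echo" to run the command for real.
transfer() {
  echo oci os object sync --src-dir "$1" --bucket-name "$BUCKET" \
    --prefix "$(basename "$1")/" --parallel-operations-count 50
}

# Start one background process per directory, then wait for all of them.
run_parallel() {
  for d in "$@"; do
    transfer "$d" &
  done
  wait
}

# Example with hypothetical paths:
#   run_parallel /mnt/fss/dir1 /mnt/fss/dir2 /mnt/fss/dir3
```

Because the processes run concurrently, their output can appear in any order. The total load on the VM is roughly the number of directories multiplied by the per-process parallel operations count, so the per-process value may need to be lower than for a single run.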
Next Steps
Proceed to the related tutorial(s) relevant to your migration needs. To move data into OCI cloud storage services:
-
Using Rclone, see Tutorial 2: Move Data into OCI Cloud Storage Services using Rclone.
-
Using Fpsync and Rsync for file system data migrations, see Tutorial 4: Move Data into OCI Cloud Storage Services using Fpsync and Rsync for File System Data Migrations.
Related Links
-
Tutorial 1: Use Migration Tools to Move Data into OCI Cloud Storage Services
-
Tutorial 2: Move Data into OCI Cloud Storage Services using Rclone
Acknowledgments
- Author - Melinda Centeno (Senior Principal Product Manager, OCI Object Storage)
More Learning Resources
Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.
For product documentation, visit Oracle Help Center.
G25613-01
January 2025
Copyright ©2025, Oracle and/or its affiliates.