Move Data into OCI Cloud Storage Services using OCI Object Storage Sync and the s5cmd Tool

Introduction

This is tutorial 3 of a four-tutorial series that shows you various ways to migrate data into Oracle Cloud Infrastructure (OCI) cloud storage services. The series is set up so you can review Tutorial 1: Use Migration Tools to Move Data into OCI Cloud Storage Services to get a broad understanding of the various tools and then proceed to the related tutorial(s) or documents relevant to your migration needs. This tutorial focuses on using OCI Object Storage Sync (os sync) and the s5cmd tool to migrate file system data (including OCI File Storage) to and from OCI Object Storage.

OCI provides customers with high-performance computing and low-cost cloud storage options. Through on-demand local, object, file, block, and archive storage, Oracle addresses key storage workload requirements and use cases.

OCI cloud storage services offer fast, secure, and durable cloud storage options for all your enterprise needs. These range from high-performance options such as OCI File Storage with Lustre and the OCI Block Volumes service, through fully managed exabyte-scale file systems from the OCI File Storage service with high-performance mount targets, to the highly durable and scalable OCI Object Storage. Our solutions can meet your demands, ranging from performance-intensive applications such as AI/ML workloads to exabyte-scale data lakes.

Determine the amount of data that needs to be migrated and the downtime available to cut over to the new OCI storage platform. Batch migrations are a good choice for breaking the migration into manageable increments, and they let you schedule downtime for specific applications across different windows. Some customers have the flexibility to do a one-time migration during a scheduled maintenance window of 2-4 days. OCI FastConnect can be used to create a dedicated, private connection between OCI and your environment, with port speeds from 1 Gbps to 400 Gbps, to speed up the data transfer process. OCI FastConnect can be integrated with partner solutions such as Megaport and ConsoleConnect to create a private connection to your data center, or a cloud-to-cloud interconnection that moves data more directly from another cloud vendor into OCI cloud storage services. For more information, see FastConnect integration with Megaport Cloud Router.
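
As a rough planning aid, the relationship between data size, link speed, and transfer window can be sketched in a few lines of shell. This is a back-of-the-envelope estimate only: the function name is hypothetical, sizes are taken as decimal terabytes, and the default 70 percent sustained link efficiency is an assumption, not a measured OCI figure.

```shell
# Estimate the hours needed to move a data set over a network link.
# Usage: estimate_transfer_hours <size in TB> <link speed in Gbps> [efficiency %]
# The efficiency default (70) is an assumed value for illustration only.
estimate_transfer_hours() {
  size_tb="$1"
  link_gbps="$2"
  efficiency_pct="${3:-70}"
  awk -v tb="$size_tb" -v gbps="$link_gbps" -v eff="$efficiency_pct" 'BEGIN {
    bits = tb * 8 * 1e12              # decimal terabytes to bits
    rate = gbps * 1e9 * eff / 100     # usable bits per second
    printf "%.1f\n", bits / rate / 3600
  }'
}

# Example: 100 TB over a 10 Gbps link at an assumed 70% efficiency.
estimate_transfer_hours 100 10 70
```

If the estimate exceeds the available maintenance window, that is a signal to split the migration into batches or to use a faster connection.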

Audience

DevOps engineers, developers, OCI cloud storage administrators and users, IT managers, OCI power users, and application administrators.

Objective

Learn how to copy and/or synchronize file system data to/from OCI Object Storage using the OCI CLI with os sync and the s5cmd tool.

Prerequisites

Synchronize Network File System and Local File System Data to/from OCI Object Storage

OCI Object Storage Sync is part of the OCI Command Line Interface (CLI) and synchronizes a file system directory with objects in a bucket. The command traverses subdirectories, copying new and modified files or objects from the source to the destination and optionally deleting those that are not present in the source. It is a convenient tool to keep file system data and OCI Object Storage buckets synchronized. In our test environment, OCI Object Storage Sync performed well for mixed data sets and better than other tools for large files (1 TB or more).

OCI Object Storage supports an Amazon S3 Compatibility API, so customers who are already familiar with the variety of Amazon S3 tools can continue to use them. The s5cmd tool is a free, open source project that enables browsing and transferring data to and from an S3-compatible object store. It is written in Go and is optimized for parallel throughput. During our testing, we found that the s5cmd tool works best for small files (from under 1 MB up to 30 MB) and outperformed all other tools for moving small files. The s5cmd tool also works well for mixed data sets when moving data from file systems to OCI Object Storage.
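
Because s5cmd speaks the S3 protocol, it needs to be pointed at the OCI S3 Compatibility API endpoint and given S3-style credentials, which in OCI are a Customer Secret Key pair. The following is a minimal sketch of that setup. The helper name, namespace, and region are hypothetical placeholders, and the final command is only printed, not run.

```shell
# Build the OCI S3 Compatibility API endpoint for a namespace and region.
# Usage: oci_s3_endpoint <object storage namespace> <region identifier>
oci_s3_endpoint() {
  printf 'https://%s.compat.objectstorage.%s.oraclecloud.com\n' "$1" "$2"
}

# Placeholder values (hypothetical); substitute your own tenancy details.
NAMESPACE="mynamespace"
REGION="us-ashburn-1"
ENDPOINT="$(oci_s3_endpoint "$NAMESPACE" "$REGION")"

# s5cmd reads credentials from the standard AWS environment variables or
# from ~/.aws/credentials. With OCI these hold a Customer Secret Key pair.
# export AWS_ACCESS_KEY_ID=<access key>
# export AWS_SECRET_ACCESS_KEY=<secret key>

# Print a command that lists buckets through the compatibility endpoint.
echo "s5cmd --endpoint-url $ENDPOINT ls"
```

The s5cmd commands later in this tutorial omit the --endpoint-url flag for brevity; in practice it (or an equivalent configuration) is needed for s5cmd to reach OCI instead of AWS.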

If your data can be organized into subsets by directories or prefixes, you can also scale out your os sync and s5cmd runs across multiple VMs to improve transfer times.
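
One way to split the work, sketched here under stated assumptions, is to deal the top-level subdirectories out round-robin so that each VM handles its own share. The function name, paths, bucket name, and shard variables are all hypothetical, and the script only prints the commands it would run. The --prefix value keeps each subdirectory name as part of the object names.

```shell
# Print the entries assigned to one shard, dealt out round-robin.
# Usage: shard_entries <shard index, 0-based> <shard count>  (names on stdin)
shard_entries() {
  awk -v i="$1" -v n="$2" '(NR - 1) % n == i'
}

# Hypothetical values: source tree, bucket, and this VM's position among 3 VMs.
SRC_ROOT="${SRC_ROOT:-/mnt/migration-files}"
BUCKET="${BUCKET:-my-bucket}"
SHARD_INDEX="${SHARD_INDEX:-0}"
SHARD_COUNT="${SHARD_COUNT:-3}"

# List top-level subdirectories in a stable order, keep this VM's share,
# and print (not run) one os sync command per directory.
if [ -d "$SRC_ROOT" ]; then
  find "$SRC_ROOT" -mindepth 1 -maxdepth 1 -type d | sort |
    shard_entries "$SHARD_INDEX" "$SHARD_COUNT" |
    while IFS= read -r dir; do
      name="$(basename "$dir")"
      echo "oci os object sync --src-dir '$dir' --bucket-name '$BUCKET' --prefix '$name/'"
    done
fi
```

Running the same script on each VM with a different SHARD_INDEX gives every VM a disjoint set of directories. Round-robin assignment ignores directory size, so uneven trees may need a manual split instead.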

Use OS Sync to Synchronize

  1. Synchronize local file system data into OCI Object Storage.

    Run the following basic os sync command to synchronize the files from a source directory into a destination bucket.

    oci os object sync --src-dir <path to migration-files> --bucket-name <bucket name>
    

    Note: This same command can be used for on-premises local file systems, on-premises NFS file systems, and on an OCI Compute instance with OCI File Storage NFS mounts to move data from OCI File Storage into an OCI Object Storage bucket.

  2. Migrate data from OCI Object Storage to a local file system.

    Run the following basic os sync command to synchronize files/objects from a source bucket into a destination file system.

    oci os object sync --dest-dir <path to migration-target directory> --bucket-name <bucket name>
    

    Note: This same command can be used with on-premises local file systems, on-premises NFS file systems, and on an OCI Compute instance with an OCI File Storage NFS mount to move data from an OCI Object Storage bucket into OCI File Storage.

  3. Increase parallel operations.

    By default, only 10 operations run in parallel. Increasing the number of parallel operations improves data transfer speeds but also consumes more system resources and bandwidth. In our testing environment, we used a VM.Standard.E4.Flex shape with 24 OCPUs, 24 Gbps network bandwidth, and 384 GB of memory, and found that 100 parallel operations gave the best transfer speeds. On larger VMs, raise this number by 10 operations at a time until transfer speed stops improving or the maximum of 1,000 parallel operations is reached. On smaller VM shapes, start at 10 operations and increase in increments of 5-10 until throughput is acceptable. Set the number of parallel operations with the following flag.

    --parallel-operations-count <integer range>
    

    Note: If errors start occurring after you increase the parallel operations, especially 429 “TooManyRequests” errors, lower the parallel operations by 2 at a time until the errors stop.

  4. Filter files using pattern flags.

    Patterns can be used to include or exclude matched files. Each pattern flag can be repeated on the command line to match multiple patterns.

    • To include files that match a pattern, use the following flag.

      --include <pattern>
      
    • To exclude files that match a pattern, use the following flag.

      --exclude <pattern>
      

    Note: The following pattern symbols are supported.

    • *: Matches everything
    • ?: Matches any single character
    • [sequence]: Matches any character in sequence
    • [!sequence]: Matches any character not in sequence

  5. Use the --prefix flag.

    When used with --src-dir to upload files to OCI Object Storage, the --prefix flag prepends the specified prefix to the name of each uploaded object. When used with --dest-dir to download objects from OCI Object Storage, only objects with the specified prefix are downloaded, and the prefix does not appear in the resulting file names.

  6. Verify a transfer before a run.

    Before starting a transfer, you can determine which files will be uploaded to or downloaded from OCI Object Storage by having os sync print the planned operations without performing them. Add the following flag to the command line.

    --dry-run
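
The flags from the steps above can be combined on one command line. The following sketch assembles and prints such a command so it can be reviewed before it is run. The function name, the example paths, and the '*.tmp' exclude pattern are hypothetical. The 1 to 1,000 range check reflects the limit described in step 3.

```shell
# Print an os sync upload command built from the flags covered above.
# Usage: build_sync_cmd <src dir> <bucket> <parallel ops> <dry run: yes|no>
build_sync_cmd() {
  src_dir="$1"
  bucket="$2"
  parallel="$3"
  dry_run="$4"

  # Reject values outside the range described in step 3.
  if [ "$parallel" -lt 1 ] || [ "$parallel" -gt 1000 ]; then
    echo "parallel operations must be between 1 and 1000" >&2
    return 1
  fi

  cmd="oci os object sync --src-dir '$src_dir' --bucket-name '$bucket'"
  cmd="$cmd --parallel-operations-count $parallel --exclude '*.tmp'"
  if [ "$dry_run" = "yes" ]; then
    cmd="$cmd --dry-run"    # list planned transfers without performing them
  fi
  printf '%s\n' "$cmd"
}

# Preview first, then repeat with "no" once the listing looks right.
build_sync_cmd /mnt/migration-files my-bucket 100 yes
```

Starting with a dry run makes it easier to confirm that the pattern flags select the intended files before any data moves.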
    

Use s5cmd to Synchronize

  1. Sync and copy command.

    • The sync command for s5cmd performs a one-way synchronization from source to destination. It does not modify any of the source files and, by default, does not delete files on the destination that do not exist on the source. Add the --delete flag to remove files on the destination that do not exist on the source.

    • The copy command will simply copy objects from the source to the destination.

  2. Pattern matching.

    The s5cmd tool supports multiple-level wildcards for the sync and copy operations. This is achieved by listing all objects with the prefix up to the first wildcard, then filtering the results in-memory.

    Wildcards also apply when the source is a file system. Because the shell may expand the * character as a globbing wildcard before s5cmd sees it, wrap the pattern in single quotes to avoid unexpected results.

    For example, to copy all gzipped files in a directory into a bucket run the following command.

    s5cmd cp '*.gz' s3://<bucket name>/
    
  3. Basic sync command.

    • Synchronize local file system data into OCI Object Storage. Run the following basic s5cmd command to synchronize the files from a source directory into a destination bucket.

      s5cmd sync /<path to migration-files> s3://<bucket name>
      
    • Synchronize data from OCI Object Storage to a local file system. Run the following basic s5cmd command to synchronize files/objects from a source bucket into a destination file system.

      s5cmd sync 's3://<bucket name>/*' /<path to migration-target directory>
      

    Note: The same commands can be used with on-premises local file systems, on-premises NFS file systems, and on an OCI Compute instance with OCI File Storage NFS mounts to move data between OCI File Storage and an OCI Object Storage bucket.

  4. Basic copy command.

    Run the following basic s5cmd command to copy the files from a local file system source directory into an OCI Object Storage destination bucket.

    s5cmd cp /<path to migration-files> s3://<bucket name>
    

    Copy data from OCI Object Storage to a local file system.

    s5cmd cp "s3://<bucket name>/*" /<path to migration-target directory>
    

    Note: These commands can be used with on-premises local file systems, on-premises NFS file systems, and on an OCI Compute instance with OCI File Storage NFS mounts to move data between OCI File Storage and an OCI Object Storage bucket. Our testing primarily used the copy command with s5cmd.

  5. Increase parallelism.

    The s5cmd tool runs 256 workers in parallel by default. Depending on the size of your VM, you may want to increase or decrease the parallelism. In our testing environment, we used a VM.Standard.E4.Flex shape with 24 OCPUs, 24 Gbps network bandwidth, and 384 GB of memory, and found that 1,000 workers gave the best transfer speeds. On larger VMs, raise this number by 10 workers at a time until transfer speed stops improving or you reach 1,000 workers, the point beyond which our testing showed no further gain. On smaller VM shapes, start at 10 workers and increase in increments of 5-10 until throughput is acceptable. Set the number of workers with the following flag.

    --numworkers <integer>
    

    Note: Should errors start occurring after increasing the number of workers, especially errors with 429 “TooManyRequests”, lower the number of workers by 2 until the errors stop occurring.

    For example, copy all objects in a bucket into a local file system directory.

    s5cmd --numworkers 1000 cp "s3://MyBucket/*" /my/directory
    

    Note: We tested with up to 1,500 workers and did not see any significant improvement. Because our testing found that the s5cmd tool performed best for small files, we did not find any benefit to using the concurrency flag for files needing multipart uploads. Using a high number of workers showed the best overall performance.
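
The worker tuning described in step 5 can be sketched as two small helpers: one that prints an s5cmd copy command for a given worker count, and one that steps the count down by 2 as suggested for 429 errors. Both function names are hypothetical, as are the endpoint URL and paths. The --endpoint-url flag is included on the assumption that s5cmd is being pointed at the OCI S3 Compatibility API. Commands are printed rather than executed.

```shell
# Print an s5cmd copy command for a given worker count.
# Usage: build_s5cmd_cp <endpoint url> <workers> <source> <destination>
build_s5cmd_cp() {
  printf "s5cmd --endpoint-url %s --numworkers %s cp '%s' '%s'\n" "$1" "$2" "$3" "$4"
}

# Step the worker count down by 2, never going below 1.
# Usage: lower_workers <current workers>
lower_workers() {
  next=$(( $1 - 2 ))
  if [ "$next" -lt 1 ]; then
    next=1
  fi
  printf '%s\n' "$next"
}

# Hypothetical endpoint and paths, for illustration only.
ENDPOINT="https://mynamespace.compat.objectstorage.us-ashburn-1.oraclecloud.com"

# First attempt with the default worker count, then a retry two workers lower.
build_s5cmd_cp "$ENDPOINT" 256 "s3://MyBucket/*" "/my/directory/"
build_s5cmd_cp "$ENDPOINT" "$(lower_workers 256)" "s3://MyBucket/*" "/my/directory/"
```

Single-quoting the wildcard source in the printed command keeps the shell from expanding it when the command is pasted and run.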

(Optional) Test Environments

Recommendations are made based on testing and customer interactions.

Note: Rclone results are included to give more information. For details on using Rclone, see Tutorial 2: Move Data into OCI Cloud Storage Services using Rclone.

Test Environment 1:

One VM.Standard.E4.Flex VM instance with 1 OCPU, 1 Gbps network bandwidth, and 16 GB of memory. To simulate an on-premises to OCI migration, we copied data from an NFS file system in PHX to IAD.

Data Sets

Test Environment 2:

VM instances: We used 1-2 VM instances per test, each a VM.Standard.E4.Flex with 24 OCPUs, 24 Gbps network bandwidth, and 384 GB of memory. Oracle Linux 8 was used for Linux testing.

Data Sets

Next Steps

Proceed to the related tutorial(s) relevant to your migration needs. To move data into OCI cloud storage services:

Acknowledgments

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.