Copying Data With odcp

Use the Oracle Big Data Cloud Service distributed-copy utility odcp at the command line to copy data between HDFS on your cluster and various other supported storage providers.

See odcp Reference for the odcp syntax, parameters, and options.

Prerequisites

For all the operations shown on this page, every cluster node must have:
  • Access to all running storage services

  • All required credentials established, for example, Oracle Cloud Infrastructure Object Storage Classic accounts

Operations Allowed When Using odcp to Copy Data, by Storage Type

You can use odcp to copy data between a number of different kinds of storage. The operations available for each possible combination are shown below.

For examples of each scenario, see odcp Supported Storage Sources and Targets.

The following table summarizes what you can do with odcp when copying data between various types of storage. The first two columns show the scenario (from/to) and the remaining columns show the operations.

From | To | Copy | Filter and Copy | Filter, Copy, and Group | Move | Sync | Retry
HDFS | HDFS | yes | yes | yes | no | yes | yes
HDFS | WebHDFS | yes | yes | yes | no | no | no
HDFS | Secure WebHDFS | yes | yes | yes | no | no | no
HDFS | Oracle Cloud Infrastructure Object Storage Classic | yes | yes | no | no | yes | yes
HDFS | Amazon Simple Storage Service (S3) | yes | yes | no | no | no | no
Oracle Cloud Infrastructure Object Storage Classic | Oracle Cloud Infrastructure Object Storage Classic | yes | yes | no | no | yes | yes
Oracle Cloud Infrastructure Object Storage Classic | HDFS | yes | yes | yes | no | yes | yes
Oracle Cloud Infrastructure Object Storage Classic | WebHDFS | yes | yes | yes | no | no | no
Oracle Cloud Infrastructure Object Storage Classic | Secure WebHDFS | yes | yes | yes | no | no | no
Oracle Cloud Infrastructure Object Storage Classic | Amazon Simple Storage Service (S3) | yes | yes | no | no | no | no
Oracle Cloud Infrastructure Object Storage Classic | Oracle Cloud Infrastructure Object Storage | yes | yes | no | no | no | no
WebHDFS | WebHDFS | yes | yes | yes | no | no | no
WebHDFS | HDFS | yes | yes | yes | no | no | no
WebHDFS | Oracle Cloud Infrastructure Object Storage Classic | yes | yes | no | no | no | no
WebHDFS | Amazon Simple Storage Service (S3) | yes | yes | no | no | no | no
WebHDFS | Oracle Cloud Infrastructure Object Storage | yes | yes | no | no | no | no
Secure WebHDFS | WebHDFS | yes | yes | yes | no | no | no
Secure WebHDFS | HDFS | yes | yes | yes | no | no | no
Secure WebHDFS | Oracle Cloud Infrastructure Object Storage Classic | yes | yes | no | no | no | no
Secure WebHDFS | Amazon Simple Storage Service (S3) | yes | yes | no | no | no | no
Secure WebHDFS | Oracle Cloud Infrastructure Object Storage | yes | yes | no | no | no | no
Amazon Simple Storage Service (S3) | HDFS | yes | yes | no | no | no | no
Amazon Simple Storage Service (S3) | WebHDFS | yes | yes | no | no | no | no
Amazon Simple Storage Service (S3) | Secure WebHDFS | yes | yes | no | no | no | no
Amazon Simple Storage Service (S3) | Oracle Cloud Infrastructure Object Storage Classic | yes | yes | no | no | no | no
Amazon Simple Storage Service (S3) | Amazon Simple Storage Service (S3) | yes | yes | no | no | no | no
Amazon Simple Storage Service (S3) | Oracle Cloud Infrastructure Object Storage | yes | yes | no | no | no | no
Oracle Cloud Infrastructure Object Storage | HDFS | yes | yes | no | no | no | no
Oracle Cloud Infrastructure Object Storage | WebHDFS | yes | yes | no | no | no | no
Oracle Cloud Infrastructure Object Storage | Secure WebHDFS | yes | yes | no | no | no | no
Oracle Cloud Infrastructure Object Storage | Oracle Cloud Infrastructure Object Storage Classic | yes | yes | no | no | no | no
Oracle Cloud Infrastructure Object Storage | Amazon Simple Storage Service (S3) | yes | yes | no | no | no | no
Oracle Cloud Infrastructure Object Storage | Oracle Cloud Infrastructure Object Storage | yes | yes | no | no | no | no
HTTP | HDFS | yes | yes | yes | no | no | no
HTTP | WebHDFS | yes | yes | yes | no | no | no
HTTP | Secure WebHDFS | yes | yes | yes | no | no | no
HTTP | Oracle Cloud Infrastructure Object Storage Classic | yes | yes | no | no | no | no
HTTP | Oracle Cloud Infrastructure Object Storage | yes | yes | no | no | no | no
HTTP | Amazon Simple Storage Service (S3) | yes | yes | no | no | no | no

odcp Supported Storage Sources and Targets

The following examples show different scenarios for copying data between the various storage systems and services supported by odcp.

Each entry below shows the source (Copy From), the target (To), and an example command.

HDFS

HDFS

[oracle@cfclbv2491 ~]$ odcp hdfs:///user/example/bigdata.file hdfs:///user/example/bigdata.file.copy

HDFS

WebHDFS

[oracle@cfclbv2491 ~]$ odcp hdfs:///user/example/bigdata.file webhdfs://webhdfs-host:50070/user/example/bigdata.file.copy

HDFS

Secure WebHDFS

[oracle@cfclbv2491 ~]$ odcp hdfs:///user/example/bigdata.file swebhdfs://webhdfs-host:50470/user/example/bigdata.file.copy

HDFS

Oracle Cloud Infrastructure Object Storage Classic

[oracle@cfclbv2491 ~]$ odcp hdfs:///user/example/bigdata.file swift://aserver.a424392/bigdata.file.copy

HDFS

Amazon Simple Storage Service (S3)

[oracle@cfclbv2491 ~]$ odcp hdfs:///user/example/bigdata.file s3a://aserver/bigdata.file.copy

Oracle Cloud Infrastructure Object Storage Classic

Oracle Cloud Infrastructure Object Storage Classic

[oracle@cfclbv2491 ~]$ odcp swift://aserver.a424392/bigdata.file swift://aserver.a424392/bigdata.file.copy

Oracle Cloud Infrastructure Object Storage Classic

HDFS

[oracle@cfclbv2491 ~]$ odcp swift://aserver.a424392/bigdata.file hdfs:///user/example/bigdata.file.copy

Oracle Cloud Infrastructure Object Storage Classic

WebHDFS

[oracle@cfclbv2491 ~]$ odcp swift://aserver.a424392/bigdata.file webhdfs://webhdfs-host:50070/user/example/bigdata.file.copy

Oracle Cloud Infrastructure Object Storage Classic

Secure WebHDFS

[oracle@cfclbv2491 ~]$ odcp swift://aserver.a424392/bigdata.file swebhdfs://webhdfs-host:50470/user/example/bigdata.file.copy

Oracle Cloud Infrastructure Object Storage Classic

Amazon Simple Storage Service (S3)

[oracle@cfclbv2491 ~]$ odcp swift://aserver.a424392/bigdata.file s3a://aserver/bigdata.file.copy

Oracle Cloud Infrastructure Object Storage Classic

Oracle Cloud Infrastructure Object Storage

[oracle@cfclbv2491 ~]$ odcp swift://aserver.a424392/bigdata.file examplebmc://bucket@namespace/bigdata.file.copy

WebHDFS

WebHDFS

[oracle@cfclbv2491 ~]$ odcp webhdfs://webhdfs-host:50070/user/example/bigdata.file webhdfs://webhdfs-host:50070/user/example/bigdata.file.copy

WebHDFS

HDFS

[oracle@cfclbv2491 ~]$ odcp webhdfs://webhdfs-host:50070/user/example/bigdata.file hdfs:///user/example/bigdata.file.copy

WebHDFS

Oracle Cloud Infrastructure Object Storage Classic

[oracle@cfclbv2491 ~]$ odcp webhdfs://webhdfs-host:50070/user/example/bigdata.file swift://aserver.a424392/bigdata.file.copy

WebHDFS

Amazon Simple Storage Service (S3)

[oracle@cfclbv2491 ~]$ odcp webhdfs://webhdfs-host:50070/user/example/bigdata.file s3a://aserver/bigdata.file.copy

WebHDFS

Oracle Cloud Infrastructure Object Storage

[oracle@cfclbv2491 ~]$ odcp webhdfs://webhdfs-host:50070/user/example/bigdata.file examplebmc://bucket@namespace/bigdata.file.copy

Secure WebHDFS

WebHDFS

[oracle@cfclbv2491 ~]$ odcp swebhdfs://webhdfs-host:50470/user/example/bigdata.file webhdfs://webhdfs-host:50070/user/example/bigdata.file.copy

Secure WebHDFS

HDFS

[oracle@cfclbv2491 ~]$ odcp swebhdfs://webhdfs-host:50470/user/example/bigdata.file hdfs:///user/example/bigdata.file.copy

Secure WebHDFS

Oracle Cloud Infrastructure Object Storage Classic

[oracle@cfclbv2491 ~]$ odcp swebhdfs://webhdfs-host:50470/user/example/bigdata.file swift://aserver.a424392/bigdata.file.copy

Secure WebHDFS

Amazon Simple Storage Service (S3)

[oracle@cfclbv2491 ~]$ odcp swebhdfs://webhdfs-host:50470/user/example/bigdata.file s3a://aserver/bigdata.file.copy

Secure WebHDFS

Oracle Cloud Infrastructure Object Storage

[oracle@cfclbv2491 ~]$ odcp swebhdfs://webhdfs-host:50470/user/example/bigdata.file examplebmc://bucket@namespace/bigdata.file.copy

Amazon Simple Storage Service (S3)

HDFS

[oracle@cfclbv2491 ~]$ odcp s3a://aserver/bigdata.file hdfs:///user/example/bigdata.file.copy

Amazon Simple Storage Service (S3)

WebHDFS

[oracle@cfclbv2491 ~]$ odcp s3a://aserver/bigdata.file webhdfs://webhdfs-host:50070/user/example/bigdata.file.copy

Amazon Simple Storage Service (S3)

Secure WebHDFS

[oracle@cfclbv2491 ~]$ odcp s3a://aserver/bigdata.file swebhdfs://webhdfs-host:50470/user/example/bigdata.file.copy

Amazon Simple Storage Service (S3)

Oracle Cloud Infrastructure Object Storage Classic

[oracle@cfclbv2491 ~]$ odcp s3a://aserver/bigdata.file swift://aserver.a424392/bigdata.file.copy

Amazon Simple Storage Service (S3)

Amazon Simple Storage Service (S3)

[oracle@cfclbv2491 ~]$ odcp s3a://aserver/bigdata.file s3a://aserver/bigdata.file.copy

Amazon Simple Storage Service (S3)

Oracle Cloud Infrastructure Object Storage

[oracle@cfclbv2491 ~]$ odcp s3a://aserver/bigdata.file examplebmc://bucket@namespace/bigdata.file.copy

Oracle Cloud Infrastructure Object Storage

HDFS

[oracle@cfclbv2491 ~]$ odcp examplebmc://bucket@namespace/bigdata.file hdfs:///user/example/bigdata.file.copy

Oracle Cloud Infrastructure Object Storage

WebHDFS

[oracle@cfclbv2491 ~]$ odcp examplebmc://bucket@namespace/bigdata.file webhdfs://webhdfs-host:50070/user/example/bigdata.file.copy

Oracle Cloud Infrastructure Object Storage

Secure WebHDFS

[oracle@cfclbv2491 ~]$ odcp examplebmc://bucket@namespace/bigdata.file swebhdfs://webhdfs-host:50470/user/example/bigdata.file.copy

Oracle Cloud Infrastructure Object Storage

Oracle Cloud Infrastructure Object Storage Classic

[oracle@cfclbv2491 ~]$ odcp examplebmc://bucket@namespace/bigdata.file swift://aserver.a424392/bigdata.file.copy

Oracle Cloud Infrastructure Object Storage

Amazon Simple Storage Service (S3)

[oracle@cfclbv2491 ~]$ odcp examplebmc://bucket@namespace/bigdata.file s3a://aserver/bigdata.file.copy

Oracle Cloud Infrastructure Object Storage

Oracle Cloud Infrastructure Object Storage

[oracle@cfclbv2491 ~]$ odcp examplebmc://bucket@namespace/bigdata.file examplebmc://bucket@namespace/bigdata.file.copy

HTTP file

HDFS

[oracle@cfclbv2491 ~]$ odcp http://exampleserver.com/bigdata.file hdfs:///user/example/bigdata.file.copy

HTTP file

WebHDFS

[oracle@cfclbv2491 ~]$ odcp http://exampleserver.com/bigdata.file webhdfs://webhdfs-host:50070/user/example/bigdata.file.copy

HTTP file

Secure WebHDFS

[oracle@cfclbv2491 ~]$ odcp http://exampleserver.com/my.file swebhdfs://webhdfs-host:50470/user/example/bigdata.file.copy

HTTP file

Oracle Cloud Infrastructure Object Storage Classic

[oracle@cfclbv2491 ~]$ odcp http://exampleserver.com/bigdata.file swift://aserver.a424392/bigdata.file.copy

HTTP file

Oracle Cloud Infrastructure Object Storage

[oracle@cfclbv2491 ~]$ odcp http://exampleserver.com/bigdata.file examplebmc://bucket@namespace/bigdata.file.copy

HTTP file

Amazon Simple Storage Service (S3)

[oracle@cfclbv2491 ~]$ odcp http://exampleserver.com/my.file s3a://aserver/bigdata.file.copy

Use bda-oss-admin with odcp

Use bda-oss-admin commands to configure the cluster for use with storage providers. This makes it easier and faster to use odcp with the storage provider.

Any user with access privileges to the cluster can run odcp.

Note:

To copy data between HDFS and Oracle Cloud Infrastructure Object Storage Classic, you must have an Oracle Cloud Infrastructure Object Storage Classic account, which isn’t included with an Oracle Big Data Cloud Service account. To obtain an Oracle Cloud Infrastructure Object Storage Classic account, go to https://cloud.oracle.com/storage, or contact an Oracle Sales Representative.

Procedure

To copy data between HDFS and a storage provider:

  1. Open a command shell and connect to the cluster. You can connect to any node for which you have HDFS access rights. (Oracle Cloud Infrastructure Object Storage Classic is accessible from all nodes.) See Connect to a Cluster Node Through Secure Shell (SSH).

  2. Set shell environment variable values for Cloudera Manager access. See Setting bda-oss-admin Environment Variables.

    Set these environment variables:

    • CM_ADMIN — Cloudera Manager administrator user name

    • CM_PASSWORD — Cloudera Manager administrator password

    • CM_URL — Cloudera Manager URL
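
    For example, you can export these variables in your shell before running bda-oss-admin or odcp. The values shown here are placeholders; substitute your own Cloudera Manager host and credentials:

      $ export CM_ADMIN=admin
      $ export CM_PASSWORD='MyCmPassword1!'
      $ export CM_URL=https://cm-host.example.com:7183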

  3. You must also have access privileges to the storage provider you want to use. If you’re using the Oracle Cloud Infrastructure Object Storage Classic instance that was registered when the Hadoop cluster was created, you don’t have to set any environment variables to access it. By default, that storage service instance has the provider name BDCS, and you can use that name to access it; for example, swift://MyContainer.BDCS/data.raw. See Create a Cluster and Register Storage Credentials with the Cluster.

    You can also use providers (credentials) other than BDCS, to access different storage service instances or to access them as different users:

    • Set the PROVIDER_NAME environment variable to refer to the provider you want to use. For example, if you have a provider named rssfeeds-admin2, use SSH to connect to the cluster and enter:

      # export PROVIDER_NAME="rssfeeds-admin2"

      Or, in a shell script:

      export PROVIDER_NAME="rssfeeds-admin2"

      Then, in your odcp commands, use that provider name, for example: swift://aContainer.rssfeeds-admin2/data.raw.

    See Register Storage Credentials with the Cluster.

  4. You can use the hadoop fs -ls command to browse your HDFS and storage data.

    • To browse HDFS, use:

      hadoop fs -ls

    • To browse storage, use:

      hadoop fs -ls swift://container.provider/

      You can also browse storage in Big Data Manager.

  5. Use the odcp command to copy files between Oracle Cloud Infrastructure Object Storage Classic and HDFS, as shown in the following examples.

Examples

  • Copy a file from HDFS to an Oracle Cloud Infrastructure Object Storage Classic container:

    # /usr/bin/odcp hdfs:///user/example/data.raw swift://myContainer.myProvider/data.raw
    
  • Copy a file from an Oracle Cloud Infrastructure Object Storage Classic container to HDFS:

    # /usr/bin/odcp swift://myContainer.myProvider/data.raw hdfs:///user/example/data.raw
    
  • Copy a directory from HDFS to an Oracle Cloud Infrastructure Object Storage Classic container:

    # /usr/bin/odcp hdfs:///user/data/ swift://myContainer.myProvider/backup
    
  • If you have more than three nodes, you can increase transfer speed by specifying a higher number of executors. For example, if you have six nodes, use a command such as:

    # /usr/bin/odcp --num-executors=6 hdfs:///user/company/data.raw swift://myContainer.myProvider/data.raw

Filter and Copy Files

Use the odcp command with the --srcPattern option to filter and copy files, as shown in the following example.

# list source directory
[oracle@cfclbv2491 ~]$ hadoop fs -ls swift://rstrejc.a424392/logs/
Found 3 items
-rw-rw-rw-   1    3499940 2016-10-18 09:58 swift://rstrejc.a424392/logs/spark.log
-rw-rw-rw-   1    7525772 2016-10-18 10:00 swift://rstrejc.a424392/logs/hadoop.log
-rw-rw-rw-   1          8 2016-10-18 10:13 swift://rstrejc.a424392/logs/report.txt
 
# filter and copy files
[oracle@cfclbv2491 ~]$ odcp -V --srcPattern ".*log" swift://rstrejc.a424392/logs/ hdfs:///user/oracle/filtered/
 
# list destination directory
[oracle@cfclbv2491 ~]$ hadoop fs -ls hdfs:///user/oracle/filtered
Found 2 items
-rw-r--r--   3 oracle hadoop    3499940 2016-10-18 10:29 hdfs:///user/oracle/filtered/spark.log
-rw-r--r--   3 oracle hadoop    7525772 2016-10-18 10:30 hdfs:///user/oracle/filtered/hadoop.log

Filter, Copy, and Group Files

Use the odcp command with the --groupBy and --groupName options to filter, copy, and group files, as shown in the following example:

# list source directory
[oracle@cfclbv2491 ~]$ hadoop fs -ls swift://rstrejc.a424392/logs/
Found 3 items
-rw-rw-rw-   1    3499940 2016-10-18 09:58 swift://rstrejc.a424392/logs/spark.log
-rw-rw-rw-   1    7525772 2016-10-18 10:00 swift://rstrejc.a424392/logs/hadoop.log
-rw-rw-rw-   1          8 2016-10-18 10:13 swift://rstrejc.a424392/logs/report.txt
 
# copy and group files
[oracle@cfclbv2491 ~]$ odcp --groupBy ".*log" --groupName "summary.log" swift://rstrejc.a424392/logs/ hdfs:///user/oracle/logs/
 
# list destination directory
[oracle@cfclbv2491 ~]$ hadoop fs -ls hdfs:///user/oracle/logs
Found 1 items
-rw-r--r--   3 oracle hadoop   11025712 2016-10-18 10:00 hdfs:///user/oracle/logs/summary.log

Copy Files from an HTTP Server

You can use odcp to copy files from an HTTP server in several ways, as described below.

Copy Files From an HTTP Server

There are two ways to download files via the HTTP protocol:

  1. Specify each file as a source:

    [oracle@cfclbv2491 ~]$ odcp http://example.com/fileA http://example.com/fileB swift://rstrejc.a424392/dstDirectory
  2. Create a list of files to download:

    [oracle@cfclbv2491 ~]$ odcp --file-list hdfs:///files_to_download --file-list http://example.com/logs_to_download swift://rstrejc.a424392/dstDirectory

Use a File List to Specify Files

A file list is a comma-separated values (CSV) file with the following schema:

link_to_file[,http headers encoded in Base64]

For example:

http://172.16.253.111/public/big.file
https://172.16.253.111/public/small.file
http://172.16.253.111/private/secret.file,QXV0aG9yaXphdGlvbjogQmFzaWMgYjNKaFkyeGxPa2cwY0hCNVJqQjQK
https://oracle:H4ppyF0x@172.16.253.111/private/small.file
where QXV0aG9yaXphdGlvbjogQmFzaWMgYjNKaFkyeGxPa2cwY0hCNVJqQjQK is the Base64-encoded form of the following HTTP header:
  • Authorization: Basic b3JhY2xlOkg0cHB5RjB4
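
You can generate such an encoded value with the base64 utility; this example encodes the header shown above:

[oracle@cfclbv2491 ~]$ echo 'Authorization: Basic b3JhY2xlOkg0cHB5RjB4' | base64
QXV0aG9yaXphdGlvbjogQmFzaWMgYjNKaFkyeGxPa2cwY0hCNVJqQjQK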

For example, to download the files in a file list with odcp:

[oracle@cfclbv2491 ~]$ odcp --file-list hdfs:///files_to_download --file-list http://example.com/logs_to_download swift://rstrejc.a424392/dstDirectory

Copy Files from an HTTP Server by Using a Proxy

When using an HTTP proxy, download files as shown in the following example:

[oracle@cfclbv2491 ~]$ odcp --http-proxy-host www-proxy.example.com http://example.com/fileA swift://rstrejc.a424392/dstDirectory

Copy a List of Files from an HTTP Server By Using a File with Predefined HTTP Headers

If you need to specify HTTP headers but you don’t want to specify them in a file list, you can create a separate file containing the HTTP headers and pass that file to odcp, as shown in the example below:

[oracle@cfclbv2491 ~]$ odcp --http-headers hdfs:///file_with_http_headers http://example.com/logs_to_download swift://rstrejc.a424392/dstDirectory

The structure of the file with HTTP headers is:

regex_pattern,http_headers

For example, the following file applies the specified HTTP headers to files whose path or name contains "image" or "log":

.*image.*,QXV0aG9yaXphdGlvbjogQmFzaWMgYjNKaFkyeGxPa2cwY0hCNVJqQjQK
.*log.*,QXV0aG9yaXphdGlvbjogQmFzaWMgYjNKaFkyeGxPa2cwY0hCNVJqQjQK

Use odcp to Copy Data on a Secure Cluster

Using odcp to copy data on a Kerberos-enabled cluster requires some additional steps.

Note:

In Oracle Big Data Cloud Service, a cluster is Kerberos-enabled when it’s created with “Secure Setup.”

If you want to execute a long-running job or run odcp from an automated shell script or from a workflow service such as Apache Oozie, then you must pass the odcp command a Kerberos principal and the full path to the principal’s keytab file, as described below:

  1. Use SSH to connect to any node on the cluster.
  2. Choose the principal to be used for running the odcp command. In the example below it’s odcp@BDACLOUDSERVICE.EXAMPLE.COM.
  3. Generate a keytab file for the principal, as shown below: 
    $ ktutil
    ktutil:  addent -password -p odcp@BDACLOUDSERVICE.EXAMPLE.COM -k 1 -e rc4-hmac
    Password for odcp@BDACLOUDSERVICE.EXAMPLE.COM: [enter your password]
    ktutil:  addent -password -p odcp@BDACLOUDSERVICE.EXAMPLE.COM -k 1 -e aes256-cts
    Password for odcp@BDACLOUDSERVICE.EXAMPLE.COM: [enter your password]
    ktutil:  wkt /home/odcp/odcp.keytab
    ktutil:  quit
  4. Pass the principal and the full path to the keytab file to the odcp command, for example:
    odcp --krb-principal odcp@BDACLOUDSERVICE.EXAMPLE.COM --krb-keytab /home/odcp/odcp.keytab source destination
If you just want to execute a short-running odcp job from the console, you don't have to generate a keytab file or specify the principal. You just need an active Kerberos ticket (created using the kinit command), as shown in the following example.
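
For example, using the principal from the steps above (substitute your own principal):

    $ kinit odcp@BDACLOUDSERVICE.EXAMPLE.COM
    Password for odcp@BDACLOUDSERVICE.EXAMPLE.COM: [enter your password]
    $ odcp hdfs:///user/example/bigdata.file swift://aserver.a424392/bigdata.file.copy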

Synchronize the Destination with the Source

You can synchronize the destination with the source at the level of file parts. When synchronizing HDFS with Oracle Cloud Infrastructure Object Storage Classic, use an HDFS part size equal to the part size of the file on Oracle Cloud Infrastructure Object Storage Classic.

What You Can Do When Synchronizing

The following list shows what you can do when synchronizing HDFS and Oracle Cloud Infrastructure Object Storage Classic sources and destinations:

  • HDFS to Oracle Cloud Infrastructure Object Storage Classic

    • Retrieve a list of already uploaded segments on Oracle Cloud Infrastructure Object Storage Classic.

    • Read file parts from HDFS.

    • Compare each file part checksum on HDFS with a checksum on Oracle Cloud Infrastructure Object Storage Classic that is already uploaded. If they’re the same, you can skip the transfer. Otherwise you can upload a part from HDFS to Oracle Cloud Infrastructure Object Storage Classic.

  • Oracle Cloud Infrastructure Object Storage Classic to HDFS

    • Retrieve a list of already downloaded parts on HDFS.

    • Split already concatenated files into parts on HDFS and calculate checksums.

    • Before downloading a segment from Oracle Cloud Infrastructure Object Storage Classic, compare its checksum with an already downloaded part checksum on HDFS. If they’re the same, skip the transfer. Otherwise, you can download the segment from Oracle Cloud Infrastructure Object Storage Classic and store it as a file part on HDFS.

  • Oracle Cloud Infrastructure Object Storage Classic to Oracle Cloud Infrastructure Object Storage Classic

    • Retrieve a list of already uploaded segments on Oracle Cloud Infrastructure Object Storage Classic.

    • Retrieve a list of source segments on Oracle Cloud Infrastructure Object Storage Classic.

    • Before downloading a segment from Oracle Cloud Infrastructure Object Storage Classic, compare its checksum with the checksum of the segment already uploaded to Oracle Cloud Infrastructure Object Storage Classic. If they’re the same, you can skip the transfer. Otherwise, you can download the segment from the source Oracle Cloud Infrastructure Object Storage Classic instance and upload it to the destination instance.

  • HDFS to HDFS

    • Retrieve a list of already downloaded parts on HDFS. 

    • Split already concatenated files into parts on HDFS and calculate checksums.

    • Read file parts from HDFS.

    • Compare each source file part checksum with the checksums of parts already stored on HDFS. If they’re the same, you can skip the transfer. Otherwise, you can copy the part to the destination on HDFS.

Examples

# sync file hdfs:///user/oracle/bigdata.file with swift://rstrejc.a42439/bigdata.file
odcp --sync hdfs:///user/oracle/bigdata.file swift://rstrejc.a42439
  
# sync file hdfs:///user/oracle/bigdata.file with swift://rstrejc.a42439/bigdata.file.new
odcp --sync hdfs:///user/oracle/bigdata.file swift://rstrejc.a42439/bigdata.file.new
 
# sync directory hdfs:///user/oracle/directoryWithBigData with swift://rstrejc.a42439/directoryWithBigData
odcp --sync hdfs:///user/oracle/directoryWithBigData swift://rstrejc.a42439/
 
# sync directory hdfs:///user/oracle/directoryWithBigData with swift://rstrejc.a42439/someDirectory/directoryWithBigData
odcp --sync hdfs:///user/oracle/directoryWithBigData swift://rstrejc.a42439/someDirectory

Retry a Failed Copy Job

If a copy job fails, you can retry it. When retrying the job, the source and destination are automatically synchronized, so odcp doesn’t re-transfer file parts that were already copied successfully.

The retry mechanism works as follows:

  1. Before transferring files, odcp retrieves the destination file status and stores it to HDFS.

  2. When a retry operation is required,

    1. odcp reads the destination file status stored on HDFS. 

    2. Input and output files are re-indexed, with the same result as in the failed execution.

    3. The re-indexed files are synchronized.

  3. If the copying operation is successful, odcp deletes the stored file status from HDFS.

Example:

# Run odcp; assume the job fails
odcp hdfs:///user/oracle/bigdata.file swift://rstrejc.a424392/
# Run the same command with the --retry option
odcp --retry hdfs:///user/oracle/bigdata.file swift://rstrejc.a424392/