Copying Data With odcp
Use the Oracle Big Data Cloud Service distributed-copy utility odcp at the command line to copy data between HDFS on your cluster and various other supported storage providers.
See odcp Reference for the odcp syntax, parameters, and options.
Prerequisites
- Access to all running storage services
- All required credentials established, for example Oracle Storage Cloud Service accounts
Operations Allowed When Using odcp to Copy Data, by Storage Type
You can use odcp to copy data between a number of different kinds of storage. The following table summarizes the operations available for each combination: the first two columns show the source and target, and the remaining columns show the supported operations.
For examples of each scenario, see odcp Supported Storage Sources and Targets.
When Transferring Data From... | To... | Copy | Filter and Copy | Filter, Copy, and Group | Move | Sync | Retry |
---|---|---|---|---|---|---|---|
HDFS | HDFS | yes | yes | yes | no | yes | yes |
HDFS | WebHDFS | yes | yes | yes | no | no | no |
HDFS | Secure WebHDFS | yes | yes | yes | no | no | no |
HDFS | Oracle Cloud Infrastructure Object Storage Classic | yes | yes | no | no | yes | yes |
HDFS | Amazon Simple Storage Service (S3) | yes | yes | no | no | no | no |
Oracle Cloud Infrastructure Object Storage Classic | Oracle Cloud Infrastructure Object Storage Classic | yes | yes | no | no | yes | yes |
Oracle Cloud Infrastructure Object Storage Classic | HDFS | yes | yes | yes | no | yes | yes |
Oracle Cloud Infrastructure Object Storage Classic | WebHDFS | yes | yes | yes | no | no | no |
Oracle Cloud Infrastructure Object Storage Classic | Secure WebHDFS | yes | yes | yes | no | no | no |
Oracle Cloud Infrastructure Object Storage Classic | Amazon Simple Storage Service (S3) | yes | yes | no | no | no | no |
Oracle Cloud Infrastructure Object Storage Classic | Oracle Cloud Infrastructure Object Storage | yes | yes | no | no | no | no |
WebHDFS | WebHDFS | yes | yes | yes | no | no | no |
WebHDFS | HDFS | yes | yes | yes | no | no | no |
WebHDFS | Oracle Cloud Infrastructure Object Storage Classic | yes | yes | no | no | no | no |
WebHDFS | Amazon Simple Storage Service (S3) | yes | yes | no | no | no | no |
WebHDFS | Oracle Cloud Infrastructure Object Storage | yes | yes | no | no | no | no |
Secure WebHDFS | WebHDFS | yes | yes | yes | no | no | no |
Secure WebHDFS | HDFS | yes | yes | yes | no | no | no |
Secure WebHDFS | Oracle Cloud Infrastructure Object Storage Classic | yes | yes | no | no | no | no |
Secure WebHDFS | Amazon Simple Storage Service (S3) | yes | yes | no | no | no | no |
Secure WebHDFS | Oracle Cloud Infrastructure Object Storage | yes | yes | no | no | no | no |
Amazon Simple Storage Service (S3) | HDFS | yes | yes | no | no | no | no |
Amazon Simple Storage Service (S3) | WebHDFS | yes | yes | no | no | no | no |
Amazon Simple Storage Service (S3) | Secure WebHDFS | yes | yes | no | no | no | no |
Amazon Simple Storage Service (S3) | Oracle Cloud Infrastructure Object Storage Classic | yes | yes | no | no | no | no |
Amazon Simple Storage Service (S3) | Amazon Simple Storage Service (S3) | yes | yes | no | no | no | no |
Amazon Simple Storage Service (S3) | Oracle Cloud Infrastructure Object Storage | yes | yes | no | no | no | no |
Oracle Cloud Infrastructure Object Storage | HDFS | yes | yes | no | no | no | no |
Oracle Cloud Infrastructure Object Storage | WebHDFS | yes | yes | no | no | no | no |
Oracle Cloud Infrastructure Object Storage | Secure WebHDFS | yes | yes | no | no | no | no |
Oracle Cloud Infrastructure Object Storage | Oracle Cloud Infrastructure Object Storage Classic | yes | yes | no | no | no | no |
Oracle Cloud Infrastructure Object Storage | Amazon Simple Storage Service (S3) | yes | yes | no | no | no | no |
Oracle Cloud Infrastructure Object Storage | Oracle Cloud Infrastructure Object Storage | yes | yes | no | no | no | no |
HTTP | HDFS | yes | yes | yes | no | no | no |
HTTP | WebHDFS | yes | yes | yes | no | no | no |
HTTP | Secure WebHDFS | yes | yes | yes | no | no | no |
HTTP | Oracle Cloud Infrastructure Object Storage Classic | yes | yes | no | no | no | no |
HTTP | Oracle Cloud Infrastructure Object Storage | yes | yes | no | no | no | no |
HTTP | Amazon Simple Storage Service (S3) | yes | yes | no | no | no | no |
odcp Supported Storage Sources and Targets
The following examples show different scenarios for copying data between the various storage systems and services supported by odcp. Run each command from a shell on a cluster node; the shell prompt (for example, [oracle@cfclbv2491 ~]$) is omitted from the table for readability.
Copy From... | To... | Example |
---|---|---|
HDFS | HDFS | odcp hdfs:///user/example/bigdata.file hdfs:///user/example/bigdata.file.copy |
HDFS | WebHDFS | odcp hdfs:///user/example/bigdata.file webhdfs://webhdfs-host:50070/user/example/bigdata.file.copy |
HDFS | Secure WebHDFS | odcp hdfs:///user/example/bigdata.file swebhdfs://webhdfs-host:50470/user/example/bigdata.file.copy |
HDFS | Oracle Cloud Infrastructure Object Storage Classic | odcp hdfs:///user/example/bigdata.file swift://aserver.a424392/bigdata.file.copy |
HDFS | Amazon Simple Storage Service (S3) | odcp hdfs:///user/example/bigdata.file s3a://aserver/bigdata.file.copy |
Oracle Cloud Infrastructure Object Storage Classic | Oracle Cloud Infrastructure Object Storage Classic | odcp swift://aserver.a424392/bigdata.file swift://aserver.a424392/bigdata.file.copy |
Oracle Cloud Infrastructure Object Storage Classic | HDFS | odcp swift://aserver.a424392/bigdata.file hdfs:///user/example/bigdata.file.copy |
Oracle Cloud Infrastructure Object Storage Classic | WebHDFS | odcp swift://aserver.a424392/bigdata.file webhdfs://webhdfs-host:50070/user/example/bigdata.file.copy |
Oracle Cloud Infrastructure Object Storage Classic | Secure WebHDFS | odcp swift://aserver.a424392/bigdata.file swebhdfs://webhdfs-host:50470/user/example/bigdata.file.copy |
Oracle Cloud Infrastructure Object Storage Classic | Amazon Simple Storage Service (S3) | odcp swift://aserver.a424392/bigdata.file s3a://aserver/bigdata.file.copy |
Oracle Cloud Infrastructure Object Storage Classic | Oracle Cloud Infrastructure Object Storage | odcp swift://aserver.a424392/bigdata.file examplebmc://bucket@namespace/bigdata.file.copy |
WebHDFS | WebHDFS | odcp webhdfs://webhdfs-host:50070/user/example/bigdata.file webhdfs://webhdfs-host:50070/user/example/bigdata.file.copy |
WebHDFS | HDFS | odcp webhdfs://webhdfs-host:50070/user/example/bigdata.file hdfs:///user/example/bigdata.file.copy |
WebHDFS | Oracle Cloud Infrastructure Object Storage Classic | odcp webhdfs://webhdfs-host:50070/user/example/bigdata.file swift://aserver.a424392/bigdata.file.copy |
WebHDFS | Amazon Simple Storage Service (S3) | odcp webhdfs://webhdfs-host:50070/user/example/bigdata.file s3a://aserver/bigdata.file.copy |
WebHDFS | Oracle Cloud Infrastructure Object Storage | odcp webhdfs://webhdfs-host:50070/user/example/bigdata.file examplebmc://bucket@namespace/bigdata.file.copy |
Secure WebHDFS | WebHDFS | odcp swebhdfs://webhdfs-host:50470/user/example/bigdata.file webhdfs://webhdfs-host:50070/user/example/bigdata.file.copy |
Secure WebHDFS | HDFS | odcp swebhdfs://webhdfs-host:50470/user/example/bigdata.file hdfs:///user/example/bigdata.file.copy |
Secure WebHDFS | Oracle Cloud Infrastructure Object Storage Classic | odcp swebhdfs://webhdfs-host:50470/user/example/bigdata.file swift://aserver.a424392/bigdata.file.copy |
Secure WebHDFS | Amazon Simple Storage Service (S3) | odcp swebhdfs://webhdfs-host:50470/user/example/bigdata.file s3a://aserver/bigdata.file.copy |
Secure WebHDFS | Oracle Cloud Infrastructure Object Storage | odcp swebhdfs://webhdfs-host:50470/user/example/bigdata.file examplebmc://bucket@namespace/bigdata.file.copy |
Amazon Simple Storage Service (S3) | HDFS | odcp s3a://aserver/bigdata.file hdfs:///user/example/bigdata.file.copy |
Amazon Simple Storage Service (S3) | WebHDFS | odcp s3a://aserver/bigdata.file webhdfs://webhdfs-host:50070/user/example/bigdata.file.copy |
Amazon Simple Storage Service (S3) | Secure WebHDFS | odcp s3a://aserver/bigdata.file swebhdfs://webhdfs-host:50470/user/example/bigdata.file.copy |
Amazon Simple Storage Service (S3) | Oracle Cloud Infrastructure Object Storage Classic | odcp s3a://aserver/bigdata.file swift://aserver.a424392/bigdata.file.copy |
Amazon Simple Storage Service (S3) | Amazon Simple Storage Service (S3) | odcp s3a://aserver/bigdata.file s3a://aserver/bigdata.file.copy |
Amazon Simple Storage Service (S3) | Oracle Cloud Infrastructure Object Storage | odcp s3a://aserver/bigdata.file examplebmc://bucket@namespace/bigdata.file.copy |
Oracle Cloud Infrastructure Object Storage | HDFS | odcp examplebmc://bucket@namespace/bigdata.file hdfs:///user/example/bigdata.file.copy |
Oracle Cloud Infrastructure Object Storage | WebHDFS | odcp examplebmc://bucket@namespace/bigdata.file webhdfs://webhdfs-host:50070/user/example/bigdata.file.copy |
Oracle Cloud Infrastructure Object Storage | Secure WebHDFS | odcp examplebmc://bucket@namespace/bigdata.file swebhdfs://webhdfs-host:50470/user/example/bigdata.file.copy |
Oracle Cloud Infrastructure Object Storage | Oracle Cloud Infrastructure Object Storage Classic | odcp examplebmc://bucket@namespace/bigdata.file swift://aserver.a424392/bigdata.file.copy |
Oracle Cloud Infrastructure Object Storage | Amazon Simple Storage Service (S3) | odcp examplebmc://bucket@namespace/bigdata.file s3a://aserver/bigdata.file.copy |
Oracle Cloud Infrastructure Object Storage | Oracle Cloud Infrastructure Object Storage | odcp examplebmc://bucket@namespace/bigdata.file examplebmc://bucket@namespace/bigdata.file.copy |
HTTP file | HDFS | odcp http://exampleserver.com/bigdata.file hdfs:///user/example/bigdata.file.copy |
HTTP file | WebHDFS | odcp http://exampleserver.com/bigdata.file webhdfs://webhdfs-host:50070/user/example/bigdata.file.copy |
HTTP file | Secure WebHDFS | odcp http://exampleserver.com/bigdata.file swebhdfs://webhdfs-host:50470/user/example/bigdata.file.copy |
HTTP file | Oracle Cloud Infrastructure Object Storage Classic | odcp http://exampleserver.com/bigdata.file swift://aserver.a424392/bigdata.file.copy |
HTTP file | Oracle Cloud Infrastructure Object Storage | odcp http://exampleserver.com/bigdata.file examplebmc://bucket@namespace/bigdata.file.copy |
HTTP file | Amazon Simple Storage Service (S3) | odcp http://exampleserver.com/bigdata.file s3a://aserver/bigdata.file.copy |
Use bda-oss-admin with odcp
Use bda-oss-admin commands to configure the cluster for use with storage providers. This makes it easier and faster to use odcp with those providers.
Any user with access privileges to the cluster can run odcp.
Note:
To copy data between HDFS and Oracle Cloud Infrastructure Object Storage Classic, you must have an Oracle Cloud Infrastructure Object Storage Classic account, which isn’t included with an Oracle Big Data Cloud Service account. To obtain an account, go to https://cloud.oracle.com/storage, or contact an Oracle sales representative.
Procedure
To copy data between HDFS and a storage provider:
- Open a command shell and connect to the cluster. You can connect to any node for which you have HDFS access rights. (Oracle Cloud Infrastructure Object Storage Classic is accessible from all nodes.) See Connect to a Cluster Node Through Secure Shell (SSH).
- Set shell environment variable values for Cloudera Manager access. See Setting bda-oss-admin Environment Variables. Set these environment variables (a sample follows this list):
  - CM_ADMIN — Cloudera Manager administrator user name
  - CM_PASSWORD — Cloudera Manager administrator password
  - CM_URL — Cloudera Manager URL
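  For example, a minimal Bash sketch; the values shown are placeholders for your own Cloudera Manager credentials and URL:
  export CM_ADMIN="admin"                          # placeholder administrator user name
  export CM_PASSWORD="example-password"            # placeholder administrator password
  export CM_URL="https://cmhost.example.com:7183"  # placeholder URL (7183 is Cloudera Manager's usual HTTPS port)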
- You must also have access privileges to the storage provider you want to use. If you’re using the Oracle Cloud Infrastructure Object Storage Classic instance that was registered when the Hadoop cluster was created, you don’t have to set any environment variables to access it. By default, that storage service instance has the provider name BDCS, and you can use that name to access it; for example, swift://MyContainer.BDCS/data.raw. See Create a Cluster and Register Storage Credentials with the Cluster.
  You can also use providers (credentials) other than BDCS, to access different storage service instances or to access them as different users. Any providers besides BDCS must be added to the Hadoop configuration:
  - To see which credentials are already added, use the bda-oss-admin list_swift_creds command. You can also look at the /etc/hadoop/conf/core-site.xml file (a sketch follows this list). See Reviewing the Configuration.
  - To add a single credential, use the bda-oss-admin add_swift_cred command.
  - To add multiple credentials, use the bda-oss-admin import_swift_creds command.
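  For example, a quick way to review registered providers from a cluster shell (the grep filter is illustrative; the exact property names in core-site.xml may vary):
  [oracle@cfclbv2491 ~]$ bda-oss-admin list_swift_creds
  [oracle@cfclbv2491 ~]$ grep -i swift /etc/hadoop/conf/core-site.xml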
- Set the PROVIDER_NAME environment variable to refer to the provider you want to use. For example, if you have a provider named rssfeeds-admin2, use SSH to connect to the cluster and enter:
  # export PROVIDER_NAME="rssfeeds-admin2"
  Or, in a shell script:
  export PROVIDER_NAME="rssfeeds-admin2"
  Then, in your odcp commands, use that provider name; for example, swift://aContainer.rssfeeds-admin2/data.raw.
- You can use the hadoop fs -ls command to browse your HDFS and storage data (a concrete example follows this step):
  - To browse HDFS, use:
    hadoop fs -ls
  - To browse storage, use:
    hadoop fs -ls swift://container.provider/
  You can also browse storage in Big Data Manager.
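  For example, to list the top level of a container named MyContainer through the default BDCS provider (the container name is a placeholder):
  hadoop fs -ls swift://MyContainer.BDCS/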
- Use the odcp command to copy files between Oracle Cloud Infrastructure Object Storage Classic and HDFS, as shown in the following examples.
Examples
- Copy a file from HDFS to an Oracle Cloud Infrastructure Object Storage Classic container:
  # /usr/bin/odcp hdfs:///user/example/data.raw swift://myContainer.myProvider/data.raw
- Copy a file from an Oracle Cloud Infrastructure Object Storage Classic container to HDFS:
  # /usr/bin/odcp swift://myContainer.myProvider/data.raw hdfs:///user/example/data.raw
- Copy a directory from HDFS to an Oracle Cloud Infrastructure Object Storage Classic container:
  # /usr/bin/odcp hdfs:///user/data/ swift://myContainer.myProvider/backup
- If you have more than three nodes, you can increase transfer speed by specifying a higher number of executors. For example, if you have six nodes, use a command such as:
  # /usr/bin/odcp --num-executors=6 hdfs:///user/company/data.raw swift://myContainer.myProvider/data.raw
Filter and Copy Files
Use the odcp command with the --srcPattern option to filter and copy files, as shown in the following example.
# list source directory
[oracle@cfclbv2491 ~]$ hadoop fs -ls swift://rstrejc.a424392/logs/
Found 3 items
-rw-rw-rw- 1 3499940 2016-10-18 09:58 swift://rstrejc.a424392/logs/spark.log
-rw-rw-rw- 1 7525772 2016-10-18 10:00 swift://rstrejc.a424392/logs/hadoop.log
-rw-rw-rw- 1 8 2016-10-18 10:13 swift://rstrejc.a424392/logs/report.txt
# filter and copy files
[oracle@cfclbv2491 ~]$ odcp -V --srcPattern ".*log" swift://rstrejc.a424392/logs/ hdfs:///user/oracle/filtered/
# list destination directory
[oracle@cfclbv2491 ~]$ hadoop fs -ls hdfs:///user/oracle/filtered
Found 2 items
-rw-r--r-- 3 oracle hadoop 3499940 2016-10-18 10:29 hdfs:///user/oracle/filtered/spark.log
-rw-r--r-- 3 oracle hadoop 7525772 2016-10-18 10:30 hdfs:///user/oracle/filtered/hadoop.log
Filter, Copy, and Group Files
Use the odcp command with the --groupBy and --groupName options to filter, copy, and group files, as shown in the following example:
# list source directory
[oracle@cfclbv2491 ~]$ hadoop fs -ls swift://rstrejc.a424392/logs/
Found 3 items
-rw-rw-rw- 1 3499940 2016-10-18 09:58 swift://rstrejc.a424392/logs/spark.log
-rw-rw-rw- 1 7525772 2016-10-18 10:00 swift://rstrejc.a424392/logs/hadoop.log
-rw-rw-rw- 1 8 2016-10-18 10:13 swift://rstrejc.a424392/logs/report.txt
# copy and group files
[oracle@cfclbv2491 ~]$ odcp --groupBy ".*log" --groupName "summary.log" swift://rstrejc.a424392/logs/ hdfs:///user/oracle/logs/
# list destination directory
[oracle@cfclbv2491 ~]$ hadoop fs -ls hdfs:///user/oracle/logs
Found 1 items
-rw-r--r-- 3 oracle hadoop 11025712 2016-10-18 10:00 hdfs:///user/oracle/logs/summary.log
Copy Files from an HTTP Server
You can use odcp to copy files from an HTTP server in a number of ways, as described below.
Copy Files From an HTTP Server
There are two ways to download files via the HTTP protocol:
- Specify each file as a source:
[oracle@cfclbv2491 ~]$ odcp http://example.com/fileA http://example.com/fileB swift://rstrejc.a424392/dstDirectory
- Create a list of files to download:
[oracle@cfclbv2491 ~]$ odcp --file-list hdfs:///files_to_download --file-list http://example.com/logs_to_download swift://rstrejc.a424392/dstDirectory
Use a File List to Specify Files
A file list is a comma-separated values (CSV) file with the following schema:
link_to_file[,http headers encoded in Base64]
For example:
http://172.16.253.111/public/big.file
https://172.16.253.111/public/small.file
http://172.16.253.111/private/secret.file,QXV0aG9yaXphdGlvbjogQmFzaWMgYjNKaFkyeGxPa2cwY0hCNVJqQjQK
https://oracle:H4ppyF0x@172.16.253.111/private/small.file
QXV0aG9yaXphdGlvbjogQmFzaWMgYjNKaFkyeGxPa2cwY0hCNVJqQjQK is the Base64 encoding of the HTTP header:
- Authorization: Basic b3JhY2xlOkg0cHB5RjB4
For example:
[oracle@cfclbv2491 ~]$ odcp --file-list hdfs:///files_to_download --file-list http://example.com/logs_to_download swift://rstrejc.a424392/dstDirectory
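If you need to produce the Base64 value for the headers column yourself, one way is to encode the whole header line; a minimal Bash sketch using the credentials from the example above (oracle:H4ppyF0x):
# Encode user:password for HTTP basic auth, then Base64-encode the full header line
[oracle@cfclbv2491 ~]$ echo "Authorization: Basic $(echo -n 'oracle:H4ppyF0x' | base64)" | base64
QXV0aG9yaXphdGlvbjogQmFzaWMgYjNKaFkyeGxPa2cwY0hCNVJqQjQK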
Copy Files from an HTTP Server By Using a Proxy
When downloading through an HTTP proxy, specify the proxy host as shown in the following example:
[oracle@cfclbv2491 ~]$ odcp --http-proxy-host www-proxy.example.com http://example.com/fileA swift://rstrejc.a424392/dstDirectory
Copy a List of Files from an HTTP Server By Using a File with Predefined HTTP Headers
If you need to specify HTTP headers but you don’t want to specify them in a file list, you can create a separate file with HTTP headers and pass the file to odcp, as shown in the example below:
[oracle@cfclbv2491 ~]$ odcp --http-headers hdfs:///file_with_http_headers http://example.com/logs_to_download swift://rstrejc.a424392/dstDirectory
The structure of the file with HTTP headers is:
regex_pattern,http_headers
For example, the following file applies the specified HTTP headers to files that contain "image" or "log" in their path or name:
.*image.*,QXV0aG9yaXphdGlvbjogQmFzaWMgYjNKaFkyeGxPa2cwY0hCNVJqQjQK
.*log.*,QXV0aG9yaXphdGlvbjogQmFzaWMgYjNKaFkyeGxPa2cwY0hCNVJqQjQK
Use odcp to Copy Data on a Secure Cluster
Using odcp to copy data on a Kerberos-enabled cluster requires some additional steps.
Note:
In Oracle Big Data Cloud Service, a cluster is Kerberos-enabled when it’s created with “Secure Setup.”
If you want to execute a long-running job or run odcp from an automated shell script or from a workflow service such as Apache Oozie, you must pass the odcp command a Kerberos principal and the full path to the principal’s keytab file. For interactive use, you don’t have to pass the keytab file or specify the principal; you just have to have an active Kerberos token (created using the kinit command), as in the sketch below.
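For example, a minimal sketch of an interactive session on a secure cluster; the principal oracle@EXAMPLE.COM and the target container are placeholders:
[oracle@cfclbv2491 ~]$ kinit oracle@EXAMPLE.COM
Password for oracle@EXAMPLE.COM:
[oracle@cfclbv2491 ~]$ odcp hdfs:///user/oracle/bigdata.file swift://aContainer.BDCS/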
Synchronize the Destination with the Source
You can synchronize the destination with the source at the level of file parts. When syncing HDFS with Oracle Cloud Infrastructure Object Storage Classic, use an HDFS partSize equal to the file partSize on the object store.
What You Can Do When Synchronizing
The following list shows what you can do when synchronizing HDFS and Oracle Cloud Infrastructure Object Storage Classic sources and destinations:
- HDFS to Oracle Cloud Infrastructure Object Storage Classic
  - Retrieve a list of already uploaded segments on Oracle Cloud Infrastructure Object Storage Classic.
  - Read file parts from HDFS.
  - Compare each file part checksum on HDFS with the checksum of the part already uploaded to Oracle Cloud Infrastructure Object Storage Classic. If they’re the same, skip the transfer. Otherwise, upload the part from HDFS to Oracle Cloud Infrastructure Object Storage Classic.
- Oracle Cloud Infrastructure Object Storage Classic to HDFS
  - Retrieve a list of already downloaded parts on HDFS.
  - Split already concatenated files into parts on HDFS and calculate checksums.
  - Before downloading a segment from Oracle Cloud Infrastructure Object Storage Classic, compare its checksum with the checksum of the already downloaded part on HDFS. If they’re the same, skip the transfer. Otherwise, download the segment from Oracle Cloud Infrastructure Object Storage Classic and store it as a file part on HDFS.
- Oracle Cloud Infrastructure Object Storage Classic to Oracle Cloud Infrastructure Object Storage Classic
  - Retrieve a list of already uploaded segments on the destination Oracle Cloud Infrastructure Object Storage Classic instance.
  - Retrieve a list of source segments on Oracle Cloud Infrastructure Object Storage Classic.
  - Before downloading a segment, compare its checksum with the checksum of the segment already uploaded to the destination. If they’re the same, skip the transfer. Otherwise, download the segment from the source instance and upload it to the destination instance.
- HDFS to HDFS
  - Retrieve a list of already downloaded parts on HDFS.
  - Split already concatenated files into parts on HDFS and calculate checksums.
  - Read file parts from HDFS.
  - Compare each file part checksum on HDFS with the checksums of parts already stored on HDFS. If they’re the same, skip the transfer. Otherwise, copy the part.
Examples
# sync file hdfs:///user/oracle/bigdata.file with swift://rstrejc.a42439/bigdata.file
odcp --sync hdfs:///user/oracle/bigdata.file swift://rstrejc.a42439
# sync file hdfs:///user/oracle/bigdata.file with swift://rstrejc.a42439/bigdata.file.new
odcp --sync hdfs:///user/oracle/bigdata.file swift://rstrejc.a42439/bigdata.file.new
# sync directory hdfs:///user/oracle/directoryWithBigData with swift://rstrejc.a42439/directoryWithBigData
odcp --sync hdfs:///user/oracle/directoryWithBigData swift://rstrejc.a42439/
# sync directory hdfs:///user/oracle/directoryWithBigData with swift://rstrejc.a42439/someDirectory/directoryWithBigData
odcp --sync hdfs:///user/oracle/directoryWithBigData swift://rstrejc.a42439/someDirectory
Retry a Failed Copy Job
If a copy job fails, you can retry it. When you retry the job, the source and destination are automatically synchronized, so odcp doesn’t re-transfer file parts that were already transferred successfully.
The retry mechanism works as follows:
- Before transferring files, odcp retrieves the destination file status and stores it to HDFS.
- When a retry operation is required:
  - odcp reads the destination file status stored on HDFS.
  - Input and output files are re-indexed, with the same result as in the failed execution.
  - The re-indexed files are synchronized.
- If the copy operation is successful, odcp deletes the stored file status from HDFS.
Example:
# Run odcp (assume this run fails)
odcp hdfs:///user/oracle/bigdata.file swift://rstrejc.a424392/
# Run same command with --retry option
odcp --retry hdfs:///user/oracle/bigdata.file swift://rstrejc.a424392/