odcp Reference

The odcp command-line utility has the single command odcp, with parameters and options as described below.

Syntax

odcp [options] source1 [source2 ...] destination

Parameters

Parameter Description
source1 [source2 ...]

The source can be any of the following:

  • One or more individual files. Wildcard characters are allowed (glob patterns).

  • One or more HDFS directories.

  • One or more storage containers.

If you specify multiple sources, list them one after the other:

odcp source1 source2 source3 destination

If two or more source files have the same name, nothing is copied and odcp throws an exception.
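The positional-argument rules above can be modeled in a short Python sketch. It makes two assumptions. First, as the syntax line suggests, the last positional argument is the destination and every earlier one is a source. Second, "same name" means the same final path component. The helpers split_positionals and check_unique_names are hypothetical and are not part of odcp.

```python
import posixpath
from collections import Counter


def split_positionals(args: list[str]) -> tuple[list[str], str]:
    """Hypothetical helper: the last argument is the destination, the rest are sources."""
    if len(args) < 2:
        raise ValueError("odcp needs at least one source and a destination")
    return args[:-1], args[-1]


def check_unique_names(sources: list[str]) -> None:
    """Raise if two sources share the same final path component (assumed rule)."""
    names = Counter(posixpath.basename(s.rstrip("/")) for s in sources)
    duplicates = sorted(name for name, count in names.items() if count > 1)
    if duplicates:
        # Mirrors the documented behavior: nothing is copied and an error is raised.
        raise ValueError(f"duplicate source names: {duplicates}")


sources, destination = split_positionals(
    ["hdfs:///user/a/data.raw", "hdfs:///user/b/log.txt", "swift://feeds.BDCS/"]
)
check_unique_names(sources)  # passes: data.raw and log.txt differ
```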

Regular expressions are supported through these parameters:

  • --srcPattern pattern

    Files with matching names are copied. This parameter is ignored if the --groupBy parameter is set.

  • --groupBy pattern

    Files with matching names are copied and then concatenated into one output file. Set the name of the concatenated file by using the --groupName output_file_name parameter.

    When the --groupBy parameter is used, the --srcPattern parameter is ignored.
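The precedence between --srcPattern and --groupBy can be illustrated with a small Python model. This is a sketch under assumptions, not odcp's implementation. It assumes that a pattern must match the whole file name, like Java's String.matches, and that Python's re syntax is close enough to the regular-expression dialect odcp accepts. The function select_sources and the output name all-parts.csv are hypothetical.

```python
import re


def select_sources(names, src_pattern=None, group_by=None, group_name=None):
    """Hypothetical model of --srcPattern / --groupBy selection.

    Returns (selected_names, concat_target). concat_target is None unless
    --groupBy is in effect, in which case every matched file is meant to be
    concatenated into one output file named group_name.
    """
    if group_by is not None:
        # --groupBy wins: --srcPattern is ignored, as documented.
        regex = re.compile(group_by)
        return [n for n in names if regex.fullmatch(n)], group_name
    if src_pattern is not None:
        regex = re.compile(src_pattern)
        return [n for n in names if regex.fullmatch(n)], None
    return list(names), None


files = ["part-0001.csv", "part-0002.csv", "readme.txt"]
print(select_sources(files, src_pattern=r".*\.txt"))
# (['readme.txt'], None)
print(select_sources(files, src_pattern=r".*\.txt", group_by=r"part-\d+\.csv",
                     group_name="all-parts.csv"))
# (['part-0001.csv', 'part-0002.csv'], 'all-parts.csv')
```

The second call shows the documented rule: once a --groupBy pattern is supplied, the --srcPattern value has no effect on which files are selected.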

destination

The destination can be any of the following:

  • A specified file in an HDFS directory or a storage container

    If you don’t specify a file name, the copied file at the destination keeps the name of the source file. To avoid overwriting an existing file with the same name, you can specify a different file name at the destination.

  • An HDFS directory

  • A storage container
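How the destination file name is chosen can be sketched as follows. This is a minimal Python illustration under one loudly stated assumption that this reference does not confirm: a destination ending in "/" names a directory or container, so the source file name is kept, and any other destination is an explicit target file name. The function resolve_destination and the name data-copy.raw are hypothetical.

```python
import posixpath


def resolve_destination(source: str, destination: str) -> str:
    """Hypothetical sketch: pick the file path written at the destination.

    Assumption (not from the odcp docs): a destination ending in "/" names a
    directory or container, so the source file name is appended; any other
    destination is taken as an explicit target file name.
    """
    if destination.endswith("/"):
        return destination + posixpath.basename(source)
    return destination


print(resolve_destination("hdfs:///user/company/data.raw", "swift://feeds.BDCS/"))
print(resolve_destination("hdfs:///user/company/data.raw", "swift://feeds.BDCS/data-copy.raw"))
```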

Use the following formats:

  • For HDFS:

    hdfs:///path/[file]

    For example: hdfs:///user/company/data.raw

    or

    hdfs://[host:port]/path/[file]

    For example: hdfs://192.0.2.0:22/user/company/data.raw

  • For Oracle Storage Cloud Service:

    swift://container.provider/[file]

    where

    • container is the name of a container in the Oracle Storage Cloud Service instance.

    • provider is the provider name that serves as an alias for the credentials for accessing the instance. See Register Storage Credentials with the Cluster.

    For example: swift://feeds.BDCS/stream-061016-1827-534
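To make the parts of the formats above explicit, the following Python sketch splits an HDFS or Swift URI with the standard urllib.parse.urlsplit function. The helper describe_target is hypothetical. Splitting the Swift authority at the first dot into container and provider is an assumption that matches the container.provider form described here, and it may not match how odcp itself parses names that contain extra dots.

```python
from urllib.parse import urlsplit


def describe_target(uri: str) -> dict:
    """Hypothetical helper: break an odcp HDFS or Swift URI into its parts."""
    parts = urlsplit(uri)
    if parts.scheme == "hdfs":
        return {
            "scheme": "hdfs",
            # hdfs:///path has no authority, so host and port are None.
            "host": parts.hostname,
            "port": parts.port,
            "path": parts.path,
        }
    if parts.scheme == "swift":
        # Assumption: the authority is "container.provider", split at the first dot.
        container, _, provider = parts.netloc.partition(".")
        return {
            "scheme": "swift",
            "container": container,
            "provider": provider,
            "file": parts.path.lstrip("/"),
        }
    raise ValueError(f"unsupported scheme: {parts.scheme!r}")


print(describe_target("hdfs://192.0.2.0:22/user/company/data.raw"))
print(describe_target("swift://feeds.BDCS/stream-061016-1827-534"))
```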

For examples showing other storage types, see odcp Supported Storage Sources and Targets.

Options

Option Description

-b

--block-size

Destination file block size in bytes.

  • Default = 134217728

  • Minimum = 1048576

  • Maximum = 2147483647

The remainder after dividing partSize by blockSize must be equal to zero.

-c

--concat

Concatenate the file chunks (default).

--executor-cores

Specify the number of executor cores.

The default value is 5.

--executor-memory

Specify the executor memory limit in gigabytes.

The default value is 40 GB.

--extra-conf

Specify extra configuration options. For example:

--extra-conf spark.kryoserializer.buffer.max=128m
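When odcp is invoked from a script, building the command as an argument list keeps option values such as the --extra-conf setting above intact without shell quoting. The Python sketch below uses only options documented in this reference. The helper build_odcp_argv is hypothetical, and passing the memory as a plain number of gigabytes (for example 40) is an assumption; check odcp --help on your cluster for the exact value format.

```python
import shlex


def build_odcp_argv(sources, destination, executors=3, cores=5, memory_gb=40,
                    extra_conf=None):
    """Hypothetical helper: assemble an odcp command line as an argument list.

    Assumption: --executor-memory takes a plain number of gigabytes.
    Options come first, then the sources, then the destination.
    """
    argv = [
        "odcp",
        "--num-executors", str(executors),
        "--executor-cores", str(cores),
        "--executor-memory", str(memory_gb),
    ]
    if extra_conf:
        argv += ["--extra-conf", extra_conf]
    return argv + list(sources) + [destination]


argv = build_odcp_argv(
    ["hdfs:///user/company/data.raw"],
    "swift://feeds.BDCS/",
    extra_conf="spark.kryoserializer.buffer.max=128m",
)
print(shlex.join(argv))  # shows the equivalent shell command line
```

On a node where odcp is installed, the list could be passed to subprocess.run(argv, check=True). Passing a list rather than one string avoids shell-quoting problems with regular expressions such as those given to --srcPattern.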

--groupBy

Specify files to concatenate to a destination file by matching source file names with a regular expression.

-h

--help

Show help for this command.

--krb-keytab

The full path to the keytab file of the Kerberos principal. (Use in a Kerberos-enabled Spark environment only.)

--krb-principal

The Kerberos principal. (Use in a Kerberos-enabled Spark environment only.)

-n

--no-clobber

Don’t overwrite an existing file.
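The effect of -n/--no-clobber can be modeled on a local file system. This Python sketch illustrates the behavior described above and is not odcp code. The function copy_file is hypothetical and uses the standard shutil.copyfile call.

```python
import os
import shutil


def copy_file(src: str, dst: str, no_clobber: bool = False) -> bool:
    """Local-filesystem model of -n/--no-clobber (hypothetical, not odcp code).

    Returns True if the file was copied, or False if it was skipped because
    the destination already exists and no_clobber is set.
    """
    if no_clobber and os.path.exists(dst):
        return False  # keep the existing destination file untouched
    shutil.copyfile(src, dst)
    return True
```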

--non-recursive

Don’t copy files recursively.
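The difference between the default recursive copy and --non-recursive is which files are picked up under a source directory. This local Python model uses os.walk for the recursive case and os.listdir for the non-recursive case. It is a sketch for illustration only, and list_files is a hypothetical name.

```python
import os


def list_files(root: str, recursive: bool = True) -> list[str]:
    """Local model of recursive vs. --non-recursive source listing (hypothetical).

    With recursive=False, only files directly inside root are returned;
    subdirectories and their contents are skipped.
    """
    if not recursive:
        return sorted(
            os.path.join(root, name)
            for name in os.listdir(root)
            if os.path.isfile(os.path.join(root, name))
        )
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        found.extend(os.path.join(dirpath, f) for f in filenames)
    return sorted(found)
```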

--num-executors

Specify the number of executors. The default value is 3 executors.
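The executor options multiply. With the documented defaults (3 executors, 5 cores and 40 GB each), a copy job requests 15 executor cores and 120 GB of executor memory in total. The small Python function below, a hypothetical helper, does that arithmetic. It does not count the Spark driver or Spark's per-executor memory overhead.

```python
def requested_resources(num_executors: int = 3, executor_cores: int = 5,
                        executor_memory_gb: int = 40) -> tuple[int, int]:
    """Total executor cores and executor memory (GB), using odcp's documented defaults.

    The Spark driver and per-executor memory overhead are not included.
    """
    return num_executors * executor_cores, num_executors * executor_memory_gb


print(requested_resources())  # (15, 120)
```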

--progress

Show the progress of the data transfer.

--retry

Retry if the previous transfer failed or was interrupted.

--partSize

Destination file part size in bytes.

  • Default = 536870912

  • Minimum = 1048576

  • Maximum = 2147483647

The remainder after dividing partSize by blockSize must be equal to zero.
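The size limits for --block-size and --partSize, and the rule that partSize must be an exact multiple of blockSize, can be checked before a job is submitted. The defaults satisfy the rule: 536870912 / 134217728 = 4 with remainder 0 (512 MiB parts made of 128 MiB blocks). This Python sketch encodes the documented limits. The function validate_sizes is hypothetical, and odcp's own error messages may differ.

```python
MIN_SIZE = 1_048_576      # 1 MiB, documented minimum for both options
MAX_SIZE = 2_147_483_647  # 2**31 - 1, documented maximum for both options


def validate_sizes(part_size: int = 536_870_912, block_size: int = 134_217_728) -> None:
    """Check --partSize and --block-size against the documented limits."""
    for name, value in (("partSize", part_size), ("blockSize", block_size)):
        if not MIN_SIZE <= value <= MAX_SIZE:
            raise ValueError(f"{name}={value} is outside [{MIN_SIZE}, {MAX_SIZE}]")
    if part_size % block_size != 0:
        raise ValueError(
            f"partSize {part_size} is not a multiple of blockSize {block_size}"
        )


validate_sizes()  # defaults pass: 536870912 % 134217728 == 0
```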

--spark-home

The path to a directory containing an Apache Spark installation. If no path is specified, odcp tries to find Spark in the /opt/cloudera directory.

--srcPattern

Filters sources by matching the source name with a regular expression.

--srcPattern is ignored when the --groupBy parameter is used.

--sync

Synchronize the destination with the source.

-V

Enable verbose mode for debugging.