odcp Reference

The odcp command-line utility has the single command odcp, with parameters and options as described below.


odcp [options] source1 [source2 ...] destination


Parameter Description
source1 [source2 ...]

The source can be any of the following:

  • One or more individual files. Wildcard characters are allowed (glob patterns).

  • One or more HDFS directories.

  • One or more storage containers.

If you specify multiple sources, list them one after the other:

odcp source1 source2 source3 destination

If two or more source files have the same name, nothing is copied and odcp throws an exception.
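Because a name collision aborts the whole copy, it can help to check a source list before invoking odcp. The following is a minimal sketch, not part of odcp itself. It assumes that "the same name" means the same base file name, and that the sources have already been expanded to individual file paths. The helper name find_name_collisions and the example paths are hypothetical.

```python
import posixpath
from collections import defaultdict


def find_name_collisions(sources):
    """Return {base_name: [paths, ...]} for base names shared by two or more sources."""
    by_name = defaultdict(list)
    for path in sources:
        # Strip a trailing slash so it does not produce an empty base name.
        by_name[posixpath.basename(path.rstrip("/"))].append(path)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


# Hypothetical source paths, for illustration only.
sources = [
    "hdfs:///user/a/data.raw",
    "hdfs:///user/b/data.raw",
    "hdfs:///user/b/other.raw",
]
collisions = find_name_collisions(sources)
for name, paths in collisions.items():
    print(f"{name} appears in: {', '.join(paths)}")
```

An empty result means no two sources share a base name under this assumption.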

Regular expressions are supported through these parameters:

  • --srcPattern pattern

    Files with matching names are copied. This parameter is ignored if the --groupBy parameter is set.

  • --groupBy pattern

    Files with matching names are copied and then concatenated into one output file. Set the name of the concatenated file with the parameter --groupName output_file_name.

    When the --groupBy parameter is used, the --srcPattern parameter is ignored.
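The interaction between the two pattern parameters can be sketched as a small helper that assembles an argument list in the documented order (options, then sources, then destination). This is an illustrative sketch only. The function build_odcp_args, the paths, and the pattern values are hypothetical; only the flags --srcPattern, --groupBy, and --groupName come from the text above. The helper omits --srcPattern when --groupBy is given, because odcp would ignore it anyway.

```python
def build_odcp_args(sources, destination, src_pattern=None,
                    group_by=None, group_name=None):
    """Build an argument list of the form: odcp [options] source... destination."""
    args = ["odcp"]
    if group_by is not None:
        # --groupBy takes precedence; --srcPattern is ignored when it is set.
        args += ["--groupBy", group_by]
        if group_name is not None:
            args += ["--groupName", group_name]
    elif src_pattern is not None:
        args += ["--srcPattern", src_pattern]
    return args + list(sources) + [destination]


# Hypothetical paths and patterns, for illustration only.
args = build_odcp_args(
    ["hdfs:///user/company/logs"],
    "hdfs:///user/company/merged",
    src_pattern=r".*\.log",
    group_by=r".*\.log",
    group_name="all.log",
)
print(" ".join(args))
```

The resulting list can be passed to a process launcher such as subprocess.run on a host where odcp is installed.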


The destination can be any of the following:

  • A specified file in an HDFS directory or a storage container

    If you don’t specify a file name, the source file’s name is used for the copy at the destination. You can specify a different file name at the destination to avoid overwriting an existing file with the same name.

  • An HDFS directory

  • A storage container

Use the following formats:

  • For HDFS:


    For example: hdfs:///user/company/data.raw



    For example: hdfs://

  • For Oracle Storage Cloud Service:



    • container is the name of a container in the Oracle Storage Cloud Service instance.

    • provider is the provider name that serves as an alias for the credentials for accessing the instance. See Register Storage Credentials with the Cluster.

    For example: swift://feeds.BDCS/stream-061016-1827-534

For examples showing other storage types, see odcp Supported Storage Sources and Targets.
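To make the roles of container and provider concrete, here is a minimal sketch that splits a swift URI into its parts. The format line for Oracle Storage Cloud Service is missing from this page, so the sketch assumes the shape swift://container.provider/object_path, as the example above suggests. It also assumes the provider is the text after the last dot. The function parse_swift_uri is hypothetical and not part of odcp.

```python
from urllib.parse import urlsplit


def parse_swift_uri(uri):
    """Split swift://container.provider/object_path into its three parts."""
    parts = urlsplit(uri)
    if parts.scheme != "swift":
        raise ValueError(f"not a swift URI: {uri!r}")
    # Assumption: the provider alias follows the last dot of the authority part.
    container, dot, provider = parts.netloc.rpartition(".")
    if not dot or not container or not provider:
        raise ValueError(f"expected container.provider in {uri!r}")
    return container, provider, parts.path.lstrip("/")


container, provider, object_path = parse_swift_uri(
    "swift://feeds.BDCS/stream-061016-1827-534"
)
print(container, provider, object_path)
```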


Option Description



Destination file part size in bytes.

  • Default = 134217728

  • Minimum = 1048576

  • Maximum = 2147483647

The remainder after dividing partSize by blockSize must be equal to zero.



Concatenate the file chunks (default).


Specify the number of executor cores.

The default value is 5.


Specify the executor memory limit in gigabytes.

The default value is 40 GB.


Specify extra configuration options. For example:

--extra-conf spark.kryoserializer.buffer.max=128m


Specify files to concatenate to a destination file by matching source file names with a regular expression.



Show help for this command.


The full path to the keytab file of the Kerberos principal. (Use in a Kerberos-enabled Spark environment only.)


The Kerberos principal. (Use in a Kerberos-enabled Spark environment only.)



Don’t overwrite an existing file.


Don’t copy files recursively.


Specify the number of executors. The default value is 3 executors.
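The executor defaults listed above (3 executors, 5 cores, 40 GB) combine into a total resource request. The sketch below does that arithmetic. It assumes the core and memory settings apply per executor, as is usual for Spark, and it ignores the driver and any memory overhead. The class ExecutorSettings is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorSettings:
    # Defaults taken from the option descriptions above.
    num_executors: int = 3
    cores_per_executor: int = 5       # assumed to be a per-executor value
    memory_gb_per_executor: int = 40  # assumed to be a per-executor value

    @property
    def total_cores(self):
        return self.num_executors * self.cores_per_executor

    @property
    def total_memory_gb(self):
        return self.num_executors * self.memory_gb_per_executor


defaults = ExecutorSettings()
print(defaults.total_cores, "cores,", defaults.total_memory_gb, "GB")
```

Under these assumptions the defaults request 15 cores and 120 GB across all executors, which is worth comparing with the cluster's free capacity before a large copy.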


Show the progress of the data transfer.


Retry if the previous transfer failed or was interrupted.


Destination file part size in bytes.

  • Default = 536870912

  • Minimum = 1048576

  • Maximum = 2147483647

The remainder after dividing partSize by blockSize must be equal to zero.
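The size limits and the divisibility rule can be checked before a job is submitted. The following is a minimal sketch, assuming that the first size option above (default 134217728) is the blockSize the rule refers to and that this option (default 536870912) is partSize. The function check_sizes is hypothetical and not part of odcp.

```python
MIN_SIZE = 1048576               # documented minimum (1 MiB)
MAX_SIZE = 2147483647            # documented maximum (2**31 - 1)
DEFAULT_BLOCK_SIZE = 134217728   # 128 MiB; assumed to be blockSize
DEFAULT_PART_SIZE = 536870912    # 512 MiB; assumed to be partSize


def check_sizes(part_size=DEFAULT_PART_SIZE, block_size=DEFAULT_BLOCK_SIZE):
    """Return a list of problems; an empty list means the sizes are valid."""
    problems = []
    for label, value in (("partSize", part_size), ("blockSize", block_size)):
        if not MIN_SIZE <= value <= MAX_SIZE:
            problems.append(f"{label} {value} is outside [{MIN_SIZE}, {MAX_SIZE}]")
    # Guard against division by zero; a zero size is already reported above.
    if block_size > 0 and part_size % block_size != 0:
        problems.append(
            f"partSize {part_size} is not a multiple of blockSize {block_size}"
        )
    return problems


print(check_sizes())                       # defaults: 536870912 = 4 * 134217728
print(check_sizes(part_size=200000000))    # in range, but not a multiple
```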


The path to a directory containing an Apache Spark installation. If nothing is specified, odcp tries to find it in the /opt/cloudera directory.


Filters sources by matching the source name with a regular expression.

--srcPattern is ignored when the --groupBy parameter is used.


Synchronize the destination with the source.


Enable verbose mode for debugging.