odcp Reference

The odcp command-line utility has the single command odcp, with parameters and options as described below.

Syntax

odcp [options] source1 [source2 ...] destination

Parameters


source1 [source2 ...]

The source can be any of the following:

One or more individual files. Wildcard characters are allowed (glob patterns).
One or more HDFS directories.
One or more storage containers.

If you specify multiple sources, list them one after the other:

odcp source1 source2 source3 destination

If two or more source files have the same name, nothing is copied and odcp throws an exception.
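Because a name collision stops the whole copy, it can help to check the source list before running odcp. The following is a minimal sketch and not part of odcp itself: the source paths are hypothetical, and it assumes that the collision check compares base file names only.

```shell
# Hypothetical source paths, for illustration only.
sources="hdfs:///user/company/a/data.raw hdfs:///user/company/b/data.raw hdfs:///user/company/c/log.raw"

# Collect base names that occur more than once.
# Assumption: odcp compares base file names when it detects collisions.
dupes=$(for s in $sources; do basename "$s"; done | sort | uniq -d)

if [ -n "$dupes" ]; then
  echo "duplicate source names: $dupes"
else
  echo "no duplicate source names"
fi
```

If the check reports duplicates, copy the conflicting files in separate odcp runs or to separate destinations.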

Regular expressions are supported through these parameters:

--srcPattern pattern

Files with matching names are copied. This parameter is ignored if the --groupBy parameter is set.

--groupBy pattern

Files with matching names are copied and then concatenated into one output file. Set the name of the concatenated file by using the parameter --groupName output_file_name.

When the --groupBy parameter is used, the --srcPattern parameter is ignored.
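To illustrate how a pattern selects files, the sketch below previews the matches locally with grep and then prints, without running, the corresponding odcp commands. The file names, directory paths, and the output name all.csv are hypothetical. The sketch also assumes that the pattern must match the whole file name and that options take their value as a separate argument; the regular-expression dialect and matching rules that odcp applies may differ.

```shell
# Hypothetical file names in a source directory.
names="part-0001.csv part-0002.csv notes.txt"

# Regular expression intended for --srcPattern or --groupBy.
pattern='.*\.csv'

# Preview the names the pattern selects. grep -E -x requires a whole-name
# match; odcp's own matching rules may differ (assumption).
matched=$(printf '%s\n' $names | grep -E -x "$pattern")
echo "$matched"

# Print (not run) the corresponding odcp commands; paths are hypothetical.
echo "odcp --srcPattern '$pattern' hdfs:///user/company/in hdfs:///user/company/out"
echo "odcp --groupBy '$pattern' --groupName all.csv hdfs:///user/company/in hdfs:///user/company/out"
```

The first command would copy each matching file individually. The second would concatenate the matching files into a single file named by --groupName.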

destination

The destination can be any of the following:

A specified file in an HDFS directory or a storage container. If you don’t specify a file name, the name of the source file is used for the copied file at the destination. You can specify a different file name at the destination to avoid overwriting a file with the same name.
An HDFS directory.
A storage container.

Use the following formats:

For HDFS:

hdfs:///path/[file]

For example: hdfs:///user/company/data.raw

or

hdfs://[host:port]/path/[file]

For example: hdfs://192.0.2.0:22/user/company/data.raw

For Oracle Storage Cloud Service:

swift://container.provider/[file]

where:
- container is the name of a container in the Oracle Storage Cloud Service instance.
- provider is the provider name that serves as an alias for the credentials for accessing the instance. See Register Storage Credentials with the Cluster.

For example: swift://feeds.BDCS/stream-061016-1827-534

For examples showing other storage types, see odcp Supported Storage Sources and Targets.
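As a sketch of how the two URI formats combine in one command, the following assembles a swift:// destination from its container and provider parts and prints the resulting odcp invocation without running it. The container, provider, and HDFS path reuse the example values above; the destination object name data.raw is an assumption made for illustration.

```shell
# Values taken from the examples above; the object name is hypothetical.
container="feeds"
provider="BDCS"
src="hdfs:///user/company/data.raw"
dst="swift://${container}.${provider}/data.raw"

# Print (not run) the resulting command.
echo "odcp $src $dst"
```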

Options

Option	Description
`-b` `--block-size`	Destination file block size in bytes. Default = `134217728`, minimum = `1048576`, maximum = `2147483647`. The remainder after dividing `partSize` by `blockSize` must be zero.
`-c` `--concat`	Concatenate the file chunks (default).
`--executor-cores`	Specify the number of executor cores. The default value is `5`.
`--executor-memory`	Specify the executor memory limit in gigabytes. The default value is `40 GB`.
`--extra-conf`	Specify extra configuration options. For example: `--extra-conf spark.kryoserializer.buffer.max=128m`
`--groupBy`	Specify files to concatenate to a `destination` file by matching source file names with a regular expression.
`-h` `--help`	Show help for this command.
`--krb-keytab`	The full path to the keytab file of the Kerberos principal. (Use in a Kerberos-enabled Spark environment only.)
`--krb-principal`	The Kerberos principal. (Use in a Kerberos-enabled Spark environment only.)
`-n` `--no-clobber`	Don’t overwrite an existing file.
`--non-recursive`	Don’t copy files recursively.
`--num-executors`	Specify the number of executors. The default value is `3` executors.
`--progress`	Show the progress of the data transfer.
`--retry`	Retry if the previous transfer failed or was interrupted.
`--partSize`	Destination file part size in bytes. Default = `536870912`, minimum = `1048576`, maximum = `2147483647`. The remainder after dividing `partSize` by `blockSize` must be zero.
`--spark-home`	The path to a directory containing an Apache Spark installation. If nothing is specified, `odcp` tries to find it in the `/opt/cloudera` directory.
`--srcPattern`	Filters sources by matching the source name with a regular expression. `--srcPattern` is ignored when the `--groupBy` parameter is used.
`--sync`	Synchronize the `destination` with the `source`.
`-V`	Enable verbose mode for debugging.
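
The size constraints on `--block-size` and `--partSize` can be checked before a transfer starts. The sketch below uses the default values and limits from the table, verifies that both sizes are in range and that the part size is an exact multiple of the block size, and then prints the command without running it. The source and destination URIs are illustrative, and passing each value as a separate argument after its option is an assumption.

```shell
block_size=134217728   # default for --block-size
part_size=536870912    # default for --partSize
min_size=1048576
max_size=2147483647

valid=yes
# Both sizes must lie within the documented limits.
for v in "$block_size" "$part_size"; do
  if [ "$v" -lt "$min_size" ] || [ "$v" -gt "$max_size" ]; then
    valid=no
  fi
done

# The part size must be an exact multiple of the block size.
remainder=$((part_size % block_size))
[ "$remainder" -eq 0 ] || valid=no

echo "remainder=$remainder valid=$valid"
if [ "$valid" = yes ]; then
  # Print (not run) the command; the URIs are illustrative.
  echo "odcp --block-size $block_size --partSize $part_size hdfs:///user/company/data.raw swift://feeds.BDCS/data.raw"
fi
```

With the defaults, the part size is exactly four times the block size, so the remainder is zero and the check passes.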