9 Using the Oracle Big Data Manager bdm-cli Utility

Use the bdm-cli (Oracle Big Data Manager Command Line Interface) utility to copy data and manage copy jobs at the command line.

bdm-cli has several commands that duplicate odcp commands, but it also includes additional commands for scheduling and managing copy jobs and for other administrative tasks.

You have to download and install bdm-cli yourself, either on a node of the cluster or on a remote operating system. If you install it on your cluster, you must use SSH to connect to the cluster. If you install it on a remote system, you can run the commands without SSH. See Installing the bdm-cli Utility.

There are no special requirements for using bdm-cli when it’s installed outside the cluster.

9.1 Installing the bdm-cli Utility

The bdm-cli (Oracle Big Data Manager Command Line Interface) utility is a command line tool for copying data and managing copy jobs. You can download and install bdm-cli from GitHub. If you install it on a remote operating system, you don’t have to use SSH to connect to the cluster.

To install bdm-cli:

If you use a proxy server, first call:

export http_proxy="your_proxy_server" 
export https_proxy="your_proxy_server"

Then call:

curl -L https://github.com/jazeman/bdm-python-cli/blob/1.0/install-rpm?raw=true | bash

9.2 Usage

You can use bdm-cli at the command line to create and manage copy jobs.

Syntax

bdm-cli [global_options] subcommand [options] [arguments]...

Supported Storage Protocols and Paths

The protocols and paths to the file systems and storage services supported by bdm-cli are:

HDFS:

hdfs:///

Oracle Cloud Infrastructure Object Storage Classic (formerly known as Oracle Storage Cloud Service):

swift://container.provider/

Oracle Cloud Infrastructure Object Storage (formerly known as Oracle Bare Metal Cloud Object Storage Service):

oss:///container

For operations with Oracle Cloud Infrastructure Object Storage, you must specify the provider by using the --src-provider and --dst-provider options. For example, bdm-cli create_job takes these options when the source or destination is in Oracle Cloud Infrastructure Object Storage.

Finding a Job’s UUID

A number of bdm-cli subcommands require that you identify a job by its Universally Unique Identifier (UUID). To find UUIDs, execute bdm-cli list_all_jobs.
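Because the layout of the list_all_jobs output is not described here, one format-independent way to pull UUIDs out of it is to match the UUID pattern itself. The following is a minimal sketch, not part of bdm-cli: the function name extract_uuids is hypothetical, and the sample text only stands in for real output.

```shell
# Print every UUID-shaped token found on standard input, one per line.
# extract_uuids is a hypothetical helper, not a bdm-cli command.
extract_uuids() {
  grep -Eo '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
}

# Hypothetical sample line standing in for real list_all_jobs output.
sample='job 24ef30e8-913b-4402-baf8-74b99c211f50 finished'
printf '%s\n' "$sample" | extract_uuids

# With a working installation, the same filter could be applied as:
#   bdm-cli list_all_jobs | extract_uuids
```

The filter assumes that UUIDs are printed in lowercase hexadecimal, as in the examples in this chapter.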

Specifying Source and Destination Paths

When specifying sources and destinations, fully qualify the paths:

source ...

File name qualified by protocol and full path, for example: hdfs:///user/oracle/test.raw

destination

Directory name qualified by protocol and full path, for example: swift://container.storagename/test-dir
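A script that assembles bdm-cli commands can check that each path is qualified before submitting a job. The sketch below is an illustration only: is_qualified_path is a hypothetical helper, and it tests nothing more than the presence of one of the three protocol prefixes listed in this section.

```shell
# Return 0 if the path starts with one of the protocols listed in this section.
# is_qualified_path is a hypothetical helper, not a bdm-cli command.
is_qualified_path() {
  case "$1" in
    hdfs://*|swift://*|oss://*) return 0 ;;
    *) return 1 ;;
  esac
}

for path in hdfs:///user/oracle/test.raw swift://container.storagename/test-dir /user/oracle/test.raw; do
  if is_qualified_path "$path"; then
    echo "ok: $path"
  else
    echo "rejected (no protocol): $path"
  fi
done
```

The last path in the loop is rejected because it lacks a protocol, even though it is a full path.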

Setting Environment Variables

You can set some bdm-cli options as environment variables. For example, you can set the Oracle Big Data Manager URL and the path to the user password file as follows:

export BDM_URL=https://hostname:8888/bdcs/api && export BDM_PASSWORD=/tmp/password_file

All the bdm-cli options that can be set as environment variables are documented in the sections below.
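As a sketch of how the three connection variables named in this chapter (BDM_URL, BDM_USERNAME, and BDM_PASSWORD) might be set together in a session, consider the following. The host name and file location are placeholders, and the format of the password file is not described here, so the sketch only creates an empty file that is readable by its owner.

```shell
# Placeholder location; substitute your own.
password_file="${TMPDIR:-/tmp}/bdm_password_file"

# Create the file with owner-only permissions before any secret goes into it.
( umask 077 && : > "$password_file" )
# Put the password into the file by whatever means your site allows;
# it is deliberately not hard-coded in this sketch.

export BDM_URL="https://hostname:8888/bdcs/api"   # placeholder host name
export BDM_USERNAME="oracle"                      # documented default user name
export BDM_PASSWORD="$password_file"

echo "BDM_URL=$BDM_URL BDM_USERNAME=$BDM_USERNAME BDM_PASSWORD=$BDM_PASSWORD"
```

With these variables exported, the corresponding --bdm-url, --bdm-username, and --bdm-passwd options can be left off individual commands.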

Getting Help

To get help for bdm-cli, use:

bdm-cli --help

To get help for a specific command, use:

bdm-cli command --help

For example:

bdm-cli edit_job_template --help

9.3 Options

Options that can be used by all bdm-cli commands are explained below.

Option	Description
`--bdm-passwd path_to_password_file`	Path to the Oracle Big Data Manager user password file. Environment variable: `BDM_PASSWORD`
`--bdm-url bdm_url`	Oracle Big Data Manager server URL. Environment variable: `BDM_URL`
`--bdm-username username`	Oracle Big Data Manager server user name. Default value: `oracle` Environment variable: `BDM_USERNAME`
`-f [table|csv|json]`	Specify the output format. `table` (default): each field is displayed in a separate column. `csv`: each record is displayed as a comma-separated list on a single line. `json`: the output is displayed in JavaScript Object Notation (JSON) format.
`--fields fields`	Comma-separated list of fields. The available fields depend on the type of object.
`-h` `--help`	Show this message and exit.
`--no-check-certificate`	Don't validate the server's certificate.
`--proxy proxy`	Proxy server.
`--tenant-name tenant_name`	Name of the tenant. Default value: `admin`
`-v`	Print the REST request body.
`--version`	Show the Oracle Big Data Manager version and exit.
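Following the syntax line, these options go before the subcommand, so a small wrapper can supply them once. The sketch below is an assumption-laden illustration: the wrapper name bdm and the DRY_RUN switch are hypothetical, the host name and password file path are placeholders, and by default the wrapper prints the command instead of running it.

```shell
# DRY_RUN=1 (the default in this sketch) prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# bdm is a hypothetical wrapper that places the global options before the subcommand.
bdm() {
  set -- bdm-cli -f json --bdm-url "$BDM_URL" --bdm-username "$BDM_USERNAME" --bdm-passwd "$BDM_PASSWORD" "$@"
  if [ "$DRY_RUN" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Placeholder connection values.
BDM_URL="https://hostname:8888/bdcs/api"
BDM_USERNAME="oracle"
BDM_PASSWORD="/tmp/password_file"

bdm list_all_jobs
bdm get_job 24ef30e8-913b-4402-baf8-74b99c211f50
```

Setting DRY_RUN=0 makes the wrapper execute the assembled command, which requires bdm-cli to be installed and on the PATH.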

9.4 Subcommands

The following table summarizes the bdm-cli subcommands. For more details on each, see the section for that command below.

Command	Description
bdm-cli abort_job	Abort a running job.
bdm-cli copy	Execute a job to copy sources to destination.
bdm-cli create_job	Execute a new job from an existing template.
bdm-cli create_job_template	Create a new job template.
bdm-cli get_data_source	Find a data source by name.
bdm-cli get_job	Get a job by UUID.
bdm-cli get_job_log	Get a job log.
bdm-cli list_all_jobs	List all jobs from the execution history.
bdm-cli list_template_executions	List all jobs from the execution history for the given template.
bdm-cli ls	List files from a specific location.

9.5 bdm-cli abort_job

Abort a running job.

Syntax

bdm-cli abort_job [options] job_uuid

Options

Option	Description
`--force`	Force abort job.
`-h` `--help`	Show this message and exit.

Example

Abort a job.

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} abort_job 24ef30e8-913b-4402-baf8-74b99c211f50

9.6 bdm-cli copy

Execute a job to copy sources to destination.

Syntax

bdm-cli copy [options] source... destination

Options

Option	Description
`--block-size block_size`	Specify the block size in bytes.
`--description description`	Data source description.
`--driver-memory-size driver_memory_size`	Specify the maximum amount of memory for the Oracle Storage Cloud Service driver.
`--dst-provider oss_destination_provider`	Specify the provider of the destination, when using an Oracle Cloud Infrastructure Object Storage (`oss:///`) destination.
`-h` `--help`	Show this message and exit.
`--memory-size-per-node memory_size_per_node`	Specify the Spark executors memory limit in GB per node, for example, `40GB`.
`--number-of-executor-nodes number_of_executors_per_node`	Specify the maximum number of Spark executors per node, for example, `10`.
`--number-of-threads-per-node number_of_threads_per_node`	Specify the maximum number of threads per node.
`--part-size part_size`	Specify the part size in bytes.
`--recursive` `--no-recursive`	Recursively copy (enabled by default).
`--retry` `--no-retry`	Retry data transfer in case of failure.
`--src-provider oss_source_provider`	Specify the provider of the source, when using an Oracle Cloud Infrastructure Object Storage (`oss:///`) source.
`--sync` `--no-sync`	Synchronize the source with the destination.

Example

Copy a file from HDFS to Oracle Cloud Infrastructure Object Storage:

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} copy hdfs:///user/${DATA_USER}/1MFile.raw oss:///${DATA_USER} --dst-provider ${OSS_PROVIDER}
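The syntax line allows several sources followed by one destination. The following sketch shows that ordering with a hypothetical helper, copy_cmd, which prints the command rather than running it. The provider name, file names, and container name are placeholders.

```shell
# copy_cmd is a hypothetical helper that prints a bdm-cli copy command.
# $1 = destination provider, $2 = destination, remaining arguments = sources
copy_cmd() {
  provider=$1
  dest=$2
  shift 2
  # Sources come first and the destination comes last, as in the syntax line.
  echo bdm-cli copy --retry --dst-provider "$provider" "$@" "$dest"
}

copy_cmd my_provider oss:///oracle hdfs:///user/oracle/a.raw hdfs:///user/oracle/b.raw
```

To run the copy for real, paste the printed line into a shell where the connection options or the BDM_URL, BDM_USERNAME, and BDM_PASSWORD variables are set.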

9.7 bdm-cli create_job

Execute a new job from an existing template.

Syntax

bdm-cli create_job [options] job_template_name

Options

Option	Description
`--run-now`	Execute job immediately if job scheduling is set. Ignored otherwise.
`--source source`	Source file, for example: `hdfs:///user/oracle/test.raw`
`--destination destination`	The destination directory, for example: `swift://container.storagename/test-dir`.
`--driver-memory-size driver_memory_size`	Specify the maximum amount of memory for an Oracle Storage Cloud Service driver.
`--memory-size-per-node memory_size_per_node`	Specify the Spark executors memory limit in GB per node, for example: `40G`.
`--number-of-executor-nodes number_of_executors_per_node`	Specify the maximum number of Spark executors per node, for example: `10`.
`--number-of-threads-per-node number_of_threads_per_node`	Specify the maximum number of threads per node.
`--block-size block_size`	Specify the block size in bytes.
`--part-size part_size`	Specify the part size in bytes.
`--retry` `--no-retry`	Retry data transfer in case of failure.
`--sync` `--no-sync`	Synchronize the source with the destination.
`--recursive` `--no-recursive`	Recursively copy (enabled by default).
`--job-executable-class job_executable_class`	Main Java class used for the Spark job execution.
`--src-provider oss_source_provider`	Specify the provider of the source, when using an Oracle Cloud Infrastructure Object Storage (`oss:///`) source.
`--dst-provider oss_destination_provider`	Specify the provider of the destination, when using an Oracle Cloud Infrastructure Object Storage (`oss:///`) destination.
`-h` `--help`	Show this message and exit.

9.8 bdm-cli create_job_template

Create a new job template.

Syntax

bdm-cli create_job_template [options] job_template_name source ... destination

Options

Option	Description
`--abort-running-job` `--no-abort-running-job`	Abort an execution that is still running when the next scheduled execution starts.
`--block-size block_size`	Specify block size in bytes.
`--data-source-name data_source_name`	Job's data source name.
`--description description`	Job template description.
`--dst-provider destination_provider`	Specify the provider of the destination, when using an `oss:///` destination.
`--environment environment`	Environment in JSON format: `{"envName1": "envValue1", "envName2": "envValue2"}`
`-h` `--help`	Show this message and exit.
`--history-size history_size`	Number of executions to keep in the history log.
`--job-executable-class job_executable_class`	Main Java class used for the Spark job execution.
`--job-schedule job_schedule`	Specify a cron-like job schedule, for example: `"0 56 8 * * ?"` runs the job every day at 08:56 UTC.
`--job-template-type job_template_type`	Specify the job template type. Allowed values are `DATA_MOVEMENT_COPY` and `GENERAL`.
`--libraries libraries`	Hadoop libraries, for example: `OdcpLibraries`. This option can have multiple values, for example: `--libraries OdcpLibraries --libraries OdcpLibraries`
`--memory-size-per-node memory_size_per_node`	Specify the Spark executors memory limit in GB per node, for example: `40G`.
`--number-of-executor-nodes number_of_executors_per_node`	Specify the maximum number of Spark executors per node, for example: `10`.
`--number-of-threads-per-node number_of_threads_per_node`	Specify the maximum number of threads per node.
`--part-size part_size`	Specify part size in bytes.
`--recursive` `--no-recursive`	Recursively copy (enabled by default).
`--retry` `--no-retry`	Retry data transfer in case of failure.
`--src-provider oss_source_provider`	Specify the provider of the source, when using an `oss:///` source.
`--sync` `--no-sync`	Synchronize the source with the destination.
`--tags tags`	User-defined tag. This option can have multiple values, for example: `--tags system --tags datamovement --tags copy`
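The schedule example in the table has six fields. Reading it against the stated meaning (08:56 every day) suggests the order seconds, minutes, hours, day of month, month, day of week, as in Quartz-style cron expressions; treat that field order as an assumption. The sketch below builds such a string with a hypothetical helper, daily_schedule, and prints (without running) a matching create_job_template command and a create_job command. The template name nightly_copy is a placeholder.

```shell
# daily_schedule is a hypothetical helper that builds a six-field daily schedule.
# $1 = hour (0-23), $2 = minute (0-59), in UTC; pass plain integers without leading zeros.
daily_schedule() {
  printf '0 %d %d * * ?\n' "$2" "$1"
}

schedule=$(daily_schedule 8 56)

# The commands are printed, not executed. nightly_copy is a placeholder template name.
echo bdm-cli create_job_template --job-template-type DATA_MOVEMENT_COPY \
  --job-schedule "\"$schedule\"" --tags copy \
  nightly_copy hdfs:///user/oracle/test.raw swift://container.storagename/test-dir

# Start a job from the same template immediately instead of waiting for the schedule.
echo bdm-cli create_job --run-now nightly_copy
```

Keep the schedule in double quotes on the real command line so that the shell does not expand the asterisks.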

9.9 bdm-cli get_data_source

Find a data source by name.

Syntax

bdm-cli get_data_source [options] data_source_name

Options

Option	Description
`-h` `--help`	Show this message and exit.

9.10 bdm-cli get_job

Get a job by UUID.

Syntax

bdm-cli get_job [options] job_uuid

Options

Option	Description
`-h` `--help`	Show this message and exit.

Example

Get information on a job.

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} get_job ${JOB_UUID}

9.11 bdm-cli get_job_log

Get a job log.

Syntax

bdm-cli get_job_log [options] job_uuid

Options

Option	Description
`-h` `--help`	Show this message and exit.

9.12 bdm-cli list_all_jobs

List all jobs from the execution history.

Syntax

bdm-cli list_all_jobs [options]

Options

Option	Description
`-h` `--help`	Show this message and exit.
`--limit limit`	Specify the size of the page.
`--offset offset`	Specify the paging offset.

Example

List all jobs.

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} list_all_jobs

Use the --offset and --limit options to restrict the results. For example, to get the eighth page when there are 20 rows per page, run:

bdm-cli list_all_jobs --offset 8 --limit 20

9.13 bdm-cli list_template_executions

List all jobs from the execution history for the given template.

Syntax

bdm-cli list_template_executions [options] job_uuid

Options

Option	Description
`-h` `--help`	Show this message and exit.

9.14 bdm-cli ls

List files from a specific location.

Syntax

bdm-cli ls [options] path_1 ... path_n

Options

Option	Description
`-h` `--human-readable`	Human readable file sizes.
`-d` `--dirs-only`	List directories only.
`--provider oss_provider`	Specify the provider for Oracle Cloud Infrastructure Object Storage (`oss:///`) paths.
`-h` `--help`	Show this message and exit.

Examples

List the HDFS content under the selected user's directory.

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} ls hdfs:///user/${DATA_USER}/integration_in --provider hdfs

List the content of an Oracle Cloud Infrastructure Object Storage container.

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} ls oss:///${OSS_CONTAINER}/ --provider ${OSS_PROVIDER}