9 Using the Oracle Big Data Manager bdm-cli Utility

Use the bdm-cli (Oracle Big Data Manager Command Line Interface) utility to copy data and manage copy jobs at the command line.

bdm-cli has several commands that duplicate odcp commands, but it also includes additional commands for scheduling and managing copy jobs and for other administrative tasks.

You must download and install bdm-cli yourself, either on a node of the cluster or on a remote system. If you install it on the cluster, you must use SSH to connect to the cluster before you can run it. If you install it on a remote system, you can run the commands without SSH. See Installing the bdm-cli Utility.

There are no special requirements for using bdm-cli when it’s installed outside the cluster.

9.1 Installing the bdm-cli Utility

The bdm-cli (Oracle Big Data Manager Command Line Interface) utility is a command-line tool for copying data and managing copy jobs. You can download and install it from GitHub. If you install it on a remote system, you don’t have to use SSH to connect to the cluster.

To install bdm-cli:

  1. If you use a proxy server, first call:

    export http_proxy="your_proxy_server" 
    export https_proxy="your_proxy_server"
  2. Then call:

    curl -L https://github.com/jazeman/bdm-python-cli/blob/1.0/install-rpm?raw=true | bash

9.2 Usage

You can use bdm-cli at the command line to create and manage copy jobs.

Syntax

bdm-cli [global_options] subcommand [options] [arguments]...

Supported Storage Protocols and Paths

The protocols and paths to the file systems and storage services supported by bdm-cli are:

  • HDFS:

    hdfs:///

  • Oracle Cloud Infrastructure Object Storage Classic (formerly known as Oracle Storage Cloud Service):

    swift://container.provider/

  • Oracle Cloud Infrastructure Object Storage (formerly known as Oracle Bare Metal Cloud Object Storage Service):

    oss:///container

    For operations with Oracle Cloud Infrastructure Object Storage, you must specify the provider with the --src-provider and --dst-provider options. For example, bdm-cli copy and bdm-cli create_job accept these options for oss:/// sources and destinations.

Finding a Job’s UUID

A number of bdm-cli subcommands require that you identify a job by its Universally Unique Identifier (UUID). To find UUIDs, execute bdm-cli list_all_jobs.
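When a script needs the UUIDs rather than the full listing, they can be filtered out of the output by shape. The following is a minimal sketch, not part of bdm-cli: extract_uuids is a hypothetical helper, and because the exact layout of the list_all_jobs output is not reproduced in this chapter, it matches any UUID-shaped token instead of parsing specific fields. If the output also contains other UUID-shaped values, those are printed too.

```shell
# Print every UUID-shaped token found on standard input, one per line.
# Matching by shape avoids depending on the exact output layout of
# list_all_jobs, which this sketch does not assume.
extract_uuids() {
    grep -Eio '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
}

# Typical use (requires a reachable Oracle Big Data Manager server):
#   bdm-cli list_all_jobs | extract_uuids

# Self-contained demonstration on a made-up line of output:
printf 'job 24ef30e8-913b-4402-baf8-74b99c211f50 RUNNING\n' | extract_uuids
```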

Specifying Source and Destination Paths

When specifying sources and destinations, fully qualify the paths:

  • source ...

    File name qualified by protocol and full path, for example: hdfs:///user/oracle/test.raw

  • destination

    Directory name qualified by protocol and full path, for example: swift://container.storagename/test-dir
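A script can catch unqualified paths before they reach bdm-cli. The following sketch checks only that a path starts with one of the protocol prefixes listed under Supported Storage Protocols and Paths. The function name is_qualified_path is hypothetical, and the check does not verify that the path exists.

```shell
# Return 0 if the argument starts with a supported protocol prefix
# (hdfs:///, swift://, or oss:///), 1 otherwise.
is_qualified_path() {
    case "$1" in
        hdfs:///*|swift://*|oss:///*) return 0 ;;
        *) return 1 ;;
    esac
}

# Demonstration with the example paths from this section and one bad path.
for p in hdfs:///user/oracle/test.raw \
         swift://container.storagename/test-dir \
         /user/oracle/test.raw; do
    if is_qualified_path "$p"; then
        echo "qualified:        $p"
    else
        echo "missing protocol: $p"
    fi
done
```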

Setting Environment Variables

You can set some bdm-cli options as environment variables. For example, you can set the Oracle Big Data Manager URL and the user password file as follows:
export BDM_URL=https://hostname:8888/bdcs/api && export BDM_PASSWORD=/tmp/password_file

All the bdm-cli options that can be set as environment variables are documented in the sections below.

Getting Help

To get help for bdm-cli, use:
bdm-cli --help
To get help for a specific command, use:
bdm-cli command --help
For example:
bdm-cli edit_job_template --help

9.3 Options

Options that can be used by all bdm-cli commands are explained below.

Option Description
--bdm-passwd path_to_password_file

Path to the Oracle Big Data Manager user password file.

Environment variable: BDM_PASSWORD

--bdm-url bdm_url

Oracle Big Data Manager server URL.

Environment variable: BDM_URL

--bdm-username username

Oracle Big Data Manager server user name.

Default value: oracle

Environment variable: BDM_USERNAME

-f [table|csv|json]

Specify the output format:

  • table (default)

    Each field is displayed in a separate column.

  • csv

    Each record is displayed as a comma-separated list on a single line.

  • json

    The output is displayed in JavaScript Object Notation (JSON) format.

--fields fields

Specifies comma-separated fields depending on the type of object.

-h

--help

Show this message and exit.

--no-check-certificate

Don't validate the server's certificate.

--proxy proxy

Proxy server.

--tenant-name tenant_name

Name of the tenant.

Default value: admin

-v

Print the REST request body.

--version

Show the Oracle Big Data Manager version and exit.
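Because global options precede the subcommand, a small shell function can supply them once for every call. The following is a sketch under stated assumptions, not a feature of bdm-cli: the function name bdm and the BDM_CLI variable are hypothetical, and the URL and password file are the placeholder values used earlier in this chapter. Setting BDM_CLI to "echo bdm-cli" prints the command line instead of running it.

```shell
# Hypothetical wrapper around bdm-cli. Global options go before the
# subcommand, so the wrapper supplies them and passes everything else through.
# BDM_CLI is an assumption of this sketch, not a bdm-cli setting: set it to
# "echo bdm-cli" to preview the command line instead of running it.
bdm() {
    # ${BDM_CLI} is left unquoted on purpose so "echo bdm-cli" splits in two.
    ${BDM_CLI:-bdm-cli} -f json \
        --bdm-url "${BDM_URL:?set BDM_URL first}" \
        --bdm-username "${BDM_USERNAME:-oracle}" \
        --bdm-passwd "${BDM_PASSWORD:?set BDM_PASSWORD first}" \
        "$@"
}

# Preview only: print the command that would be run.
BDM_URL=https://hostname:8888/bdcs/api
BDM_PASSWORD=/tmp/password_file
BDM_CLI="echo bdm-cli"
bdm get_job 24ef30e8-913b-4402-baf8-74b99c211f50
```

Unset BDM_CLI to run the real utility. The wrapper stops with an error message if BDM_URL or BDM_PASSWORD is not set.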

9.4 Subcommands

The following table summarizes the bdm-cli subcommands. For more details on each, see the section for that command.

Command Description
bdm-cli abort_job Abort a running job.
bdm-cli copy Execute a job to copy sources to destination.
bdm-cli create_job Execute a new job from an existing template.
bdm-cli create_job_template Create a new job template.
bdm-cli get_data_source Find a data source by name.
bdm-cli get_job Get a job by UUID.
bdm-cli get_job_log Get a job log.
bdm-cli list_all_jobs List all jobs from the execution history.
bdm-cli list_template_executions List all jobs from the execution history for the given template.
bdm-cli ls List files from a specific location.

9.5 bdm-cli abort_job

Abort a running job.

Syntax

bdm-cli abort_job [options] job_uuid

Options

Option Description

--force

Force abort job.

-h

--help

Show this message and exit.

Example

Abort a job.

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} abort_job 24ef30e8-913b-4402-baf8-74b99c211f50

9.6 bdm-cli copy

Execute a job to copy sources to destination.

Syntax

bdm-cli copy [options] source... destination

Options

Option Description

--block-size block_size

Specify the block size in bytes.

--description description

Data source description.

--driver-memory-size driver_memory_size

Specify the maximum amount of memory for the Oracle Storage Cloud Service driver.

--dst-provider oss_destination_provider

Specify the provider of the destination when the destination is in Oracle Cloud Infrastructure Object Storage (oss:///).

-h

--help

Show this message and exit.

--memory-size-per-node memory_size_per_node

Specify the Spark executors memory limit in GB per node, for example: 40G.

--number-of-executor-nodes number_of_executors_per_node

Specify the maximum number of Spark executors per node, for example: 10.

--number-of-threads-per-node number_of_threads_per_node

Specify the maximum number of threads per node.

--part-size part_size

Specify the part size in bytes.

--recursive

--no-recursive

Recursively copy (enabled by default).

--retry

--no-retry

Retry data transfer in case of failure.

--src-provider oss_source_provider

Specify the provider of the source when the source is in Oracle Cloud Infrastructure Object Storage (oss:///).

--sync

--no-sync

Synchronize the source with the destination.

Example

Copy a file from HDFS to Oracle Cloud Infrastructure Object Storage:

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} copy hdfs:///user/${DATA_USER}/1MFile.raw oss:///${DATA_USER} --dst-provider ${OSS_PROVIDER}
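The provider option is needed only for oss:/// paths, so a script that copies to different storage services can add it conditionally. The following sketch prints the command it would run rather than executing it. The function name copy_cmd and the provider name my_provider are hypothetical, and the sketch handles only the destination side; an oss:/// source would need --src-provider in the same way.

```shell
# Hypothetical helper: build a "bdm-cli copy" command for one source and one
# destination, adding --dst-provider only for oss:/// destinations.
# It prints the command instead of running it; remove "echo" to execute.
copy_cmd() {
    src=$1 dst=$2 provider=${3:-}
    case "$dst" in
        oss:///*)
            if [ -z "$provider" ]; then
                echo "an oss:/// destination needs a provider" >&2
                return 1
            fi
            echo bdm-cli copy --dst-provider "$provider" "$src" "$dst" ;;
        *)
            echo bdm-cli copy "$src" "$dst" ;;
    esac
}

# swift:// destination: no provider option is added.
copy_cmd hdfs:///user/oracle/test.raw swift://container.storagename/test-dir
# oss:/// destination: the provider is passed with --dst-provider.
copy_cmd hdfs:///user/oracle/test.raw oss:///container my_provider
```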

9.7 bdm-cli create_job

Execute a new job from an existing template.

Syntax

bdm-cli create_job [options] job_template_name

Options

Option Description

--run-now

Execute job immediately if job scheduling is set. Ignored otherwise.

--source source

Source file, for example:

hdfs:///user/oracle/test.raw

--destination destination

The destination directory, for example: swift://container.storagename/test-dir.

--driver-memory-size driver_memory_size

Specify the maximum amount of memory for an Oracle Storage Cloud Service driver.

--memory-size-per-node memory_size_per_node

Specify the Spark executors memory limit in GB per node, for example: 40G.

--number-of-executor-nodes number_of_executors_per_node

Specify the maximum number of Spark executors per node, for example: 10.

--number-of-threads-per-node number_of_threads_per_node

Specify the maximum number of threads per node.

--block-size block_size

Specify the block size in bytes.

--part-size part_size

Specify the part size in bytes.

--retry

--no-retry

Retry data transfer in case of failure.

--sync

--no-sync

Synchronize the source with the destination.

--recursive

--no-recursive

Recursively copy (enabled by default).

--job-executable-class job_executable_class

Main Java class used for the Spark job execution.

--src-provider oss_source_provider

Specify the provider of the source when the source is in Oracle Cloud Infrastructure Object Storage (oss:///).

--dst-provider oss_destination_provider

Specify the provider of the destination when the destination is in Oracle Cloud Infrastructure Object Storage (oss:///).

-h

--help

Show this message and exit.

9.8 bdm-cli create_job_template

Create a new job template.

Syntax

bdm-cli create_job_template [options] job_template_name source ... destination

Options

Option Description

--abort-running-job

--no-abort-running-job

Abort an execution that is still running when the next scheduled execution starts.

--block-size block_size

Specify block size in bytes.

--data-source-name data_source_name

Job's data source name.

--description description

Job template description.

--dst-provider destination_provider

Specify the provider of the destination when using an oss:/// destination.

--environment environment

Environment in JSON format:

{"envName1": "envValue1", "envName2": "envValue2"}

-h

--help

Show this message and exit.

--history-size history_size

Number of executions kept in the history log.

--job-executable-class job_executable_class

Main Java class used for the Spark job execution.

--job-schedule job_schedule

Specify a cron-like job schedule. For example:

"0 56 8 * * ?" runs the job every day at 08:56 UTC.

--job-template-type job_template_type

Specify job template type. Allowed values are:

  • DATA_MOVEMENT_COPY

  • GENERAL

--libraries libraries

Hadoop libraries, for example: OdcpLibraries.

This option can have multiple values, for example:

--libraries OdcpLibraries --libraries OdcpLibraries

--memory-size-per-node memory_size_per_node

Specify the Spark executors memory limit in GB per node, for example: 40G.

--number-of-executor-nodes number_of_executors_per_node

Specify the maximum number of Spark executors per node, for example: 10.

--number-of-threads-per-node number_of_threads_per_node

Specify the maximum number of threads per node.

--part-size part_size

Specify part size in bytes.

--recursive

--no-recursive

Recursively copy (enabled by default).

--retry

--no-retry

Retry data transfer in case of failure.

--src-provider oss_source_provider

Specify the provider of the source when using an oss:/// source (Oracle Cloud Infrastructure Object Storage).

--sync

--no-sync

Synchronize source with destination.

--tags tags

User-defined tag. This option can have multiple values, for example:

--tags system --tags datamovement --tags copy
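The options above can be combined into a scheduled template. The following sketch prints, rather than runs, a create_job_template command and a matching create_job command. It assumes the schedule fields are ordered as seconds, minutes, hours, followed by three date fields, which is inferred from the "0 56 8 * * ?" example and not stated elsewhere in this chapter. The helper daily_schedule and the template name nightly_copy are hypothetical.

```shell
# Build a schedule that fires once a day at the given hour and minute,
# following the field order implied by the example "0 56 8 * * ?".
# $1 = hour (0-23), $2 = minute (0-59); the values are not validated.
daily_schedule() {
    printf '0 %s %s * * ?\n' "$2" "$1"
}

schedule=$(daily_schedule 8 56)

# Print (not run) a command that creates a scheduled copy template.
# The paths are the example paths used earlier in this chapter.
echo bdm-cli create_job_template \
    --job-template-type DATA_MOVEMENT_COPY \
    --job-schedule "\"$schedule\"" \
    --description "\"Nightly copy to object storage\"" \
    nightly_copy hdfs:///user/oracle/test.raw swift://container.storagename/test-dir

# Print a command that creates a job from the template and, because a
# schedule is set, runs it immediately instead of waiting for the next slot.
echo bdm-cli create_job --run-now nightly_copy
```

Remove the leading echo from each command to run it against a server. In that case, pass the schedule and description as ordinary quoted arguments, without the escaped quotes that are used here only for display.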

9.9 bdm-cli get_data_source

Find a data source by name.

Syntax

bdm-cli get_data_source [options] data_source_name

Options

Option Description

-h

--help

Show this message and exit.

9.10 bdm-cli get_job

Get a job by UUID.

Syntax

bdm-cli get_job [options] job_uuid

Options

Option Description

-h

--help

Show this message and exit.

Example

Get information on a job.

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} get_job ${JOB_UUID}

9.11 bdm-cli get_job_log

Get a job log.

Syntax

bdm-cli get_job_log [options] job_uuid

Options

Option Description

-h

--help

Show this message and exit.

9.12 bdm-cli list_all_jobs

List all jobs from the execution history.

Syntax

bdm-cli list_all_jobs [options]

Options

Option Description

-h

--help

Show this message and exit.

--limit limit

Specify the size of the page.

--offset offset

Specify the paging offset.

Example

List all jobs.

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} list_all_jobs

Use the --offset and --limit options to restrict the results. For example, to get the eighth page when there are 20 rows per page, run:

bdm-cli list_all_jobs --offset 8 --limit 20

9.13 bdm-cli list_template_executions

List all jobs from the execution history for the given template.

Syntax

bdm-cli list_template_executions [options] job_uuid

Options

Option Description

-h

--help

Show this message and exit.

9.14 bdm-cli ls

List files from a specific location.

Syntax

bdm-cli ls [options] path_1 ... path_n

Options

Option Description

-h

--human-readable

Human readable file sizes.

-d

--dirs-only

List directories only.

--provider oss_provider

Specify the provider for Oracle Cloud Infrastructure Object Storage (oss:///) paths.

-h

--help

Show this message and exit.

Examples

List the HDFS content under a selected user's directory.

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} ls hdfs:///user/${DATA_USER}/integration_in --provider hdfs

List the Oracle Cloud Infrastructure Object Storage content in a selected container.

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username test20170324113533 --bdm-passwd ${USER_PASSWORD_FILE} ls oss:///${OSS_CONTAINER}/ --provider ${OSS_PROVIDER}