5 Copy Data With Oracle Big Data Cloud Service Tools

Oracle Big Data Cloud Service provides a number of tools and features to facilitate data management.

You can use one or a combination of the following to fit your desired workflow.

Oracle Distributed Copy (odcp)

odcp is a distributed command line interface (CLI) for copying data sets to and from various storage providers:
  • Oracle Cloud Infrastructure Object Storage Classic (formerly known as Oracle Storage Cloud Service)

  • Hadoop Distributed File System (HDFS)

  • Amazon Simple Storage Service (S3)

  • WebHDFS and Secure WebHDFS (SWebHDFS)

  • Oracle Cloud Infrastructure Object Storage (formerly known as Oracle Bare Metal Cloud Object Storage Service)

  • Hypertext Transfer Protocol (HTTP) and HTTP Secure (HTTPS), supported as sources only
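
Because HTTP and HTTPS can be used only as sources, a script that builds copy jobs may want to validate the destination before invoking any tool. The following is a minimal sketch of such a check, not part of odcp itself. The helper name can_be_destination and the sample URIs are hypothetical, and the check looks only at the URI scheme.

```python
from urllib.parse import urlsplit

# Schemes the list above describes as usable for sources only.
SOURCE_ONLY_SCHEMES = {"http", "https"}


def can_be_destination(uri):
    """Return False when the URI scheme is documented as source-only.

    Hypothetical helper for illustration. URIs with any other scheme,
    or with no scheme at all, are reported as allowed.
    """
    scheme = urlsplit(uri).scheme.lower()
    return scheme not in SOURCE_ONLY_SCHEMES


if __name__ == "__main__":
    # Sample URIs are placeholders, not real endpoints.
    for uri in ("hdfs:///user/oracle/data", "https://example.com/data.csv"):
        print(uri, can_be_destination(uri))
```

A check like this catches only the source-only restriction. It does not confirm that a provider is configured or reachable.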

odcp is a Spark application that can read its configuration from the command line or from the core-site.xml file on the cluster. The configuration can include the storage provider URL, user credentials, login certificate, and provider-specific configuration details.
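
To see which settings a core-site.xml file currently holds, you can read it with any XML parser, because Hadoop configuration files use a flat list of property elements that each contain a name and a value. The following is a minimal sketch under stated assumptions: the sample content is hypothetical, and example.storage.provider.url is a made-up property name used only for illustration. The actual property names depend on the storage provider.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content in the standard Hadoop configuration layout.
# "example.storage.provider.url" is an invented name for illustration only.
SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://example-cluster:8020</value>
  </property>
  <property>
    <name>example.storage.provider.url</name>
    <value>https://storage.example.com/v1/account</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return a dict mapping each property name to its value."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


props = read_properties(SAMPLE_CORE_SITE)

if __name__ == "__main__":
    for name in sorted(props):
        print(name, "=", props[name])
```

This sketch only reads the configuration. Changes to the file on a cluster should go through bda-oss-admin, which manages these entries.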

Use bda-oss-admin commands to update the configuration in the core-site.xml file (see below).

See Use the odcp Command Line Utility to Copy Data.

Big Data Management Command Line Utility (bda-oss-admin)

Use the bda-oss-admin command line utility to add and update storage providers’ configurations. The configurations are saved in the core-site.xml file on the cluster. With bda-oss-admin, you can perform actions such as add, list, remove, and update credentials.

See Use bda-oss-admin to Manage Storage Resources.

Oracle Distributed Diff (odiff)

odiff is a command line utility for comparing large data sets stored in HDFS and Oracle Cloud Infrastructure Object Storage Classic. The computation runs as a distributed Spark application.

See Use odiff to Compare Large Data Sets.

Oracle Big Data Manager

The Oracle Big Data Manager web application provides visual tools for creating and managing data sources, creating and scheduling data transfer jobs, displaying logs, and performing a number of other data transfer tasks.

Oracle Big Data Manager uses odiff and odcp for data management tasks.

See About Oracle Big Data Manager.

Oracle Big Data Manager Command Line Interface (bdm-cli)

bdm-cli is a command line utility for creating and managing data sources, creating and scheduling data transfer jobs, displaying logs, and performing a number of other data transfer tasks.

You can install bdm-cli on a remote system, which means you can create and schedule data transfer jobs from any remote server. You don't have to use SSH to connect to a cluster node to run these commands (although you can).

See Oracle Big Data Manager Command Line Interface (bdm-cli).

Oracle Big Data Manager SDKs

You can use the Oracle Big Data Manager SDKs to access Oracle Big Data Manager from within your applications. See Manage Data and Copy Jobs With the Oracle Big Data Manager SDKs.