5 Copy Data With Oracle Big Data Cloud Service Tools
Oracle Big Data Cloud Service provides the following tools and features for data management. You can use one of them, or a combination, to fit your workflow.
Oracle Distributed Copy (odcp)
odcp is a distributed command line interface (CLI) for copying data sets to and from various storage providers:
- Oracle Cloud Infrastructure Object Storage Classic (formerly known as Oracle Storage Cloud Service)
- Hadoop Distributed File System (HDFS)
- Amazon Simple Storage Service (S3)
- WebHDFS and Secure WebHDFS (SWebHDFS)
- Oracle Cloud Infrastructure Object Storage (formerly known as Oracle Bare Metal Cloud Object Storage Service)
- Hypertext Transfer Protocol (HTTP) and HTTP Secure (HTTPS) (used for sources only)
odcp is a Spark application that can read its configuration from the command line or from the core-site.xml file on the cluster. The configuration can include the storage provider URL, user credentials, login certificate, and provider-specific configuration details.
Use bda-oss-admin commands to update the configuration in the core-site.xml file (see below).
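The rule that HTTP and HTTPS can be used for sources only can be checked before a copy is launched. The following Python sketch is illustrative, not part of the product: it assumes the basic invocation form `odcp <source> <destination>`, and the URI scheme names in the two sets are assumptions, because this section lists the storage providers but not their scheme names (no scheme is assumed for Oracle Cloud Infrastructure Object Storage). The helper `build_odcp_command` is hypothetical and only builds the argument list; it does not run odcp.

```python
from typing import List
from urllib.parse import urlparse

# Assumed scheme names (not given in this section); adjust to your environment.
SOURCE_ONLY_SCHEMES = {"http", "https"}
READ_WRITE_SCHEMES = {"hdfs", "swift", "s3", "webhdfs", "swebhdfs"}


def build_odcp_command(source: str, destination: str) -> List[str]:
    """Return an argv list for an assumed `odcp <source> <destination>` call.

    Hypothetical helper: it validates the URI schemes and builds the
    argument list, but never executes anything.
    """
    src_scheme = urlparse(source).scheme
    dst_scheme = urlparse(destination).scheme

    if src_scheme not in SOURCE_ONLY_SCHEMES | READ_WRITE_SCHEMES:
        raise ValueError(f"unsupported source scheme: {src_scheme!r}")
    # HTTP and HTTPS are valid for sources only, never for destinations.
    if dst_scheme in SOURCE_ONLY_SCHEMES:
        raise ValueError("HTTP and HTTPS can be used for sources only")
    if dst_scheme not in READ_WRITE_SCHEMES:
        raise ValueError(f"unsupported destination scheme: {dst_scheme!r}")

    return ["odcp", source, destination]
```

The returned list could be passed to `subprocess.run` on a cluster node where odcp is installed. Building a list rather than a single string avoids shell quoting problems with paths that contain spaces.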
Big Data Management Command Line Utility (bda-oss-admin)
Use the bda-oss-admin command line utility to add and update storage providers’ configurations. The configurations are saved in the core-site.xml file on the cluster. With bda-oss-admin, you can perform actions such as adding, listing, removing, and updating credentials.
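Because the configurations end up in core-site.xml, they can be inspected with any XML parser. The following Python sketch reads properties from a document in the standard Hadoop configuration layout (`<configuration>` containing `<property>` elements with `<name>` and `<value>`). The sample content, the property names in it, and the helper `read_properties` are illustrative assumptions; this section does not state which properties bda-oss-admin actually writes.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; real property names and values will differ.
SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.swift.service.myprovider.username</name>
    <value>storage-user@example.com</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
</configuration>
"""


def read_properties(xml_text: str, prefix: str = "") -> dict:
    """Return {name: value} for properties whose name starts with prefix."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name.startswith(prefix):
            props[name] = value
    return props


if __name__ == "__main__":
    # Print only the properties under the assumed "fs.swift." prefix.
    for name, value in read_properties(SAMPLE_CORE_SITE, "fs.swift.").items():
        print(name, "=", value)
```

To read the file on a cluster instead of the sample string, the file contents could be loaded and passed to `read_properties`; the file's location depends on the cluster setup and is not given here.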
Oracle Distributed Diff (odiff)
odiff is a command line utility for comparing large data sets stored in HDFS and Oracle Cloud Infrastructure Object Storage Classic. The computation runs as a distributed Spark application.
Oracle Big Data Manager
The Oracle Big Data Manager web application provides visual tools for creating and managing data sources, creating and scheduling data transfer jobs, displaying logs, and performing a number of other data transfer tasks.
Oracle Big Data Manager uses odiff and odcp for data management tasks.
Oracle Big Data Manager Command Line Interface (bdm-cli)
bdm-cli is a command line utility for creating and managing data sources, creating and scheduling data transfer jobs, displaying logs, and performing a number of other data transfer tasks.
You can install bdm-cli on a remote operating system. That means you can create and schedule data transfer jobs from any remote server. You don't have to use SSH to connect to a cluster node to execute any of these commands (although you can).
See Oracle Big Data Manager Command Line Interface (bdm-cli).
Oracle Big Data Manager SDKs
The Oracle Big Data Manager SDKs let you work with Oracle Big Data Manager from within your applications. See Manage Data and Copy Jobs With the Oracle Big Data Manager SDKs.