hdfs.push

Copies data from an Oracle database to HDFS.

This operation requires authentication by Oracle Database. See orch.connect.

Usage

hdfs.push(
        x,
        key,
        dfs.name,
        overwrite,
        driver,
        split.by)

Arguments

x

An ore.frame object with the data in an Oracle database to be pushed.

key

The index or name of the key column.

dfs.name

Unique name for the object in HDFS.

overwrite

TRUE to allow dfs.name to overwrite an object with the same name, or FALSE to signal an error (default).

driver

Identifies the driver used to copy the data. This argument is currently ignored because Sqoop is the only supported driver.

split.by

The column to use for data partitioning (required).

Usage Notes

Because this operation is synchronous, the call blocks until the copy finishes, so copying a large data set can take a long time. The prompt reappears and you regain use of R when copying is complete.

An ore.frame object is an Oracle R Enterprise metadata object that points to a database table. It corresponds to an R data.frame object.

If you omit the split.by argument, then hdfs.push might import only a portion of the data into HDFS.

Return Value

The full path to the file that contains the data set, or NULL if the operation failed.
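Because a failed copy is reported as a NULL return value rather than as an R error, a script may want to check the result before using it. The following is a minimal sketch of such a check in base R. The helper name check_push is hypothetical and is not part of the connector API; it relies only on the documented behavior that hdfs.push returns NULL on failure.

```r
# Hypothetical helper (not part of the connector): turn a NULL result
# from hdfs.push into an R error, and pass any other result through.
check_push <- function(result, dfs.name) {
  if (is.null(result)) {
    stop("hdfs.push returned NULL for '", dfs.name, "'", call. = FALSE)
  }
  result
}

# Intended use, assuming an active connection and an ore.frame named x:
# x.dfs <- check_push(hdfs.push(x, key='DEST', dfs.name='x_DB', split.by='YEAR'), 'x_DB')
```

Wrapping the call this way stops a script at the failed copy instead of at a later step that receives NULL.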

Example

This example creates an ore.frame object named ontime_s2000 that contains the rows from the ONTIME_S database table where the year equals 2000. Then hdfs.push uses ontime_s2000 to create /user/oracle/xq/ontime2000_DB in HDFS.

R> ontime_s2000 <- ONTIME_S[ONTIME_S$YEAR == 2000,]
R> class(ontime_s2000)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
R> ontime2000.dfs <- hdfs.push(ontime_s2000, key='DEST', dfs.name='ontime2000_DB', split.by='YEAR')
R> ontime2000.dfs
[1] "/user/oracle/xq/ontime2000_DB"
attr(,"dfs.id")
[1] TRUE