hdfs.put

Copies data from an R in-memory object (data.frame) to HDFS. All data attributes, like column names and data types, are stored as metadata with the data.

Usage

hdfs.put(
        data,
        key,
        dfs.name,
        overwrite,
        rownames)

Arguments

data

A data.frame or ore.frame object in the local R environment to be copied to HDFS.

key

The index or name of the key column.

dfs.name

A unique name for the new file in HDFS.

overwrite

Controls whether dfs.name can overwrite a file with the same name. Set to TRUE to overwrite an existing file, or to FALSE to signal an error if a file with that name already exists.

rownames

Set to TRUE to prepend a sequential row number to each line of the file, or FALSE otherwise.
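
The following call illustrates the arguments together. It is a sketch only: the data frame, column name, and file name are hypothetical, and running it requires a configured ORCH environment.

R> mydata <- data.frame(DEST=c('BOS', 'LAX'), CNT=c(12, 7))
R> # Key on the DEST column; replace any existing file; omit row numbers
R> myfile <- hdfs.put(mydata, key='DEST', dfs.name='mydata.dat',
+                     overwrite=TRUE, rownames=FALSE)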

Usage Notes

You can use hdfs.put instead of hdfs.push to copy data from ore.frame objects, such as proxies for database tables, to HDFS. The table must be small enough to fit in R memory; otherwise, the function fails. The hdfs.put function first reads all of the table data into local R memory and then transfers it to HDFS. For a small table, this function can be faster than hdfs.push because it does not use Sqoop and thus avoids the overhead that hdfs.push incurs.
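
For example, a small database table exposed as an ore.frame can be copied directly. This is a sketch assuming an established Oracle R Enterprise connection; the table name ONTIME_S and column DEST are hypothetical.

R> # Make the database table visible as an ore.frame in the R environment
R> ore.sync(table='ONTIME_S')
R> ore.attach()
R> # hdfs.put pulls the table into local R memory, then writes it to HDFS
R> myfile <- hdfs.put(ONTIME_S, key='DEST', dfs.name='ontime.dat')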

Return Value

The object ID of the new file, or NULL if the operation failed.

Example

This example creates a file named /user/oracle/xq/testdata.dat with the contents of the dat data frame.

R> myfile <- hdfs.put(dat, key='DEST', dfs.name='testdata.dat')
R> print(myfile)
[1] "/user/oracle/xq/testdata.dat"
attr(,"dfs.id")
[1] TRUE