hdfs.upload

Copies a file from the local file system into HDFS.

Usage

hdfs.upload(
        filename,
        dfs.name,
        overwrite,
        split.size,
        header)

Arguments

filename

Name of a file in the local file system.

dfs.name

Name of the new directory in HDFS.

overwrite

Controls whether the upload can overwrite an existing HDFS directory with the same name as dfs.name. Set to TRUE to overwrite the directory, or FALSE (default) to signal an error.

split.size

Maximum number of bytes in each part of the Hadoop file (optional).

header

Indicates whether the first line of the local file is a header containing column names. Set to TRUE if it has a header, or FALSE if it does not (default).

A header enables you to extract the column names and reference the data fields by name instead of by index in your MapReduce R scripts.
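For illustration, the call below sets every argument explicitly. It is a sketch only: the local file name flights.csv and the 32 MB split size are hypothetical values chosen for this example, not defaults of hdfs.upload.

dfs.id <- hdfs.upload(
        'flights.csv',            # hypothetical local file whose first line holds column names
        dfs.name   = 'flights',   # new HDFS directory to create
        overwrite  = TRUE,        # replace an existing directory named 'flights'
        split.size = 33554432,    # limit each part of the Hadoop file to about 32 MB
        header     = TRUE)        # treat the first line as a header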

Usage Notes

This function provides the fastest and easiest way to copy a file into HDFS. If the file is larger than split.size, then Hadoop splits it into two or more parts. The new Hadoop file gets a unique object ID, and each part is named part-0000x. Hadoop automatically creates metadata for the file.
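For example, a large file uploaded with split.size set to 33554432 (32 MB) is stored as several files named part-00000, part-00001, and so on inside the new HDFS directory. One way to confirm the split, sketched below using R's system function to run the standard hadoop fs -ls command, is to list that directory; the path shown is the one produced in the example later in this section.

R> system('hadoop fs -ls /user/oracle/xq/ontime_File')   # lists the part-0000x files in the uploaded directory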

Return Value

HDFS object ID for the loaded data, or NULL if the copy failed

Example

This example uploads a file named ontime_s2000.dat into HDFS, stores the returned HDFS object ID in a variable named ontime.dfs_File, and then prints the location of the uploaded data.

R> ontime.dfs_File <- hdfs.upload('ontime_s2000.dat', dfs.name='ontime_File')
R> print(ontime.dfs_File)
[1] "/user/oracle/xq/ontime_File"