hdfs.attach

Copies data from an unstructured data file in HDFS into the Oracle R Connector for Hadoop framework. By default, data files in HDFS are not visible to the connector. However, if you know the name of a data file, you can use this function to attach it to the Oracle R Connector for Hadoop namespace.

Usage

hdfs.attach(
        dfs.name,
        force)

Arguments

dfs.name

The name of a file in HDFS.

force

Controls whether the function attempts to discover the structure of the file and the data type of each column.

FALSE for comma-separated value (CSV) files (default). If a file does not have metadata identifying the names and data types of the columns, then the function samples the data to deduce the data type as number or string. It then re-creates the file with the appropriate metadata.

TRUE for non-CSV files, including binary files. This setting prevents the function from attempting to discover the metadata; instead, it simply attaches the file.
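For example, a binary file might be attached with metadata discovery disabled. This is a sketch only; the file name weblogs_seq is illustrative, not a file shipped with the connector:

R> dfs.bin <- hdfs.attach('weblogs_seq', force = TRUE)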

Usage Notes

Use this function to attach a CSV file to your R environment, just as you might attach a data frame.

Oracle R Connector for Hadoop does not support the processing of attached non-CSV files. Nonetheless, you can attach a non-CSV file, download it to your local computer, and use it as desired. Alternatively, you can attach the file for use as input to a Hadoop application.
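For instance, a non-CSV file might be attached and then copied to the local file system with hdfs.download. The file name weblogs_seq and the local path are illustrative assumptions, not values from this reference:

R> dfs.bin <- hdfs.attach('weblogs_seq', force = TRUE)
R> hdfs.download(dfs.bin, '/home/oracle/weblogs_seq')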

This function can be slow when processing large HDFS input files, because of limitations inherited from the Hadoop command-line interface.

Return Value

The object ID of the file in HDFS, or NULL if the operation fails.

See Also

hdfs.download

Example

This example attaches the HDFS file ontime_R, stores its object ID in a variable named dfs, and then displays that value.

R> dfs <- hdfs.attach('ontime_R')
R> dfs
[1] "/user/oracle/xq/ontime_R"
attr(,"dfs.id")
[1] TRUE