Copies a random sample of data from a Hadoop file into an R in-memory object. Use this function to copy a small sample of the original HDFS data for developing the R calculation that you ultimately want to execute on the entire HDFS data set on the Hadoop cluster.
The name of a file in HDFS. The file name can include a path that is either absolute or relative to the current path.
The number of lines to return as a sample. The default value is 1000 lines.
The symbol used to separate fields in the Hadoop file. A comma (,) is the default separator.
If the data originated in an R environment, then all metadata is extracted and all attributes are restored, including column names and data types. Otherwise, generic attribute names, like val1 and val2, are assigned.
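The sampling and naming behavior described above can be illustrated with a small Python sketch. It is not the connector's implementation and does not read from HDFS: the function name sample_delimited, its parameters, and the in-memory list of lines are hypothetical stand-ins chosen for illustration. It assumes that the sample is a uniform random choice of rows and that missing metadata leads to positional names of the form val1, val2, and so on.

```python
import random


def sample_delimited(lines, n=1000, sep=",", column_names=None, seed=None):
    """Return up to n randomly chosen rows as dicts keyed by column name.

    Hypothetical helper: `lines` stands in for the text lines of a delimited
    file, and `column_names` stands in for metadata that may or may not exist.
    """
    rng = random.Random(seed)
    # Split every non-empty line into fields using the separator.
    rows = [line.rstrip("\n").split(sep) for line in lines if line.strip()]
    # Draw the sample without replacement, capped at the number of rows.
    picked = rng.sample(rows, min(n, len(rows)))
    result = []
    for fields in picked:
        # Without metadata, fall back to generic names: val1, val2, ...
        names = column_names or [f"val{i}" for i in range(1, len(fields) + 1)]
        result.append(dict(zip(names, fields)))
    return result


if __name__ == "__main__":
    data = ["2008,1,AA\n", "2008,2,UA\n", "2009,3,DL\n"]
    # No metadata: keys are val1, val2, val3.
    print(sample_delimited(data, n=2))
    # Metadata available: the original column names are used.
    print(sample_delimited(data, n=2, column_names=["year", "month", "carrier"]))
```

In this sketch every field stays a string. Restoring data types, as the function does for data that originated in R, would need type information carried alongside the column names.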
This function can become slow when processing large HDFS input files, because of limitations inherited from the Hadoop command-line interface.