Copies a random sample of data from a Hadoop file into an in-memory R object. Use this function to copy a small sample of the original HDFS data so that you can develop the R calculation you ultimately want to run on the entire HDFS data set on the Hadoop cluster.
The name of a file in HDFS. The file name can include a path that is either absolute or relative to the current path.
The number of lines to return as a sample. The default value is 1000 lines.
The symbol used to separate fields in the Hadoop file. A comma (,) is the default separator.
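This section does not say how the sample is drawn. As a rough local analogue of the line-count and separator parameters, the following Python sketch draws a uniform random sample of records from any iterable of text lines (such as an open local file) using reservoir sampling, then splits each record on the separator. The function `sample_records` and its parameters `n_lines`, `sep`, and `seed` are hypothetical. This is not the connector's implementation, and it does not read from HDFS.

```python
import random
from typing import Iterable, List, Optional


def sample_records(lines: Iterable[str], n_lines: int = 1000, sep: str = ",",
                   seed: Optional[int] = None) -> List[List[str]]:
    """Return up to n_lines randomly chosen records, each split on sep.

    Hypothetical local sketch using reservoir sampling (Algorithm R): the
    input is read once, and at most n_lines records are held in memory.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        record = line.rstrip("\r\n")
        if i < n_lines:
            # Fill the reservoir with the first n_lines records.
            reservoir.append(record)
        else:
            # Keep record i with probability n_lines / (i + 1) by
            # overwriting a uniformly chosen slot.
            j = rng.randint(0, i)
            if j < n_lines:
                reservoir[j] = record
    # Split each sampled record into fields on the separator.
    return [record.split(sep) for record in reservoir]
```

For a local comma-separated file, you might call it as `with open("data.csv") as f: rows = sample_records(f, n_lines=1000)`. If the input has fewer lines than requested, every line is returned.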
If the data originated in an R environment, all metadata is extracted and all attributes, including column names and data types, are restored. Otherwise, generic attribute names such as val2 are assigned.
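To illustrate the fallback behavior described above, here is a hypothetical Python sketch that turns parsed records into named, typed columns. When column names and type converters are supplied (standing in for metadata recovered from data that originated in R), they are used. Otherwise every field stays a string and the columns get generic names. The helper `to_columns` is invented for illustration, and the numbering scheme val1, val2, ... is an assumption extrapolated from the val2 example, not a documented rule.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


def to_columns(rows: Sequence[Sequence[str]],
               names: Optional[Sequence[str]] = None,
               types: Optional[Sequence[Callable[[str], Any]]] = None,
               ) -> Dict[str, List[Any]]:
    """Build named columns from parsed records (hypothetical helper)."""
    width = len(rows[0]) if rows else 0
    if any(len(r) != width for r in rows):
        raise ValueError("all records must have the same number of fields")
    if names is None:
        # No metadata: assumed generic naming scheme val1, val2, ...
        names = [f"val{i + 1}" for i in range(width)]
    if types is None:
        # No metadata: leave every field as a string.
        types = [str] * width
    if len(names) != width or len(types) != width:
        raise ValueError("names and types must match the number of fields")
    # Apply each column's converter to that column's fields.
    return {name: [convert(r[i]) for r in rows]
            for i, (name, convert) in enumerate(zip(names, types))}
```

Raising an error on records with differing field counts is a choice made for this sketch; how the real function handles malformed lines is not described here.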
This function can be slow when processing large HDFS input files because of limitations inherited from the Hadoop command-line interface.