hdfs.pull

Copies data from HDFS into an Oracle database.

This operation requires authentication by Oracle Database. See orch.connect.

Usage

hdfs.pull(
        dfs.id,
        sep,
        db.name,
        overwrite,
        driver)

Arguments

dfs.id

The name of a file in HDFS. The file name can include a path that is either absolute or relative to the current path.

sep

The symbol used to separate fields in the file (optional). A comma (,) is the default separator.

db.name

The name of a table in an Oracle database.

overwrite

Controls whether hdfs.pull can overwrite an existing table that has the same name as db.name. Set to TRUE to overwrite the table, or FALSE (default) to signal an error.

driver

Identifies the driver used to copy the data: Sqoop (default) or olh to use Oracle Loader for Hadoop. You must set up Oracle Loader for Hadoop before using it as a driver. See the Usage Notes and "Oracle Loader for Hadoop Setup."
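
A call that names every argument might look like the following minimal sketch. The HDFS path, the table name, and the wrapper function pull_sales are hypothetical, and passing the driver as the quoted string "olh" is an assumption. The sketch also assumes that the ORCH library is loaded and that orch.connect has already authenticated the session.

```r
# Hypothetical wrapper: it only defines the call and does not run it.
# Assumes the ORCH library is loaded and orch.connect has authenticated the session.
pull_sales <- function() {
  hdfs.pull(
    dfs.id    = "/user/oracle/sales.csv",  # hypothetical HDFS file
    sep       = ",",                       # comma is the default separator
    db.name   = "SALES_COPY",              # hypothetical table name
    overwrite = FALSE,                     # signal an error if the table exists
    driver    = "olh"                      # assumed string form of the olh driver
  )
}
```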

Usage Notes

With the Oracle Advanced Analytics option, you can use Oracle R Enterprise to analyze the data after loading it into an Oracle database.

Choosing a Driver

Sqoop is synchronous, and copying a large data set may take a while. The prompt reappears and you regain use of R when copying is complete.

Oracle Loader for Hadoop is much faster than Sqoop, so use it as the driver whenever it is available.

Correcting Problems With the OLH Driver

If Oracle Loader for Hadoop is available, then you see this message when the ORCH library is loading:

OLH 2.0.0 is up

If you do not see this message, then Oracle Loader for Hadoop is not installed properly. Check that these environment variables are set correctly:

  • OLH_HOME: Set to the installation directory

  • HADOOP_CLASSPATH: Includes $OLH_HOME/jlib/*

  • CLASSPATH: Includes $OLH_HOME/jlib/*

If hdfs.pull fails and HADOOP_CLASSPATH is set correctly, then the installed version of Oracle Loader for Hadoop may be incompatible with the version of CDH. Check the Oracle Loader for Hadoop log file for details.
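
The environment variable checks listed above can be run from the R session with base R functions. This is a minimal sketch: the helper name check_olh_env is hypothetical, and it tests only that OLH_HOME is set and that the two class paths mention its jlib directory. It does not verify that the installed version matches CDH.

```r
# Hypothetical helper: reports whether each variable looks correctly set.
check_olh_env <- function() {
  # Drop any trailing slash so the jlib path compares cleanly
  olh_home <- sub("/+$", "", Sys.getenv("OLH_HOME"))
  jlib <- file.path(olh_home, "jlib")
  # TRUE when the named class path variable contains the jlib directory
  has_jlib <- function(var) {
    nzchar(olh_home) && grepl(jlib, Sys.getenv(var), fixed = TRUE)
  }
  c(
    OLH_HOME         = nzchar(olh_home),
    HADOOP_CLASSPATH = has_jlib("HADOOP_CLASSPATH"),
    CLASSPATH        = has_jlib("CLASSPATH")
  )
}

print(check_olh_env())
```

A FALSE entry names the variable to correct before the ORCH library is loaded again.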

See Also:

"Oracle Loader for Hadoop Setup" for installation instructions

Return Value

An ore.frame object that points to the database table with data loaded from HDFS, or NULL if the operation failed.
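
Because a failed copy returns NULL rather than an ore.frame object, a script can test the result before using it. A minimal sketch, in which the helper name require_pulled is hypothetical and its argument stands for whatever hdfs.pull returned:

```r
# Hypothetical helper: stop with a message when hdfs.pull returned NULL,
# otherwise pass the result through unchanged.
require_pulled <- function(result) {
  if (is.null(result)) {
    stop("hdfs.pull failed; check the driver setup and the log file")
  }
  result
}
```

Wrapping the hdfs.pull call in require_pulled halts the script at the point of failure instead of passing NULL to later steps.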

See Also

Oracle R Enterprise User's Guide for a description of ore.frame objects.