This chapter describes R support for big data.
Oracle R Connector for Hadoop is an R package that provides an interface between the local R environment and Hadoop. You install and load this package the same as you would for any other R package. Using simple R functions, you can copy data between R memory, the local file system, and HDFS. You can schedule R programs to execute as Hadoop MapReduce jobs and return the results to any of those locations.
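For example, the following minimal sketch moves a small data set among the three locations. The file name sales.dat and the target path are illustrative, and a running Hadoop cluster is assumed.

# Load the package, as for any other R package.
library(ORHC)

# Copy a local text file (illustrative name) from the local file system into HDFS.
dfs.sales <- hdfs.upload('sales.dat', dfs.name='sales_File')

# Copy the HDFS data into an in-memory R data frame for local analysis.
sales.df <- hdfs.get(dfs.sales)

# Copy the HDFS data back to the local file system as a single file.
hdfs.download(dfs.sales, '/tmp/sales_copy.dat', overwrite=TRUE)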
Oracle R Connector for Hadoop provides API access from a local R client to Hadoop, using these APIs:
hadoop
: Provides an interface to Hadoop MapReduce.
hdfs
: Provides an interface to HDFS.
orhc
: Provides an interface between the local R instance and Oracle Database.
All of these functions are included in the ORHC library. The functions are listed in this chapter in alphabetical order.
A separate R package, Oracle R Enterprise, provides access to Oracle Database. Access to the data stored in Oracle Database is always restricted to the access rights granted by your Oracle DBA.
Oracle R Enterprise provides direct access to Oracle Database objects and enables you to perform statistical analysis on database tables, views, and other data objects. Users can develop R scripts for deployment while retaining the results in the secure environment of Oracle Database.
Oracle R Enterprise is included in the Oracle Database Advanced Analytics option; it is not included in Oracle Big Data Connectors.
See Also:
Oracle R Enterprise User's Guide

The following scenario may help you identify opportunities for using Oracle R Connector for Hadoop with Oracle R Enterprise.
Using the Oracle R Connector for Hadoop, you might look for files that you have access to on HDFS and schedule R calculations to execute on data in one such file. Furthermore, you can upload data stored in text files on your local file system into HDFS for calculations, schedule an R script for execution on the Hadoop cluster, and download the results into a local file.
Using the Oracle Database Advanced Analytics option, you can open the R interface and connect to Oracle Database to work on the tables and views that are visible based on your database privileges. You can filter out rows, add derived columns, project new columns, and perform visual and statistical analysis using Oracle R Enterprise.
Again using the Oracle R Connector for Hadoop, you might deploy a MapReduce job on Hadoop for CPU-intensive calculations written in R. The calculation can use data stored in HDFS or, with the Oracle Database Advanced Analytics option, in Oracle Database. You can return the output of the calculation to Oracle Database and to the R console for visualization or additional processing.
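The following sketch outlines that workflow. It reuses the ONTIME_S table and the mapper and reducer pattern shown later in this chapter; the connection details and the ONTIME_DELAYS table name are illustrative.

library(ORHC)
library(ORE)    # Oracle R Enterprise, required for ore.frame objects

# Connect to Oracle Database (host, user, and SID are illustrative).
orhc.connect("localhost", "RQUSER", "orcl")

# Filter rows in the database, then push the result into HDFS.
ontime2000 <- ONTIME_S[ONTIME_S$YEAR == 2000,]
dfs <- hdfs.push(ontime2000, key='DEST', dfs.name='ontime2000_DB')

# Run a CPU-intensive R calculation on the Hadoop cluster:
# average arrival delay for each destination.
res <- hadoop.run(dfs,
    mapper = function(key, ontime) {
        keyval(key, ontime)
    },
    reducer = function(key, vals) {
        sumAD <- 0
        count <- 0
        for (x in vals) {
            if (!is.na(x$ARRDELAY)) {sumAD <- sumAD + x$ARRDELAY; count <- count + 1}
        }
        keyval(key, sumAD / count)
    })

# Return the output to Oracle Database for visualization or additional processing.
delays <- hdfs.pull(res, db.name='ONTIME_DELAYS', overwrite=TRUE)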
Oracle R Connector for Hadoop invokes the Sqoop utility to connect to Oracle Database either to extract data or to store results. Sqoop is a command-line utility for Hadoop that imports and exports data between HDFS or Hive and structured databases, such as Oracle Database. The name Sqoop comes from "SQL to Hadoop."
The following explains how Oracle R Connector for Hadoop stores a database user password and sends it to Sqoop.
Oracle R Connector for Hadoop stores a user password only when the user establishes the database connection in a mode that does not require reentering the password each time. The password is stored encrypted in memory. See orhc.connect.
Oracle R Connector for Hadoop generates a configuration file for Sqoop and uses it to invoke Sqoop locally. The file contains the user's database password, obtained either by prompting the user or from the encrypted in-memory representation. The file has local user access permissions only. The file is created, the permissions are set explicitly, and then the file is opened for writing and filled with data.
Sqoop uses the configuration file to generate custom JAR files dynamically for the specific database job and passes the JAR files to the Hadoop client software. The password is stored inside the compiled JAR file; it is not stored in plain text.
The JAR file is transferred to the Hadoop cluster over a network connection. The network connection and the transfer protocol are specific to Hadoop, such as port 5900.
The configuration file is deleted after Sqoop finishes compiling its JAR files and starts its own Hadoop jobs.
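A minimal sketch of establishing such a connection follows; the host, user, and SID are illustrative. With secure=FALSE, you are prompted for the password once, and subsequent Sqoop-based transfers reuse the encrypted in-memory copy instead of prompting again.

library(ORHC)

# Prompted for the password once; it is then kept encrypted in memory
# and reused whenever a Sqoop job needs database access.
orhc.connect(host="localhost", user="RQUSER", sid="orcl", secure=FALSE)

# Later transfers such as hdfs.pull and hdfs.push run without another prompt.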
Oracle R Connector for Hadoop provides these functions: hadoop.exec, hadoop.run, hdfs.attach, hdfs.cd, hdfs.download, hdfs.exists, hdfs.get, hdfs.ls, hdfs.mkdir, hdfs.parts, hdfs.pull, hdfs.push, hdfs.put, hdfs.pwd, hdfs.rm, hdfs.rmdir, hdfs.sample, hdfs.size, hdfs.upload, orhc.connect, orhc.disconnect, orhc.reconnect, and orhc.which.
The ORHC functions are grouped into these categories:

Database connectivity: orhc.connect, orhc.disconnect, orhc.reconnect, orhc.which

Copying data between R, HDFS, and Oracle Database: hdfs.upload, hdfs.download, hdfs.get, hdfs.push, hdfs.put, hdfs.pull

HDFS file management: hdfs.attach, hdfs.cd, hdfs.exists, hdfs.ls, hdfs.mkdir, hdfs.parts, hdfs.pwd, hdfs.rm, hdfs.rmdir, hdfs.sample, hdfs.size

MapReduce execution: hadoop.exec, hadoop.run
hadoop.exec

Starts the Hadoop engine and sends the mapper, reducer, and combiner R functions for execution. You must load the data into HDFS first.

hadoop.exec(dfs.id, mapper, reducer, combiner, export)

dfs.id
: Object identifier in HDFS.

mapper
: Name of a mapper function written in the R language.

reducer
: Name of a reducer function written in the R language (optional).

combiner
: Name of a combiner function written in the R language (optional).

export
: Names of exported R objects from your current R environment that are referenced by any of your mapper, reducer, or combiner functions (optional).

This function provides more control of the data flow than hadoop.run. You must use hadoop.exec when chaining several mappers and reducers in a pipeline, because the data does not leave HDFS. The results are stored in HDFS.

Return value: Data object identifier in HDFS.
This sample script uses hdfs.attach to obtain the object identifier of a small sample data file in HDFS named ontime_R.
dfs <- hdfs.attach('ontime_R')
res <- NULL
res <- hadoop.exec(
    dfs,
    mapper = function(key, ontime) {
        if (key == 'SFO') {
            keyval(key, ontime)
        }
    },
    reducer = function(key, vals) {
        sumAD <- 0
        count <- 0
        for (x in vals) {
            if (!is.na(x$ARRDELAY)) {sumAD <- sumAD + x$ARRDELAY; count <- count + 1}
        }
        res <- sumAD / count
        keyval(key, res)
    }
)
After the script runs, the location of the results is identified by the res variable, in an HDFS file named /user/oracle/xq/orhc3d0b8218:
R> res
[1] "/user/oracle/xq/orhc3d0b8218"
attr(,"dfs.id")
[1] TRUE
R> print(hdfs.get(res))
  key     val1
1 SFO 27.05804
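Because hadoop.exec keeps its results in HDFS, its output identifier can be passed directly to another job. The following sketch chains two jobs over the same ontime_R data; the structure of the intermediate records is assumed to match what the first job emits.

dfs <- hdfs.attach('ontime_R')

# First job: keep only San Francisco flights. The output stays in HDFS.
stage1 <- hadoop.exec(dfs,
    mapper = function(key, ontime) {
        if (key == 'SFO') {
            keyval(key, ontime)
        }
    })

# Second job: average the arrival delay of the rows kept by the first job.
# The stage1 identifier is handed straight back to Hadoop, so the data
# never leaves HDFS between the two jobs.
stage2 <- hadoop.exec(stage1,
    mapper = function(key, ontime) {
        keyval(key, ontime)
    },
    reducer = function(key, vals) {
        sumAD <- 0
        count <- 0
        for (x in vals) {
            if (!is.na(x$ARRDELAY)) {sumAD <- sumAD + x$ARRDELAY; count <- count + 1}
        }
        keyval(key, sumAD / count)
    })

# Pull the small final result into local R memory.
print(hdfs.get(stage2))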
hadoop.run

Starts the Hadoop engine and sends the mapper, reducer, and combiner R functions for execution. If the data is not already stored in HDFS, then hadoop.run first copies the data there.

hadoop.run(data, mapper, reducer, combiner, export)

data
: Data frame, Oracle R Enterprise frame (ore.frame), or an HDFS file descriptor.

mapper
: Name of a mapper function written in the R language.

reducer
: Name of a reducer function written in the R language (optional).

combiner
: Name of a combiner function written in the R language (optional).

export
: Names of exported R objects.

The hadoop.run function returns the results from HDFS to the source of the input data. For example, the results for HDFS input data are kept in HDFS, and the results for ore.frame input data are pulled into Oracle Database.

Return value: An object in the same format as the input data.
This sample script uses hdfs.attach to obtain the object identifier of a small sample data file in HDFS named ontime_R.
dfs <- hdfs.attach('ontime_R')
res <- NULL
res <- hadoop.run(
    dfs,
    mapper = function(key, ontime) {
        if (key == 'SFO') {
            keyval(key, ontime)
        }
    },
    reducer = function(key, vals) {
        sumAD <- 0
        count <- 0
        for (x in vals) {
            if (!is.na(x$ARRDELAY)) {sumAD <- sumAD + x$ARRDELAY; count <- count + 1}
        }
        res <- sumAD / count
        keyval(key, res)
    }
)
After the script runs, the location of the results is identified by the res variable, in an HDFS file named /user/oracle/xq/orhc3d0b8218:
R> res
[1] "/user/oracle/xq/orhc3d0b8218"
attr(,"dfs.id")
[1] TRUE
R> print(hdfs.get(res))
  key     val1
1 SFO 27.05804
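The export parameter makes objects from the local R environment available to the mapper and reducer running on the cluster. The following sketch assumes the ontime_R file used above and passes export a character vector of object names; the exact form expected by export is an assumption here.

dfs <- hdfs.attach('ontime_R')

# A value defined in the local R environment...
delay.threshold <- 30

# ...is named in export so the reducer on the cluster can reference it.
res <- hadoop.run(dfs,
    mapper = function(key, ontime) {
        keyval(key, ontime)
    },
    reducer = function(key, vals) {
        late <- 0
        for (x in vals) {
            if (!is.na(x$ARRDELAY) && x$ARRDELAY > delay.threshold) {late <- late + 1}
        }
        keyval(key, late)    # count of arrivals later than the threshold
    },
    export = c('delay.threshold'))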
hdfs.attach

Pulls data from an unstructured data file in HDFS into the Oracle R Connector for Hadoop framework. By default, data files in HDFS are not visible to the R Connector. However, if you know the name of the data file, you can use this function to attach it to the R Connector name space.

If the data does not have metadata identifying the names and data types of the columns, then the function samples the data to deduce the data type (number or string). It then re-creates the file with the appropriate metadata.

hdfs.attach(dfs.name)

dfs.name
: The name of a file in HDFS.
Use this function to attach an HDFS file to your R environment, the same as you might attach a data frame.
Return value: The object ID of the file in HDFS, or NULL if the operation failed.
This example stores the object ID of ontime_R in a variable named dfs, then displays its value.
R> dfs <- hdfs.attach('ontime_R')
R> dfs
[1] "/user/oracle/xq/ontime_R"
attr(,"dfs.id")
[1] TRUE
hdfs.cd

Sets the default HDFS path.

hdfs.cd(dfs.path)

dfs.path
: A path that is either absolute or relative to the current path.
Return value: TRUE if the path is changed successfully, or FALSE if the operation failed.
This example changes the current directory from /user/oracle to /user/oracle/sample:
R> hdfs.cd("sample")
[1] "/user/oracle/sample"
hdfs.download

Copies a file from HDFS to the local file system.

hdfs.download(dfs.id, filename, overwrite)

dfs.id
: The object ID of the file in HDFS.

filename
: The name of a file in the local file system where the data is copied.

overwrite
: Controls whether the operation can overwrite an existing local file. Set to TRUE to overwrite filename, or FALSE to signal an error (default).
This function provides the fastest and easiest way to copy a file from HDFS. No data transformations occur except merging multiple parts into a single file. The local file has the exact same data as the HDFS file.
Return value: Local file name, or NULL if the copy failed.
This example displays a list of files in the current HDFS directory and copies ontime2000_DB to the local file system as /home/oracle/ontime2000.dat.
R> hdfs.ls()
[1] "ontime2000_DB" "ontime_DB" "ontime_File" "ontime_R" "testdata.dat"
R> tmpfile <- hdfs.download("ontime2000_DB", "/home/oracle/ontime2000.dat", overwrite=F)
R> tmpfile
[1] "/home/oracle/ontime2000.dat"
hdfs.exists

Verifies that an object exists in HDFS.

hdfs.exists(dfs.id)

dfs.id
: An object ID or file name in HDFS.
If this function returns TRUE, then you can attach the data and use it in a hadoop.run function. You can also use this function to validate an HDFS identifier and ensure that the data exists.
Return value: TRUE if the identifier is valid and the data exists, or FALSE if the object is not found.
This example shows that the ontime_R file exists.
R> hdfs.exists("ontime_R")
[1] TRUE
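A short sketch of the validation pattern described in the usage notes: confirm that the identifier is valid before attaching the data and passing it to hadoop.run.

dfs.name <- 'ontime_R'

if (hdfs.exists(dfs.name)) {
    # The data is present, so it is safe to attach and process it.
    dfs <- hdfs.attach(dfs.name)
    res <- hadoop.run(dfs,
        mapper = function(key, ontime) keyval(key, ontime),
        reducer = function(key, vals) keyval(key, length(vals)))
} else {
    warning(paste(dfs.name, "was not found in HDFS"))
}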
hdfs.get

Copies data from HDFS into a data frame in the local R environment. All metadata is extracted and all attributes, such as column names and data types, are restored if the data originated in an R environment. Otherwise, generic attributes like val1 and val2 are assigned.

hdfs.get(dfs.id, sep)

dfs.id
: The object ID of the file in HDFS.

sep
: The symbol used to separate fields in the file. A comma (,) is the default separator.
If the HDFS file is small enough to fit into an in-memory R data frame, then you can copy the file using this function instead of hdfs.pull. The hdfs.get function can be faster, because it does not use Sqoop and thus does not have the overhead incurred by hdfs.pull.
Return value: A data.frame object in memory in the local R environment pointing to the exported data set, or NULL if the operation failed.
This example returns the contents of a data frame named res.
R> print(hdfs.get(res))
key val1
1 AA 1361.4643
2 AS 515.8000
3 CO 2507.2857
4 DL 1601.6154
5 HP 549.4286
6 NW 2009.7273
7 TW 1906.0000
8 UA 1134.0821
9 US 2387.5000
10 WN 541.1538
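The usage notes above describe choosing between hdfs.get and hdfs.pull based on the size of the HDFS data. A sketch of that decision follows; the 100 MB cutoff is arbitrary, the ONTIME_FROM_HDFS table name is illustrative, and hdfs.pull assumes an existing database connection.

dfs.id <- 'ontime_R'
cutoff <- 100 * 1024 * 1024    # arbitrary 100 MB threshold for this sketch

if (hdfs.size(dfs.id) <= cutoff) {
    # Small file: copy it straight into an in-memory data frame.
    ontime.df <- hdfs.get(dfs.id)
} else {
    # Large file: let Sqoop load it into an Oracle Database table instead.
    ontime.tab <- hdfs.pull(dfs.id, db.name='ONTIME_FROM_HDFS', overwrite=TRUE)
}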
hdfs.ls

Lists the names of all HDFS directories containing data in the specified path.

hdfs.ls(dfs.path)

dfs.path
: A path relative to the current default path. The default path is the current working directory. Use hdfs.cd to set the default path.
Return value: A list of data object names in HDFS, or NULL if the specified path is invalid.
This example lists the subdirectories in the current directory:
R> hdfs.ls()
[1] "ontime_DB" "ontime_FILE" "ontime_R"
The next example lists directories in the parent directory:
R> hdfs.ls("..")
[1] "demo" "input" "olhcache" "output" "sample" "xq"
This example returns NULL because the specified path is not in HDFS.
R> hdfs.ls("/bin")
NULL
hdfs.mkdir

Creates a subdirectory in HDFS relative to the current working directory.

hdfs.mkdir(dfs.name, cd)

dfs.name
: Name of the new directory.

cd
: TRUE to change the current working directory to the new subdirectory, or FALSE to keep the current working directory (default).

Return value: Full path of the new directory as a string, or NULL if the directory was not created.
This example creates the /user/oracle/sample directory.
R> hdfs.mkdir('sample', cd=T)
[1] "/user/oracle/sample"
attr(,"dfs.path")
[1] TRUE
hdfs.parts

Returns the number of parts composing an object in HDFS.

hdfs.parts(dfs.id)

dfs.id
: Object identifier in HDFS.
HDFS splits large files into parts, which provide a basis for the parallelization of MapReduce jobs. The more parts an HDFS file has, the more mappers can run in parallel.
Return value: Number of parts composing the object, or 0 if the object does not exist in HDFS.
This example shows that the ontime_R file has one part:
R> hdfs.parts("ontime_R")
[1] 1
hdfs.pull

Copies data from HDFS into Oracle Database. This operation requires authentication by Oracle Database. See orhc.connect.

hdfs.pull(dfs.id, sep, db.name, overwrite, driver)

dfs.id
: The file name in HDFS.

sep
: The symbol used to separate fields in the file. A comma (,) is the default separator.

db.name
: The name of a table in Oracle Database (optional).

overwrite
: Controls whether db.name can overwrite a table with the same name. Set to TRUE to overwrite the table, or FALSE to signal an error (default).

driver
: Identifies the driver used to copy the data. The default driver is sqoop.
Because this operation is synchronous, copying a large data set may appear to hang the R environment. You regain use of R when copying is complete.
To copy large volumes of data into Oracle Database, consider using Oracle Loader for Hadoop. With the Oracle Database Advanced Analytics option, you can use Oracle R Enterprise to analyze the data in an Oracle database.
Return value: An ore.frame object that points to the database table with data loaded from HDFS, or NULL if the operation failed.
See Also: Oracle R Enterprise User's Guide for a description of ore.frame objects.
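This section has no sample output; the following sketch shows a typical call, assuming an established database connection (the host, user, SID, and the ONTIME2000_HDFS table name are illustrative).

# Connect first; hdfs.pull requires database authentication.
orhc.connect("localhost", "RQUSER", "orcl")

# Copy the HDFS data into an Oracle Database table by way of Sqoop.
ontime.db <- hdfs.pull('ontime2000_DB', db.name='ONTIME2000_HDFS', overwrite=TRUE)

# With Oracle R Enterprise attached, ontime.db is an ore.frame that points
# to the new table and can be analyzed in the database.
class(ontime.db)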
hdfs.push

Copies data from Oracle Database to HDFS. This operation requires authentication by Oracle Database. See orhc.connect.

Note: The Oracle R Enterprise (ORE) library must be attached to use this function.

hdfs.push(x, key, dfs.name, overwrite, driver, split.by)

x
: An ore.frame object with the data in Oracle Database to be pushed.

key
: The index or name of the key column.

dfs.name
: Unique name for the object in HDFS.

overwrite
: TRUE to allow dfs.name to overwrite an object with the same name, or FALSE to signal an error (default).

driver
: Driver for copying the data (optional). The default driver is sqoop.

split.by
: The column to use for data partitioning (optional).
Because this operation is synchronous, copying a large data set may appear to hang the R environment. You regain use of R when copying is complete.
An ore.frame object is an Oracle R Enterprise metadata object that points to a database table. It corresponds to an R data.frame object.
Return value: HDFS object ID pointing to the exported data set, or NULL if the operation failed.
See Also: Oracle R Enterprise User's Guide
This example creates an ore.frame object named ontime_s2000 that contains the rows from the ONTIME_S table in Oracle Database where the year equals 2000. Then hdfs.push uses ontime_s2000 to create /user/oracle/xq/ontime2000_DB in HDFS.
R> ontime_s2000 <- ONTIME_S[ONTIME_S$YEAR == 2000,]
R> class(ontime_s2000)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
R> ontime2000.dfs <- hdfs.push(ontime_s2000, key='DEST', dfs.name='ontime2000_DB')
R> ontime2000.dfs
[1] "/user/oracle/xq/ontime2000_DB"
attr(,"dfs.id")
[1] TRUE
hdfs.put

Copies data from an ORE data frame to HDFS. Column names, data types, and other attributes are stored as metadata in HDFS.

Note: The Oracle R Enterprise (ORE) library must be attached to use this function.

hdfs.put(data, key, dfs.name, overwrite)

data
: An ore.frame object in the local R environment to be copied to HDFS.

key
: The index or name of the key column.

dfs.name
: A unique name for the new file.

overwrite
: Controls whether dfs.name can overwrite a file with the same name. Set to TRUE to overwrite the file, or FALSE to signal an error.
You can use this function to transfer control parameters or lookup data relevant to a Hadoop R calculation from the R environment into an HDFS file.
You can also use hdfs.put instead of hdfs.push to copy data from ore.frame objects, such as database tables, to HDFS. The table must be small enough to fit in R memory; otherwise, the function fails. The hdfs.put function first reads all table data into R memory and then transfers it to HDFS. For a small table, this function can be faster because it does not use Sqoop and thus does not have the overhead incurred by hdfs.push.
Return value: The object ID of the new file, or NULL if the operation failed.
This example creates a file named /user/oracle/xq/testdata.dat with the contents of the dat data frame.
R> myfile <- hdfs.put(dat, key='DEST', dfs.name='testdata.dat')
R> print(myfile)
[1] "/user/oracle/xq/testdata.dat"
attr(,"dfs.id")
[1] TRUE
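Following the usage note, the next sketch copies a small ore.frame to HDFS with hdfs.put instead of hdfs.push. It reuses the ONTIME_S subset from the hdfs.push example; the entire subset must fit in R memory, and the dfs.name shown is illustrative.

# A small slice of the ONTIME_S database table as an ore.frame.
ontime_s2000 <- ONTIME_S[ONTIME_S$YEAR == 2000,]

# hdfs.put reads the table into R memory and writes it to HDFS without Sqoop.
ontime2000.dfs <- hdfs.put(ontime_s2000, key='DEST', dfs.name='ontime2000_put', overwrite=TRUE)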
hdfs.pwd

Identifies the current working directory in HDFS.

hdfs.pwd()
Return value: The current working directory, or NULL if you are not connected to HDFS.
This example shows that /user/oracle is the current working directory.
R> hdfs.pwd()
[1] "/user/oracle/"
hdfs.rm

Removes a file or directory from HDFS.

hdfs.rm(dfs.id)

dfs.id
: The object ID of a file in HDFS to be removed.
All object identifiers in Hadoop pointing to this data are invalid after this operation.
Return value: TRUE if the data is deleted, or FALSE if the operation failed.
R> hdfs.rm("data1.log")
[1] TRUE
hdfs.rmdir

Deletes a subdirectory in HDFS relative to the current working directory.

hdfs.rmdir(dfs.name)

dfs.name
: Name of the directory in HDFS to delete.
This function deletes all data objects stored in the directory, which invalidates all associated object identifiers in HDFS.
Return value: TRUE if the directory is deleted successfully, or FALSE if the operation fails.
R> hdfs.rmdir("mydata")
[1] TRUE
hdfs.sample

Copies a random sample of data from a Hadoop file into an R in-memory object. Use this function to copy a small sample of the original HDFS data for developing the R calculation that you ultimately want to execute on the entire HDFS data set on the Hadoop cluster.

hdfs.sample(dfs.id, lines, sep)

dfs.id
: HDFS object ID where the data is located.

lines
: Number of lines to return as a sample. The default value is 1000 lines.

sep
: The symbol used to separate fields in the Hadoop file. A comma (,) is the default separator.
If the data originated in an R environment, then all metadata is extracted and all attributes are restored, including column names and data types. Otherwise, generic attribute names, like val1 and val2, are assigned.
Return value: A data.frame object with the sample data set, or NULL if the operation failed.
This example displays the first three lines of the ontime_R file.
R> hdfs.sample("ontime_R", lines=3)
YEAR MONTH MONTH2 DAYOFMONTH DAYOFMONTH2 DAYOFWEEK DEPTIME...
1 2000 12 NA 31 NA 7 1730...
2 2000 12 NA 31 NA 7 1752...
3 2000 12 NA 31 NA 7 1803...
hdfs.size

Returns the size in bytes of an object in HDFS.

hdfs.size(dfs.id)

dfs.id
: Object identifier in HDFS.
Use this interface to determine, for instance, whether you can pull the contents of the entire HDFS file into local R memory or a local file, or if you can only sample the data while creating a prototype of your R calculation.
Return value: Size in bytes of the object, or 0 if the object does not exist in HDFS.
This example returns a file size for ontime_R of 999,839 bytes.
R> hdfs.size("ontime_R")
[1] 999839
hdfs.upload

Copies a file from the local file system into HDFS.

hdfs.upload(filename, dfs.name, overwrite, split.size, header)

filename
: Name of a file in the local file system.

dfs.name
: Name of the new directory in HDFS.

overwrite
: Controls whether dfs.name can overwrite a directory with the same name. Set to TRUE to overwrite the directory, or FALSE to signal an error (default).

split.size
: Maximum number of bytes in each part of the Hadoop file (optional).

header
: Indicates whether the first line of the local file is a header containing column names. Set to TRUE if it has a header, or FALSE if it does not (default).
A header enables you to extract the column names and reference the data fields by name instead of by index in your MapReduce R scripts.
This function provides the fastest and easiest way to copy a file into HDFS. If the file is larger than split.size, then Hadoop splits it into two or more parts. The new Hadoop file gets a unique object ID, and each part is named part-0000x. Hadoop automatically creates metadata for the file.
Return value: HDFS object ID for the loaded data, or NULL if the copy failed.
See Also: hdfs.download, hdfs.get, hdfs.put
This example uploads a file named ontime_s2000.dat into HDFS and shows the location of the file, which is stored in a variable named ontime.dfs_File.
R> ontime.dfs_File <- hdfs.upload('ontime_s2000.dat', dfs.name='ontime_File')
R> print(ontime.dfs_File)
[1] "/user/oracle/xq/ontime_File"
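A variation on this example follows, assuming the local file begins with a header line of column names. The split.size value is illustrative and forces the upload into multiple parts, which hdfs.parts then reports.

# Upload with a header row and an illustrative 32 MB part size.
dfs.ontime <- hdfs.upload('ontime_s2000.dat', dfs.name='ontime_Parts',
                          header=TRUE, split.size=32*1024*1024)

# More parts allow more mappers to run in parallel on this file.
hdfs.parts(dfs.ontime)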
orhc.connect

Establishes a connection to Oracle Database.

orhc.connect(host, user, sid, passwd, port, secure, driver, silent)

host
: Host name or IP address of the server where Oracle Database is running.

user
: Database user name.

sid
: System ID (SID) for the Oracle Database instance.

passwd
: Password for the database user.

port
: Port number for the Oracle Database listener. The default value is 1521.

secure
: Authentication setting for Oracle Database. TRUE: You must enter a database password each time you attempt to connect (default). FALSE: You enter a database password once; it is encrypted in memory and used every time a database connection is required.

driver
: Driver used to connect to Oracle Database (optional). Sqoop is the default driver.

silent
: TRUE to suppress the prompts for missing host, user, password, port, and SID values, or FALSE to see them (default).
Use this function when your analysis requires access to data stored in an Oracle database or to return the results to the database.
With an Oracle Database Advanced Analytics license for Oracle R Enterprise and a connection to Oracle Database, you can work directly with the data stored in database tables and pass processed data frames to R calculations on Hadoop.
Return value: TRUE for a successful and validated connection, or FALSE for a failed connection attempt.
See Also: orhc.disconnect
This example loads the ORHC library and connects to the local Oracle database:
R> library(ORHC)
Oracle R Connector for Hadoop
Hadoop is up and running.
R> orhc.connect("localhost", "RQUSER", "orcl")
Connecting ORCH to RDBMS via [sqoop]
    Host: localhost
    Port: 1521
    SID:  orcl
    User: RQUSER
Enter password for [RQUSER]: password
Connected.
[1] TRUE
orhc.disconnect

Disconnects the local R session from Oracle Database.

orhc.disconnect()
No orhc functions work without a connection to Oracle Database.
You can use the return value of this function to reestablish a connection using orhc.reconnect.
Return value: An Oracle Database connection object, or NULL if Oracle Database refuses to disconnect.
See Also: orhc.connect, orhc.reconnect
R> orhc.disconnect()
Disconnected.
orhc.reconnect

Reconnects to Oracle Database with the credentials previously returned by orhc.disconnect.

orhc.reconnect(dbcon)

dbcon
: Credentials previously returned by orhc.disconnect.
Oracle R Connector for Hadoop preserves all user credentials and connection attributes, enabling you to reconnect to a previously disconnected session. Depending on the orhc.connect secure setting for the original connection, you may be prompted for a password. After reconnecting, you can continue data transfer operations between Oracle Database and HDFS.
Reconnecting to a session is faster than opening a new one, because reconnecting does not require extensive connectivity checks.
Return value: TRUE for a successfully reestablished and validated connection, or FALSE for a failed attempt.
See Also: orhc.connect, orhc.disconnect
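A short sketch of the disconnect and reconnect cycle follows: the connection object returned by orhc.disconnect is saved and later handed back to orhc.reconnect.

# Save the credentials and connection attributes when disconnecting.
dbcon <- orhc.disconnect()

# ... work that does not require Oracle Database ...

# Reconnect with the saved object. Depending on the secure setting used
# by orhc.connect, a password prompt may appear here.
orhc.reconnect(dbcon)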
orhc.which

Displays information about the current connection to Oracle Database, excluding the authentication credentials.

orhc.which()

This function has no parameters.
This function is useful when connecting to multiple Oracle databases during your analysis task.
This example describes a connection by RQUSER to the local Oracle database:
R> orhc.which()
Connected to RDBMS via [sqoop]
Host: localhost
Port: 1521
SID: orcl
User: RQUSER