2 Getting Started with Oracle R Enterprise

This chapter describes how to start using Oracle R Enterprise by connecting to an Oracle Database instance and creating Oracle R Enterprise objects and storing them in the database.

This chapter discusses these topics:

Connecting to an Oracle Database Instance

To use Oracle R Enterprise, you first connect to an Oracle Database instance as described in the following topics:

About Connecting to the Database

Oracle R Enterprise client components connect an R session to an Oracle Database instance and the Oracle R Enterprise server components. The connection makes the data in a database schema available to the R user. It also makes the processing power, memory, and storage capacities of the database server available to the R session through the Oracle R Enterprise client interface.

This section has the following topics:

About Using the ore.connect Function

To begin using Oracle R Enterprise, you first connect to a schema in an Oracle Database instance with the ore.connect function. Only one Oracle R Enterprise connection can exist at a time during an R session. If an R session is already connected to the database, then invoking ore.connect terminates the active connection before opening a new connection. Before attempting to connect, you can discover whether an active connection exists by using the ore.is.connected function.

You explicitly end a connection with the ore.disconnect function. If you do not invoke ore.disconnect, then the connection is automatically terminated when the R session ends. For more information on ore.disconnect, see "About Using the ore.disconnect Function."

With the type argument of ore.connect, you specify the type of connection, either ORACLE or HIVE. A HIVE type of connection connects to Hive tables in a Hadoop cluster. An ORACLE type of connection connects to a schema in an Oracle Database instance. The default value of type is "ORACLE".

If the connection type is HIVE, then ore.connect ignores all other arguments. For information on Oracle R Connector for Hadoop and Hive, see Oracle Big Data Connectors User's Guide. The HIVE option applies only if you are using Oracle R Advanced Analytics for Hadoop (ORAAH) in conjunction with a Hadoop cluster. ORAAH is part of the Oracle Big Data Connectors option to the Big Data Appliance.

If the connection type is ORACLE, then you do the following:

  • Use the logical all argument to specify whether Oracle R Enterprise automatically creates an ore.frame object for each table to which the user has access in the schema and makes those ore.frame objects visible in the current R session. The ore.frame objects contain metadata about the tables. The default value of the all argument is FALSE.

    If all = TRUE, then Oracle R Enterprise implicitly invokes the ore.sync and ore.attach functions. If all = FALSE, then the user must explicitly invoke ore.sync to create ore.frame objects. To access these objects by name, the user must invoke ore.attach to include the names in the search path. For information on those functions, see "Creating R Objects for In-Database Data".

  • Use either the conn_string argument, or various combinations of the user, sid, host, password, port, service_name, and conn_string arguments to specify information that identifies the connection.

    To avoid using a clear-text password, you can specify an Oracle wallet password with the conn_string argument. No other arguments are needed. By specifying an Oracle wallet password, you can avoid embedding a database user password in application code, batch jobs, or scripts. For information on creating an Oracle wallet, see Oracle R Enterprise Installation and Administration Guide.

    With the other connection identifier arguments, you specify a database user name, host name, and password, and either a system identifier (SID) or service name, and, optionally, a TCP port, or you specify a database user name, password, and a conn_string argument.

    The default value of the port argument is 1521, the default value of host is "localhost", which specifies the local host, and the default value of conn_string is NULL. You specify the local host when your R session is running on the same computer as the Oracle Database instance to which you want to connect.

See Also:

About Using the ore.disconnect Function

To explicitly end the connection between an R session and the Oracle Database instance, invoke the ore.disconnect function. Oracle R Enterprise implicitly invokes ore.disconnect if you do either of the following:

  • Quit the R session.

  • Invoke ore.connect while an Oracle R Enterprise connection is already active.

When you disconnect the active connection, Oracle R Enterprise discards all Oracle R Enterprise objects that you have not explicitly saved in an Oracle R Enterprise datastore. For information on saving objects, see "Saving and Managing R Objects in the Database".

Using the ore.connect and ore.disconnect Functions

The examples in this section demonstrate the various ways of specifying an Oracle R Enterprise connection to an Oracle Database instance. The examples use sample values for the ore.connect argument values. Replace the sample values with the appropriate values for connecting to your database.

Example 2-1 invokes the ore.connect function and specifies the user, sid, host, password, and port arguments.

Example 2-1 Using ore.connect and Specifying a SID

ore.connect(user = "rquser", sid = "sales", host = "sales-server",
            password = "rquserStrongPassword", port = 1521 )

Example 2-2 demonstrates using a service name rather than a SID. It also specifies connecting to the local host.

Example 2-2 Using ore.connect and Specifying a Service Name

ore.connect(user = "rquser", host = "localhost", 
            password = "rquserStrongPassword",
            service_name = "sales.example.com")

Example 2-3 uses the conn_string argument to specify an easy connect string that identifies the connection.

Example 2-3 Using ore.connect and Specifying an Easy Connect String

ore.connect(user = "rquser", password = "rquserStrongPassword", 
            conn_string = "sales-server:1521:sales
              (ADDRESS=(PROTOCOL=tcp) (HOST=sales-server) (PORT=1521))
              (CONNECT_DATA=(SERVICE_NAME=sales.example.com)))")

Example 2-4 uses the conn_string argument to specify a full connection string that identifies the connection.

Example 2-4 Using ore.connect and Specifying a Full Connection String

ore.connect(user = "rquser", password = "rquserStrongPassword", 
            conn_string = "DESCRIPTION=
              (ADDRESS=(PROTOCOL=tcp) (HOST=sales-server) (PORT=1521))
              (CONNECT_DATA=(SERVICE_NAME=myserver.example.com)))")

Example 2-5 uses the conn_string argument to specify an Oracle wallet. The mydb_test string is the connection identifier for the Oracle database. The Oracle wallet contains the information needed to create the connection. For information on creating an Oracle wallet for an Oracle R Enterprise connection, see Oracle R Enterprise Installation and Administration Guide.

Example 2-5 Using the conn_string Argument to Specify an Oracle Wallet

ore.connect(conn_string = "mydb_test")

Example 2-6 uses an empty connection string to connect to the local host.

Example 2-6 Using the conn_string Argument and Specifying an Empty Connection String

ore.connect(user = "rquser", password = "rquserStrongPassword", conn_string = "")

Example 2-7 connects to a pluggable database using the conn_string argument to specify a service name.

Example 2-7 Using the conn_string Argument in Connecting to a Pluggable Database

ore.connect(conn_string = "pdb1.example.com")

Example 2-8 invokes ore.connect using a service name, host name, and port number to connect to a pluggable database.

Example 2-8 Using the service_name Argument in Connecting to a Pluggable Database

ore.connect(service_name = "pdb1.example.com", host = "mypdb", port = 1521)

Example 2-9 explicitly disconnects an Oracle R Enterprise session from an Oracle database.

Example 2-9 Disconnecting an Oracle R Enterprise Session

ore.disconnect()

Creating and Managing R Objects in Oracle Database

With transparency layer functions you can connect to an Oracle Database instance and interact with data structures in a database schema. You can move data to and from the database and create database tables. You can also save R objects in the database. The Oracle R Enterprise functions that perform these actions are described in the following topics.

Creating R Objects for In-Database Data

Using Oracle R Enterprise, you can create R proxy objects in your R session from database-resident data as described in the following topics.

About Creating R Objects for Database Objects

When you invoke ore.connect in an R session, Oracle R Enterprise creates a connection to a schema in an Oracle Database instance. To gain access to the data in the database tables in the schema, you use the ore.sync function. That function creates an ore.frame object that is a proxy for a table in a schema. You can use the ore.attach function to add an R environment that represents a schema to the R search path. For information on connecting to the database, see "Connecting to an Oracle Database Instance".

When you use the ore.sync function to create an ore.frame object as a proxy for a database table, the name of the ore.frame proxy object is the same as the name of the database object. Each ore.frame proxy object contains metadata about the corresponding database object.

You can use the proxy ore.frame object to select data from the table. When you execute an R operation that selects data from the table, the operation returns the current data from the database object. However, if some application has added a column to the table, or has otherwise changed the metadata of the database object, the ore.frame proxy object does not reflect such a change until you again invoke ore.sync for the database object.

If you invoke the ore.sync function with no tables specified, and if the value of the all argument was FALSE in the ore.connect function call that established the connection to the Oracle database instance, then the ore.sync function creates a proxy object for each table in the schema specified by ore.connect. You can use the table argument to specify the tables for which you want to create ore.frame proxy objects.

Tip:

To conserve memory resources and save time, you should only add proxies for the tables that you want to use in your R session.

With the schema argument, you can specify the schema for which you want to create an R environment and proxy objects. Only one environment for a given database schema can exist at a time. With the use.keys argument, you can specify whether you want to use primary keys in the table to order the ore.frame object.

Tip:

Ordering is expensive in the database. Because most operations in R do not need ordering, you should generally set use.keys to FALSE unless you need ordering for sampling data or some other purpose. For more information on ordering, see "Creating Ordered and Unordered ore.frame Objects".

With the query argument, you can specify a SQL SELECT statement. This enables you to create an ore.frame for a query without creating a view in the database. This can be useful when you not have the CREATE VIEW system privilege for the current schema. You cannot use the schema argument and the query argument in the same ore.sync invocation.

You can use the ore.ls function to list the ore.frame proxy objects that correspond to database tables in the environment for a schema. You can use the ore.exists function to find out if an ore.frame proxy object for a database table exists in an R environment. The function returns TRUE if the proxy object exists or FALSE if it does not. You can remove an ore.frame proxy object from an R environment with the ore.rm function.

Using the ore.sync Function

Example 2-10 demonstrates the use of the ore.sync function. The example first invokes the ore.exec function to create some tables to represent tables existing in the database schema for the rquser. The example then invokes ore.sync and specifies three tables of the schema. The ore.sync invocation creates an R environment for the rquser schema and creates proxy ore.frame objects for the specified tables in that schema. The example lists the ore.frame proxy objects in the current environment. The TABLE3 table exists in the schema but does not have an ore.frame proxy object because it was not included in the ore.sync invocation.

The example next invokes ore.sync with the query argument to create ore.frame objects for the specified SQL queries. The example lists the ore.frame objects again.

The example then invokes ore.sync again and creates an R environment for the SH schema and proxy objects in that environment for the specified tables in that schema. The example invokes the ore.exists function to find out if the specified table exists in the current environment and then in the SH environment. The example lists the R objects in the SH environment.

The example next removes the ore.frame objects QUERY1, QUERY2, and TABLE4 from the rquser environment. Finally, the example lists the proxy objects in the environment again.

Note:

The ore.rm function invocation removes the ore.frame that is a proxy for the TABLE4 table from the environment. It does not delete the table from the schema.

Example 2-10 Using ore.sync to Add ore.frame Proxy Objects to an R Environment

# After connecting to a database as rquser, create some tables.
ore.exec("CREATE TABLE TABLE1 AS SELECT * FROM dual")
ore.exec("CREATE TABLE TABLE2 AS SELECT * FROM dual")
ore.exec("CREATE TABLE TABLE3 AS SELECT * FROM dual")
ore.exec("CREATE TABLE TABLE4 AS SELECT * FROM dual")
# Create ore.frame objects for the specified tables.
ore.sync(table = c("TABLE1", "TABLE3", "TABLE4"))
# List the ore.frame proxy objects in the current environment.
ore.ls()
# Create ore.frame objects for the specified queries.
ore.sync(query = c("QUERY1" = "SELECT 0 X, 1 Y FROM dual",
                   "QUERY2" = "SELECT 1 X, 0 Y FROM dual"))
ore.ls()
# The rquser user has been granted SELECT permission on the tables in the 
# SH schema.
ore.sync("SH", table = c("CUSTOMERS", "SALES"))
# Find out if the CUSTOMERS ore.frame exists in the rquser environment.
ore.exists("CUSTOMERS")
# Find out if it exists in the SH environment.
ore.exists("CUSTOMERS", schema = "SH")
# List the ore.frame proxy objects in the SH environment.
ore.ls("SH")
# Remove the ore.frame objects for the specified objects.
ore.rm(c("QUERY1", "QUERY2", "TABLE4"))
# List the ore.frame proxy objects in the current environment again.
ore.ls()

Listing for Example 2-10

R> # After connecting to a database as rquser, create some tables.
R> ore.exec("CREATE TABLE TABLE1 AS SELECT * FROM dual")
R> ore.exec("CREATE TABLE TABLE2 AS SELECT * FROM dual")
R> ore.exec("CREATE TABLE TABLE3 AS SELECT * FROM dual")
R> ore.exec("CREATE TABLE TABLE4 AS SELECT * FROM dual")
R> # Create ore.frame objects for the specified tables.
R> ore.sync(table = c("TABLE1", "TABLE3", "TABLE4"))
R> # List the ore.frame proxy objects in the current environment.
R> ore.ls()
 [1] "TABLE1"     "TABLE3"     "TABLE4"
R> # Create ore.frame objects for the specified queries.
R> ore.sync(query = c("QUERY1" = "SELECT 0 X, 1 Y FROM dual",
+                     "QUERY2" = "SELECT 1 X, 0 Y FROM dual"))
R> ore.ls()
 [1] "QUERY1"     "QUERY2"     "TABLE1"     "TABLE3"     "TABLE4"
R> # The rquser user has been granted SELECT permission on the tables in the 
R> # SH schema.
R> ore.sync("SH", table = c("CUSTOMERS", "SALES"))
R> # Find out if the CUSTOMERS ore.frame exists in the rquser environment.
R> ore.exists("CUSTOMERS")
[1] FALSE
R> # Find out if it exists in the SH environment.
R> ore.exists("CUSTOMERS", schema = "SH")
[1] TRUE
R> # List the ore.frame proxy objects in the SH environment.
R> ore.ls("SH")
[1] "CUSTOMERS" "SALES"
R> # Remove the ore.frame objects for the specified objects.
R> ore.rm(c("QUERY1", "QUERY2", "TABLE4"))
R> # List the ore.frame proxy objects in the current environment again.
R> ore.ls()
 [1] "TABLE1"     TABLE3"

Using the ore.get Function

After you have created an R environment and ore.frame proxy objects with ore.sync, you can get a proxy object by name with the ore.get function, as shown in Example 2-11. The example invokes the ore.sync function to create an ore.frame object that is a proxy for the CUSTOMERS table in the SH schema. The example then gets the dimensions of the proxy object.

Example 2-11 Using ore.get to Get a Database Table

ore.sync(schema = "SH", table = "CUSTOMERS", use.keys = FALSE)
dim(ore.get(name = "CUSTOMERS", schema = "SH"))

Listing for Example 2-11

R> ore.sync(schema = "SH", table = "CUSTOMERS", use.keys = FALSE)
R> dim(ore.get(name = "CUSTOMERS", schema = "SH"))
[1] 630  15

You can use ore.get to get the proxy ore.frame for a table and assign it to a variable in R, as in SH_CUST <- ore.get(name = "CUSTOMERS", schema = "SH"). The ore.frame exists in the R global environment, which can be referred to using .GlobalEnv, and so it appears in the list returned by the ls function. Also, because this object exists in the R global environment, as opposed an R environment that represents a database schema, it is not listed by the ore.ls function.

Using the ore.attach Function

With ore.attach, you add an R environment for a database schema to the R search path. When you add the R environment, you have access to database tables by name through the proxy objects created by the ore.sync function without needing to specify the schema environment.

The default schema is the one specified in creating the connection and the default position in the search path is 2. You can specify the schema and the position in the ore.attach function invocation.. You can also specify whether you want the ore.attach function to indicate whether a naming conflict occurs when adding the environment. You can detach the environment for a schema from the R search path with the ore.detach function.

Example 2-12 demonstrates the use of the ore.attach function. Comments in the example explain the function invocations.

Example 2-12 Using ore.attach to Add an Environment for a Database Schema

# Connected as rquser.
# Add the environment for the rquser schema to the R search path.
ore.attach()
# Create an unordered ore.frame proxy object in the SH environment for the
# specifed table.
ore.sync(schema = "SH", table = "CUSTOMERS", use.keys = FALSE)
# Add the environment for the SH schema to the search path and warn if naming
# conflicts exist.
ore.attach("SH", 3, warn.conflicts = TRUE)
# Display the number of rows and columns in the proxy object for the table.
dim(CUSTOMERS)
# Remove the environment for the SH schema from the search path.
ore.detach("SH")
# Invoke the dim function again.
dim(CUSTOMERS)

Listing for Example 2-12

R> # Connected as rquser.
R> # Add the environment for the rquser schema to the R search path.
R> ore.attach()
R> # Create an unordered ore.frame proxy object in the SH environment for the
R> # specifed table.
R> ore.sync(schema = "SH", table = "CUSTOMERS", use.keys = FALSE)
R> # Add the environment for the SH schema to the search path and warn if naming
R> # conflicts exist.
R> ore.attach("SH", 3, warn.conflicts = TRUE)
R> # Display the number of rows and columns in the proxy object for the table.
R> dim(CUSTOMERS)
[1] 630  15
R> # Remove the environment for the SH schema from the search path.
R> ore.detach("SH")
R> # Invoke the dim function again.
R> dim(CUSTOMERS)
Error: object 'CUSTOMERS' not found

Creating Ordered and Unordered ore.frame Objects

Oracle R Enterprise provides the ability to create ordered or unordered ore.frame objects. The following topics describe this feature.

About Ordering in ore.frame Objects

R objects such as vector and data.frame have an implicit ordering of their elements. The data in an Oracle Database table is not necessarily ordered. For some R operations, ordering is useful whereas for other operations it is unnecessary. By ordering an ore.frame, you are able to index the ore.frame object by using either integer or character indexes.

Using an ordered ore.frame object that is a proxy for a SQL query can be time-consuming for a large data set. Therefore, although Oracle R Enterprise attempts to create ordered ore.frame objects by default, it also provides the means of creating an unordered ore.frame object.

When you invoke the ore.sync function to create an Oracle R Enterprise ore.frame object as a proxy for a SQL query, you can use the use.keys argument to specify whether the ore.frame can be ordered or must be unordered.

An ore.frame object can be ordered if one or more of the following conditions are true:

  • The value of the use.keys argument of the ore.sync function is TRUE and a primary key is defined on the underlying table

  • The row names of the ore.frame constitute a unique tuple

  • The ore.frame object is produced by certain functions such as aggregate and cbind

  • All of the ore.frame objects that are input arguments to relevant Oracle R Enterprise functions are ordered

An ore.frame object is unordered if one or more of the following conditions are true:

  • The value of the use.keys argument of the ore.sync function is FALSE

  • No primary key is defined on the underlying table and either the row names of the ore.frame object are not specified or the row names of the ore.frame object are set to NULL

  • One or more of the ore.frame objects that are input arguments to relevant Oracle R Enterprise functions are unordered

An unordered ore.frame object has null row names. You can determine whether an ore.frame object is ordered by invoking is.null on the row names of the objects, as shown in the last lines of Example 2-13. If the ore.frame object is unordered, is.null returns an error.

See Also:

"Indexing Data"

Global Options Related to Ordering

Oracle R Enterprise has options that relate to the ordering of an ore.frame object. The ore.warn.order global option specifies whether you want Oracle R Enterprise to display a warning message if you use an unordered ore.frame object in a function that requires ordering. If you know what to expect in an operation, then you might want to turn the warnings off so they do not appear in the output. For examples of the warning messages, see Example 2-13 and Example 2-14.

You can see what the current setting is, or turn the option on or off, as in the following example.

R> options("ore.warn.order")
$ore.warn.order
[1] TRUE
R> options("ore.warn.order" = FALSE)
R> options("ore.warn.order" = TRUE)

With the ore.sep option, you can specify the separator between the row name values that you use for multi-column keys, as in the following example.

R> options("ore.sep")
$ore.sep
[1] "|"

R> options("ore.sep" = "/")
R> options("ore.sep" = "|")

Ordering Using Keys

You can use the primary key of a database table to order an ore.frame object, as demonstrated in Example 2-13. The example loads the spam data set from the kernlab package. It adds two columns to the data set.

The example invokes ore.drop to drop the named tables if they exist. It then invokes ore.create to create two tables from the data set. It invokes ore.exec to make the USERID and TS columns a composite primary key of the SPAM_PK table, and invokes ore.sync to synchronize the table with its ore.frame proxy.

Note:

The ore.exec function executes a SQL statement in the Oracle Database schema. The function is intended for database definition language (DDL) statements that have no return value.

Example 2-13 then displays the first eight rows of each table.The proxy object for the SPAM_PK table is an ordered ore.frame object. It has row names that are a combination of the TS and USERID column values separated by the "|" character. The proxy object for the SPAM_NOPK table is an unordered ore.frame object that has the symbol SPAM_NOPK. By default, SPAM_NOPK has row names that are sequential numbers.

The example invokes the ore.exec function to execute a SQL statement that makes the RID column the primary key of the database table.

Example 2-13 Ordering Using Keys

# Prepare the data.
library(kernlab)
data(spam)
s <- spam
# Create a column that has integer values.
s$TS <- 1001:(1000 + nrow(s))
# Create a column that has integer values with each number repeated twice.
s$USERID <- rep(351:400, each=2, len=nrow(s))
# Ensure that the database tables do not exist.
ore.drop(table='SPAM_PK')
ore.drop(table='SPAM_NOPK')
# Create database tables.
ore.create(s[,c(59:60,1:28)], table="SPAM_PK")
ore.create(s[,c(59:60,1:28)], table="SPAM_NOPK")
# Using a SQL statement, alter the SPAM_PK table to add a composite primary key.
ore.exec("alter table SPAM_PK add constraint SPAM_PK primary key
           (\"USERID\",\"TS\")")
# Synchronize the table to get the change to it.
ore.sync(table = "SPAM_PK")
# View the data in the tables.
# The row names of the ordered SPAM_PK are the primary key column values.
head(SPAM_PK[,1:8])
# The row names of the unordered SPAM_NOPK are sequential numbers.
# The first warning results from the inner accessing of SPAM_NOPK to subset
# the columns. The second warning is for the invocation of the head
# function on that subset.
head(SPAM_NOPK[,1:8])
# Verify that SPAM_NOPK is unordered.
is.null(row.names(SPAM_NOPK))

Listing for Example 2-13

R> # Prepare the data.
R> library(kernlab)
R> data(spam)
R> s <- spam
R> # Create a column that has integer values.
R> s$TS <- 1001:(1000 + nrow(s))
R> # Create a column that has integer values with each number repeated twice.
R> s$USERID <- rep(351:400, each=2, len=nrow(s))
R> # Ensure that the database tables do not exist.
R> ore.drop(table='SPAM_PK')
R> ore.drop(table='SPAM_NOPK')
R> # Create database tables.
R> ore.create(s[,c(59:60,1:28)], table="SPAM_PK")
R> ore.create(s[,c(59:60,1:28)], table="SPAM_NOPK")
R> # Using a SQL statement, alter the SPAM_PK table to add a composite primary key.
R> ore.exec("alter table SPAM_PK add constraint SPAM_PK primary key
+  (\"USERID\",\"TS\")")
R> # Synchronize the table to get the change to it.
R> ore.sync(table = "SPAM_PK")
R> # View the data in the tables.
R> # The row names of the ordered SPAM_PK are the primary key column values.
R> head(SPAM_PK[,1:8])
           TS USERID make address  all num3d  our over
1001|351 1001    351 0.00    0.64 0.64     0 0.32 0.00
1002|351 1002    351 0.21    0.28 0.50     0 0.14 0.28
1003|352 1003    352 0.06    0.00 0.71     0 1.23 0.19
1004|352 1004    352 0.00    0.00 0.00     0 0.63 0.00
1005|353 1005    353 0.00    0.00 0.00     0 0.63 0.00
1006|353 1006    353 0.00    0.00 0.00     0 1.85 0.00
R> # The row names of the unordered SPAM_NOPK are sequential numbers.
R> # The first warning results from the inner accessing of SPAM_NOPK to subset
R> # the columns. The second warning is for the invocation of the head
R> # function on that subset.
R> head(SPAM_NOPK[,1:8])
    TS USERID make address  all num3d  our over
1 1001    351 0.00    0.64 0.64     0 0.32 0.00
2 1002    351 0.21    0.28 0.50     0 0.14 0.28
3 1003    352 0.06    0.00 0.71     0 1.23 0.19
4 1004    352 0.00    0.00 0.00     0 0.63 0.00
5 1005    353 0.00    0.00 0.00     0 0.63 0.00
6 1006    353 0.00    0.00 0.00     0 1.85 0.00
Warning messages:
1: ORE object has no unique key - using random order 
2: ORE object has no unique key - using random order
R> # Verify that SPAM_NOPK is unordered.
R> is.null(row.names(SPAM_NOPK))
Error: ORE object has no unique key

Ordering Using Row Names

You can use row names to order an ore.frame object, as demonstrated in Example 2-14. The example creates a data.frame object in the local R session memory and pushes it to the ore.frame object with the symbol a, which exists in the memory of the Oracle database to which the R session is connected. The example shows that the ore.frame object has the default row names of the R data.frame object. Because the ore.frame object is ordered, invoking the row.names function on it does not produce a warning message.

The example uses the ordered SPAM_PK and unordered SPAM_NOPK ore.frame objects from Example 2-13 to show that invoking row.names on the unordered SPAM_NOPK produces a warning message but invoking it on the ordered SPAM_PK does not.

The SPAM_PK object is ordered by the row names, which are the combined values of the TS and USERID column values separated by the "|" character. The example shows that you can change the row names.

Example 2-14 Ordering Using Row Names

# Create an ordered ore.frame by default.
a <- ore.push(data.frame(a=c(1:10,10:1), b=letters[c(1:10,10:1)]))
# Display the values in the b column. Note that because the ore.frame is
# ordered, no warnings appear.
a$b
# Display the default row names for the first six rows of the a column.
row.names(head(a))
# SPAM_NOPK has no unique key, so row.names raises error messages.
row.names(head(SPAM_NOPK))
# Row names consist of TS &rsquor;|' USERID.
# For display on this page, only the first four row names are shown.
row.names(head(SPAM_PK))
# Reassign the row names to the TS column only
row.names(SPAM_PK) <- SPAM_PK$TS
# The row names now correspond to the TS values only.
row.names(head(SPAM_PK[,1:4]))
head(SPAM_PK[,1:4])

Listing for Example 2-14

R> # Create an ordered ore.frame by default.
R> a <- ore.push(data.frame(a=c(1:10,10:1), b=letters[c(1:10,10:1)]))
R> # Display the values in the b column. Note that because the ore.frame is
R> # ordered, no warnings appear.
R> a$b
 [1] a b c d e f g h i j j i h g f e d c b aLevels: a b c d e f g h i j
R> # Display the default row names for the first six rows of the a column.
R> row.names(head(a))
[1] 1 2 3 4 5 6
R> # SPAM_NOPK has no unique key, so row.names raises error messages.
R> row.names(head(SPAM_NOPK))
Error: ORE object has no unique key
In addition: Warning message:
ORE object has no unique key - using random order
R> # Row names consist of TS &rsquor;|' USERID.
R> # For display on this page, only the first four row names are shown.
R> row.names(head(SPAM_PK))
        1001|351         1002|351         1003|352         1004|352
"1001|3.51E+002" "1002|3.51E+002" "1003|3.52E+002" "1004|3.52E+002"
R> # Reassign the row names to the TS column only
R> row.names(SPAM_PK) <- SPAM_PK$TS
R> # The row names now correspond to the TS values only.
R> row.names(head(SPAM_PK[,1:4]))
[1] 1001 1002 1003 1004 1005 1006
R> head(SPAM_PK[,1:4])
       TS USERID make address
1001 1001    351 0.00    0.64
1002 1002    351 0.21    0.28
1003 1003    352 0.06    0.00
1004 1004    352 0.00    0.00
1005 1005    353 0.00    0.00
1006 1006    353 0.00    0.00

Using Ordered Frames

Example 2-15 example uses the ordered SPAM_PK and unordered SPAM_NOPK ore.frame objects from Example 2-13 to show the result of merging two ordered ore.frame objects and two unordered ore.frame objects.

Example 2-15 Merging Ordered and Unordered ore.frame Objects

# Create objects for merging data from unordered ore.frame objects.
x <- SPAM_NOPK[,1:4]
y <- SPAM_NOPK[,c(1,2,4,5)]
m1 <- merge(x, y, by="USERID")
# The merged result m1 produces a warning because it is not an ordered frame.
head(m1,3)
# Create objects for merging data from ordered ore.frame objects.
x <- SPAM_PK[,1:4]
y <- SPAM_PK[,c(1,2,4,5)]
# The merged result m1 does not produce a warning now because it is an 
# ordered frame.
m1 <- merge(x, y, by="USERID")
head(m1,3)

Listing for Example 2-15

R> # Create objects for merging data from unordered ore.frame objects.
R> x <- SPAM_NOPK[,1:4]
R> y <- SPAM_NOPK[,c(1,2,4,5)]
R> m1 <- merge(x, y, by="USERID")
R> # The merged result m1 produces a warning because it is not an ordered frame.
R> head(m1,3)
  USERID TS.x make address.x TS.y address.y  all
1    351 5601 0.00         0 1001      0.64 0.64
2    351 5502 0.00         0 1001      0.64 0.64
3    351 5501 0.78         0 1001      0.64 0.64
Warning messages:
1: ORE object has no unique key - using random order 
2: ORE object has no unique key - using random order
R> # Create objects for merging data from ordered ore.frame objects.
R> x <- SPAM_PK[,1:4]
R> y <- SPAM_PK[,c(1,2,4,5)]
R> # The merged result m1 does not produce a warning now because it is an 
R> # ordered frame.
R> m1 <- merge(x, y, by="USERID")
R> head(m1,3)
          USERID TS.x make address.x TS.y address.y  all
1001|1001    351 1001    0      0.64 1001      0.64 0.64
1001|1002    351 1001    0      0.64 1002      0.28 0.50
1001|1101    351 1001    0      0.64 1101      0.00 0.00

Moving Data to and from the Database

You can create a temporary database table, and corresponding proxy ore.frame object, from a local R object with the ore.push function. You can create a local R object that contains a copy of data represented by an Oracle R Enterprise proxy object with the ore.pull function.

The ore.push function translates an R object into an Oracle R Enterprise object of the appropriate data type. The ore.pull function takes an ore class object and returns an R object. If the input object is an ore.list, the ore.pull function creates a data.frame and translates each the data of each database column into the appropriate R representation.

Note:

You can pull data to a local R data.frame only if the data can fit into the R session memory. Also, even if the data fits in memory but is still very large, you may not be able to perform many, or any, R functions in the client R session.

Example 2-16 demonstrates pushing an R data.frame object to the database as a temporary database table with an associated ore.frame object, iris_of, then creating another ore.frame object, iris_of_setosa, by selecting one column from iris_of, and then pulling the iris_of_setosa object into the local R session memory as a data.frame object. The example displays the class of some of the objects.

Example 2-16 Using ore.push and ore.pull to Move Data

class(iris)
# Push the iris data frame to the database.
iris_of <- ore.push(iris)
class(iris_of)
# Display the data type of the Sepal.Length column in the data.frame.
class(iris$Sepal.Length)
# Display the data type of the Sepal.Length column in the ore.frame.
class(iris_of$Sepal.Length)
# Filter one column of the data set.
iris_of_setosa <- iris_of[iris_of$Species == "setosa", ]
class(iris_of_setosa)
# Pull the selected column into the local R client memory.
local_setosa = ore.pull(iris_of_setosa)
class(local_setosa)

Listing for Example 2-16

R> class(iris)
[1] "data.frame"
R> # Push the iris data frame to the database.
R> iris_of <- ore.push(iris)
R> class(iris_of)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
R> # Display the data type of the Sepal.Length column in the data.frame.
R> class(iris$Sepal.Length)
[1] "numeric"
R> # Display the data type of the Sepal.Length column in the ore.frame.
R> class(iris_of$Sepal.Length)
[1] "ore.numeric"
attr(,"package")
[1] "OREbase"
R> # Filter one column of the data set.
R> iris_of_setosa <- iris_of[iris_of$Species == "setosa", ]
R> class(iris_of_setosa)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
R> # Pull the selected column into the local R client memory.
R> local_setosa = ore.pull(iris_of_setosa)
R> class(local_setosa)
[1] "data.frame"

Unless you explicitly save them, the temporary database tables and their corresponding Oracle R Enterprise proxy objects that you create with the ore.push function are discarded when you quit the R session.

See Also:

Creating and Deleting Database Tables

You can use the ore.create function to create a persistent table in an Oracle Database schema. Creating the table automatically creates an ore.frame proxy object for the table in the R environment that represents your database schema. The proxy ore.frame object has the same name as the table. You can delete the persistent table in an Oracle Database schema with the ore.drop function.

Caution:

Only use the ore.drop function to delete a database table and its associated ore.frame proxy object. Never use it to remove an ore.frame object that is not associated with a permanent database table. To remove an ore.frame object for a temporary database table, use the ore.rm function.

Example 2-17 creates tables in the database and drops some of them.

Example 2-17 Using ore.create and ore.drop to Create and Drop Tables

# Create the AIRQUALITY table from the data.frame for the airquality data set.
ore.create(airquality, table = "AIRQUALITY")
# Create data.frame objects.
df1 <- data.frame(x1 = 1:5, y1 = letters[1:5])
df2 <- data.frame(x2 = 5:1, y2 = letters[11:15])
# Create the DF1 and DF2 tables from the data.frame objects.
ore.create(df1, "DF1")
ore.create(df2, "DF2")
# Create the CARS93 table from the data.frame for the Cars93 data set.
ore.create(Cars93, table = "CARS93")
# List the Oracle R Enterprise proxy objects.
ore.ls()
# Drop the CARS93 object.
ore.drop(table = "CARS93")
# List the Oracle R Enterprise proxy objects again.
ore.ls()

Listing for Example 2-17

R> # Create the AIRQUALITY table from the data.frame for the airquality data set.
R> ore.create(airquality, table = "AIRQUALITY")
R> # Create data.frame objects.
R> df1 <- data.frame(x1 = 1:5, y1 = letters[1:5])
R> df2 <- data.frame(x2 = 5:1, y2 = letters[11:15])
R> # Create the DF1_TABLE and DF2_TABLE tables from the data.frame objects.
R> ore.create(df1, "DF1")
R> ore.create(df2, "DF2")
R> # Create the CARS93 table from the data.frame for the Cars93 data set.
R> ore.create(Cars93, table = "CARS93")
R> # List the Oracle R Enterprise proxy objects.
R> ore.ls()
[1] "AIRQUALITY"  "CARS93"  "DF1"  "DF2_"
R> # Drop the CARS93 object.
R> ore.drop(table = "CARS93")
R> # List the Oracle R Enterprise proxy objects again.
R> ore.ls()
[1] "AIRQUALITY"  "DF1_"  "DF2"

Saving and Managing R Objects in the Database

Oracle R Enterprise provides datastores that you can use to save Oracle R Enterprise proxy objects, as well as any R object, in an Oracle database. You can restore the saved objects in another R session. The objects in a datastore are also accessible to embedded R execution through both the R and the SQL interfaces.

This section describes the Oracle R Enterprise functions that you can use to create and manage datastores. The section contains the following topics:

About Persisting Oracle R Enterprise Objects

R objects, including Oracle R Enterprise proxy objects, exist for the duration of the current R session unless you explicitly save them. The standard R functions for saving and restoring R objects, save and load, serialize objects in R memory to store them in a file and deserialize them to restore them in memory. However, for Oracle R Enterprise proxy objects, those functions do not save the database objects associated with the proxy objects in an Oracle database; therefore the saved proxy objects do not behave properly in a different R session.

You can save Oracle R Enterprise proxy objects, as well as any R object, with the ore.save function. The ore.save function specifies an Oracle R Enterprise datastore. A datastore persists in the database when you end the R session. The datastore maintains the referential integrity of the objects it contains. Using the ore.load function, you can restore in another R session the objects in the datastore.

Using a datastore, you can do the following:

  • Save Oracle R Enterprise and other R objects that you create in one R session and restore them in another R session.

  • Pass arguments to R functions for use in embedded R execution.

  • Pass objects for use in embedded R execution. You could, for example, use a function in the OREdm package to build an Oracle Data Mining model and save it in a datastore. You could then use that model to score data in the database through embedded R execution. For an example of using a datastore in an embedded R execution function, see Example 6-10.

Table 2-1 lists the functions that manipulate datastores and provides brief descriptions of them.

Table 2-1 Functions that Manipulate Datastores

Function Description

ore.save

Saves R objects in a new or existing datastore.

ore.load

Restores objects from a datastore into an R environment.

ore.lazyLoad

Lazily restores objects from a datastore into an R environment.

ore.delete

Deletes a datastore from the current Oracle database schema.

ore.datastore

Lists information about a datastore in the current Oracle database schema.

ore.datastoreSummary

Provides detailed information about the specified datastore in the current Oracle database schema.


See Also:

Chapter 6, "Using Oracle R Enterprise Embedded R Execution" for information on using the R and the SQL interfaces to embedded R execution

About Oracle R Enterprise Datastores

Each database schema has a table that stores named Oracle R Enterprise datastores. A datastore can contain Oracle R Enterprise objects and standard R objects.

You create a datastore with the ore.save function. When you create a datastore, you specify a name for it. You can save objects in one or more datastores.

As long as a datastore contains an Oracle R Enterprise proxy object for a database object, the database object persists between R sessions. For example, you could use the ore.odmNB function in the OREdm package to build an Oracle Data Mining Naive Bayes model. If you save the resulting ore.odmNB object in a datastore and end the R session, then Oracle Database does not delete the Oracle Data Mining model. If no datastore contains the ore.odmNB object and the R session ends, then the database automatically drops the model.

Saving Objects to a Datastore

The ore.save function saves one or more R objects in the specified datastore. By default, Oracle R Enterprise creates the datastore in the current user schema. With the arguments to ore.save, you can provide the names of specific objects, or provide a list of objects. You can specify a particular R environment to search for the objects you would like to save. The overwrite and append arguments are mutually exclusive. If you set the overwrite argument to TRUE, then you can replace an existing datastore with another datastore of the same name. If you set the append argument to TRUE, then you can add objects to an existing datastore. With the description argument, you can provide some descriptive text that appears when you get information about the datastore. The description argument has no effect when used with the append argument.

Example 2-18 demonstrates creating datastores using different combinations of arguments.

Example 2-18 Saving Objects and Creating a Datastore

# Create some R objects.
df1 <- data.frame(x1 = 1:5, y1 = letters[1:5])
df2 <- data.frame(x2 = 5:1, y2 = letters[11:15])
iris_of <- ore.push(iris)

# Create a database table and an Oracle R Enterprise proxy object for the table.
ore.drop("AIRQUALITY")
ore.create(airquality, table = "AIRQUALITY")

# List the R objects.
ls()

# List the Oracle R Enterprise proxy objects.
ore.ls()

# Save the proxy object and all objects in the current workspace environment
# to the datastore named ds1 and supply a description.
ore.save(AIRQUALITY, list = ls(), name = "ds1", description = "My datastore")

# Create some more objects.
x <- stats::runif(20)  # x is an object of type numeric.
y <- list(a = 1, b = TRUE, c = "hoopsa")
z <- ore.push(x)  # z is an object of type ore.numeric.

# Create another datastore.
ore.save(x, y, name = "ds2", description = "x and y")

# Overwrite the contents of datastore ds2.
ore.save(x, name = "ds2", overwrite = TRUE, description = "only x")

# Append object z to datastore ds2.
ore.save(z, name = "ds2", append = TRUE)

Listing for Example 2-18

R> # Create some R objects.
R> df1 <- data.frame(x1 = 1:5, y1 = letters[1:5])
R> df2 <- data.frame(x2 = 5:1, y2 = letters[11:15])
R> iris_of <- ore.push(iris)
R> 
R> # Create a database table and an Oracle R Enterprise proxy object for the table.
R> ore.drop("AIRQUALITY")
R> ore.create(airquality, table = "AIRQUALITY")
R> 
R> # List the R objects.
R> ls()
[1] "df1"     "df2"     "iris_of"
R> 
R> # List the Oracle R Enterprise proxy objects.
R> ore.ls()
[1] "AIRQUALITY"
R> 
R> # Save the proxy object and all objects in the current workspace environment
R> # to the datastore named ds1 and supply a description.
R> ore.save(AIRQUALITY, list = ls(), name = "ds1", description = "My datastore")
R> 
R> # Create some more objects.
R> x <- stats::runif(20)  # x is an object of type numeric.
R> y <- list(a = 1, b = TRUE, c = "hoopsa")
R> z <- ore.push(x)  # z is an object of type ore.numeric.
R> 
R> # Create another datastore.
R> ore.save(x, y, name = "ds2", description = "x and y")
R> 
R> # Overwrite the contents of datastore ds2.
R> ore.save(x, name = "ds2", overwrite = TRUE, description = "only x")
R> 
R> # Append object z to datastore ds2.
R> ore.save(z, name = "ds2", append = TRUE)

Getting Information about Datastore Contents

You can get information about a datastore in the current user schema by using the ore.datastore and ore.datastoreSummary functions.

Using the ore.datastore function, you can list basic information about datastores. The function returns a data.frame object with columns that correspond to the datastore name, the number of objects in the datastore, the datastore size, the creation date, and a description. Rows are sorted by column datastore.name in alphabetical order. You can search for a datastore by name or by using a regular expression pattern.

Example 2-19 demonstrates using the ore.datastore function. The example uses some of the R objects created in Example 2-18.

Example 2-19 Using the ore.datastore Function

# The datastore objects ds1 and ds2 and objects data.frame objects df1 and df2
# were created in Example 2-18.
ore.save(df1, df2, name = "dfobj", description = "df objects"
ore.save(x, y, z, name = "another_ds", description = "For pattern matching")

# List all of the datastore objects.
ore.datastore()

# List the specified datastore.
ore.datastore("ds1")

# List the datastore objects with names that include "ds".
ore.datastore(pattern = "ds")

Listing for Example 2-19

R> # The datastore objects ds1 and ds2 and objects data.frame objects df1 and df2
R> # were created in Example 2-18.
R> ore.save(df1, df2, name = "dfobj", description = "df objects"
R> ore.save(x, y, z, name = "another_ds", description = "For pattern matching")
R> 
R> # List all of the datastore objects.
R> ore.datastore()
  datastore.name object.count size       creation.date  description
1     another_ds            3 1243 2014-07-24 13:31:56 For pattern mattching
2          dfobj            2  656 2014-07-24 13:31:46            df objects
3            ds1            4 3162 2014-07-24 13:25:17          My datastore
4            ds2            2 1111 2014-07-24 13:27:26                only x

R> # List the specified datastore.
R> ore.datastore("ds1")
  datastore.name object.count size       creation.date  description
1            ds1            4 2908 2013-11-08 10:41:09 My datastore
R> 
R> # List the datastore objects with names that include "ds".
R> ore.datastore(pattern = "ds")
  datastore.name object.count size       creation.date            description
1     another_ds            3 1243 2014-07-24 13:31:56  For pattern mattching
2            ds1            4 3162 2014-07-24 13:25:17           My datastore
3            ds2            2 1111 2014-07-24 13:27:26                 only x

The ore.datastoreSummary function returns information about the R objects saved within a datastore in the user schema in the connected database. The function returns a data.frame with columns that correspond to object name, object class, object size, and either the length of the object, if it is a vector, or the number of rows and columns, if it is a data.frame object. It takes one argument, the name of a datastore.

Example 2-20 demonstrates using the ore.datastoreSummary function. The example uses the datastores created in Example 2-18.

Example 2-20 Using the ore.datastoreSummary Function

ore.datastoreSummary("ds1")
ore.datastoreSummary("ds2")

Listing for Example 2-20

R> ore.datastoreSummary("ds1")
    object.name      class size length row.count col.count
1    AIRQUALITY  ore.frame 1077      6       153         6
2           df1 data.frame  328      2         5         2
3           df2 data.frame  328      2         5         2
4       iris_of  ore.frame 1429      5       150         5
R> ore.datastoreSummary("ds2")
  object.name       class size length row.count col.count
1           x     numeric  182     20        NA        NA
2           z ore.numeric  929     20        NA        NA

Restoring Objects from a Datastore

The ore.load function restores R objects saved in a datastore to the R global environment, .GlobalEnv. The function returns a character vector that contains the names of the restored objects.

You can load all of the saved objects or you can use the list argument to specify the objects to load. With the envir argument, you can specify an environment in which to load objects.

Example 2-21 demonstrates using the ore.load function to restore objects from datastores created in Example 2-19.

Example 2-21 Using the ore.load Function to Restore Objects from a Datastore

# We are in the same R session as Example 2-19. List the R objects.
ls()

# List the datastores.
ore.datastore()

# Delete the x and z objects.
rm(x, z)
ls()

# Restore all of the objects in datastore ds2.
ore.load("ds2")

ls()

# After ending the R session and starting another session.
ls()
# The datastore objects persist between sessions.
ore.datastore()

# Restore some of the objects from datastore ds1.
ore.load("ds1", list = c("df1", "df2", "iris_of"))
ls()

Listing for Example 2-21

R> # We are in the same R session as Example 2-19. List the R objects.
R> ls()
[1] "df1"     "df2"     "iris_of" "x"       "y"       "z"
R> 
R> # List the datastores.
R> ore.datastore()
  datastore.name object.count size       creation.date  description
1     another_ds            3 1243 2014-07-24 13:31:56 For pattern mattching
2          dfobj            2  656 2014-07-24 13:31:46            df objects
3            ds1            4 3162 2014-07-24 13:25:17          My datastore
4            ds2            2 1111 2014-07-24 13:27:26                only x
R> 
R> # Delete the x and z objects.
R> rm(x, z)
R> ls()
[1] "df1"     "df2"     "iris_of" "y"
R> 
R> # Restore all of the objects in datastore ds2.
R> ore.load("ds2")
[1] "x" "z"
R> 
R> ls()
[1] "df1"     "df2"     "iris_of" "x"       "y"       "z"
R> 
R> # After ending the R session and starting another session.
R> ls()
character(0)
R> # The datastore objects persist between sessions.
R> ore.datastore()
  datastore.name object.count size       creation.date           description
1     another_ds            3 1243 2014-07-24 13:31:56 For pattern mattching
2          dfobj            2  656 2014-07-24 13:31:46            df objects
3            ds1            4 3162 2014-07-24 13:25:17          My datastore
4            ds2            2 1111 2014-07-24 13:27:26                only x

R> # Restore some of the objects from datastore ds1.
R> ore.load("ds1", list = c("df1", "df2", "iris_of"))
[1] "df1"     "df2"     "iris_of"
R> ls()
[1] "df1"     "df2"     "iris_of"

Deleting a Datastore

With the ore.delete function, you can delete objects from an Oracle R Enterprise datastore or you can delete the datastore itself. To delete a datastore, you specify the name of it. To delete one or more objects from the datastore, you specify the list argument. The ore.delete function returns the name of the deleted objects or datastore.

Example 2-22 demonstrates using ore.delete to delete an object from a datastore and then to delete the entire datastore. The example uses objects created in Example 2-18.

Example 2-22 Using the ore.delete Function

# Delete the df2 object from the ds1 .
ore.delete("ds1", list = "df2")
# Delete the datastore named ds1.
ore.delete("ds1")

Listing for Example 2-22

R> # Delete the the df2 object from the ds1 datastore.
R> ore.delete("ds1", list = "df2")[1] "df2"
R> # Delete the datastore named ds1.
R> ore.delete("ds1")
[1] "ds1"

When you delete a datastore, Oracle R Enterprise discards all temporary database objects that were referenced by R objects in the deleted datastore. If you have saved an R object in more than one datastore, then Oracle R Enterprise discards a temporary database object only when no object in a datastore references the temporary database object.

About Using a datastore in Embedded R Execution

Saving objects in a datastore makes it very easy to pass arguments to, and reference R objects with, embedded R execution functions. You can save objects that you create in one R session in a single datastore in the database. You can pass the name of this datastore to an embedded R function as an argument for loading within that function. You can use a datastore to easily pass one object or multiple objects.

See Also:

Chapter 6, "Using Oracle R Enterprise Embedded R Execution" for information on using the R and the SQL interfaces to embedded R execution