This chapter describes how to start using Oracle R Enterprise by connecting to an Oracle Database instance and creating Oracle R Enterprise objects and storing them in the database.
This chapter discusses these topics:
To use Oracle R Enterprise, you first connect to an Oracle Database instance as described in the following topics:
Oracle R Enterprise client components connect an R session to an Oracle Database instance and the Oracle R Enterprise server components. The connection makes the data in a database schema available to the R user. It also makes the processing power, memory, and storage capacities of the database server available to the R session through the Oracle R Enterprise client interface.
This section has the following topics:
To begin using Oracle R Enterprise, you first connect to a schema in an Oracle Database instance with the ore.connect
function. Only one Oracle R Enterprise connection can exist at a time during an R session. If an R session is already connected to the database, then invoking ore.connect
terminates the active connection before opening a new connection. Before attempting to connect, you can discover whether an active connection exists by using the ore.is.connected
function.
You explicitly end a connection with the ore.disconnect
function. If you do not invoke ore.disconnect
, then the connection is automatically terminated when the R session ends. For more information on ore.disconnect
, see "About Using the ore.disconnect Function."
With the type
argument of ore.connect
, you specify the type of connection, either ORACLE or HIVE. A HIVE type of connection connects to Hive tables in a Hadoop cluster. An ORACLE type of connection connects to a schema in an Oracle Database instance. The default value of type
is "ORACLE"
.
If the connection type is HIVE, then ore.connect
ignores all other arguments. For information on Oracle R Connector for Hadoop and Hive, see Oracle Big Data Connectors User's Guide. The HIVE option applies only if you are using Oracle R Advanced Analytics for Hadoop (ORAAH) in conjunction with a Hadoop cluster. ORAAH is part of the Oracle Big Data Connectors option to the Big Data Appliance.
If the connection type is ORACLE, then you do the following:
Use the logical all
argument to specify whether Oracle R Enterprise automatically creates an ore.frame
object for each table to which the user has access in the schema and makes those ore.frame
objects visible in the current R session. The ore.frame
objects contain metadata about the tables. The default value of the all
argument is FALSE
.
If all = TRUE
, then Oracle R Enterprise implicitly invokes the ore.sync
and ore.attach
functions. If all = FALSE
, then the user must explicitly invoke ore.sync
to create ore.frame
objects. To access these objects by name, the user must invoke ore.attach
to include the names in the search path. For information on those functions, see "Creating R Objects for In-Database Data".
Use either the conn_string
argument, or various combinations of the user
, sid
, host
, password
, port
, service_name
, and conn_string
arguments to specify information that identifies the connection.
To avoid using a clear-text password, you can specify an Oracle wallet password with the conn_string
argument. No other arguments are needed. By specifying an Oracle wallet password, you can avoid embedding a database user password in application code, batch jobs, or scripts. For information on creating an Oracle wallet, see Oracle R Enterprise Installation and Administration Guide.
With the other connection identifier arguments, you specify a database user name, host name, and password, and either a system identifier (SID) or service name, and, optionally, a TCP port, or you specify a database user name, password, and a conn_string
argument.
The default value of the port
argument is 1521, the default value of host
is "localhost"
, which specifies the local host, and the default value of conn_string
is NULL. You specify the local host when your R session is running on the same computer as the Oracle Database instance to which you want to connect.
See Also:
"Using the ore.connect and ore.disconnect Functions" for examples of using the various connection identifiers
To explicitly end the connection between an R session and the Oracle Database instance, invoke the ore.disconnect
function. Oracle R Enterprise implicitly invokes ore.disconnect
if you do either of the following:
Quit the R session.
Invoke ore.connect
while an Oracle R Enterprise connection is already active.
When you disconnect the active connection, Oracle R Enterprise discards all Oracle R Enterprise objects that you have not explicitly saved in an Oracle R Enterprise datastore. For information on saving objects, see "Saving and Managing R Objects in the Database".
The examples in this section demonstrate the various ways of specifying an Oracle R Enterprise connection to an Oracle Database instance. The examples use sample values for the ore.connect
argument values. Replace the sample values with the appropriate values for connecting to your database.
Example 2-1 invokes the ore.connect
function and specifies the user
, sid
, host
, password
, and port
arguments.
Example 2-1 Using ore.connect and Specifying a SID
ore.connect(user = "rquser", sid = "sales", host = "sales-server", password = "rquserStrongPassword", port = 1521 )
Example 2-2 demonstrates using a service name rather than a SID. It also specifies connecting to the local host.
Example 2-2 Using ore.connect and Specifying a Service Name
ore.connect(user = "rquser", host = "localhost", password = "rquserStrongPassword", service_name = "sales.example.com")
Example 2-3 uses the conn_string
argument to specify an easy connect string that identifies the connection.
Example 2-3 Using ore.connect and Specifying an Easy Connect String
ore.connect(user = "rquser", password = "rquserStrongPassword", conn_string = "sales-server:1521:sales (ADDRESS=(PROTOCOL=tcp) (HOST=sales-server) (PORT=1521)) (CONNECT_DATA=(SERVICE_NAME=sales.example.com)))")
Example 2-4 uses the conn_string
argument to specify a full connection string that identifies the connection.
Example 2-4 Using ore.connect and Specifying a Full Connection String
ore.connect(user = "rquser", password = "rquserStrongPassword", conn_string = "DESCRIPTION= (ADDRESS=(PROTOCOL=tcp) (HOST=sales-server) (PORT=1521)) (CONNECT_DATA=(SERVICE_NAME=myserver.example.com)))")
Example 2-5 uses the conn_string
argument to specify an Oracle wallet. The mydb_test
string is the connection identifier for the Oracle database. The Oracle wallet contains the information needed to create the connection. For information on creating an Oracle wallet for an Oracle R Enterprise connection, see Oracle R Enterprise Installation and Administration Guide.
Example 2-5 Using the conn_string Argument to Specify an Oracle Wallet
ore.connect(conn_string = "mydb_test")
Example 2-6 uses an empty connection string to connect to the local host.
Example 2-6 Using the conn_string Argument and Specifying an Empty Connection String
ore.connect(user = "rquser", password = "rquserStrongPassword", conn_string = "")
Example 2-7 connects to a pluggable database using the conn_string
argument to specify a service name.
Example 2-7 Using the conn_string Argument in Connecting to a Pluggable Database
ore.connect(conn_string = "pdb1.example.com")
Example 2-8 invokes ore.connect
using a service name, host name, and port number to connect to a pluggable database.
Example 2-8 Using the service_name Argument in Connecting to a Pluggable Database
ore.connect(service_name = "pdb1.example.com", host = "mypdb", port = 1521)
Example 2-9 explicitly disconnects an Oracle R Enterprise session from an Oracle database.
With transparency layer functions you can connect to an Oracle Database instance and interact with data structures in a database schema. You can move data to and from the database and create database tables. You can also save R objects in the database. The Oracle R Enterprise functions that perform these actions are described in the following topics.
Using Oracle R Enterprise, you can create R proxy objects in your R session from database-resident data as described in the following topics.
When you invoke ore.connect
in an R session, Oracle R Enterprise creates a connection to a schema in an Oracle Database instance. To gain access to the data in the database tables in the schema, you use the ore.sync
function. That function creates an ore.frame
object that is a proxy for a table in a schema. You can use the ore.attach
function to add an R environment that represents a schema to the R search path. For information on connecting to the database, see "Connecting to an Oracle Database Instance".
When you use the ore.sync
function to create an ore.frame
object as a proxy for a database table, the name of the ore.frame
proxy object is the same as the name of the database object. Each ore.frame
proxy object contains metadata about the corresponding database object.
You can use the proxy ore.frame
object to select data from the table. When you execute an R operation that selects data from the table, the operation returns the current data from the database object. However, if some application has added a column to the table, or has otherwise changed the metadata of the database object, the ore.frame
proxy object does not reflect such a change until you again invoke ore.sync
for the database object.
If you invoke the ore.sync
function with no tables specified, and if the value of the all
argument was FALSE
in the ore.connect
function call that established the connection to the Oracle database instance, then the ore.sync
function creates a proxy object for each table in the schema specified by ore.connect
. You can use the table
argument to specify the tables for which you want to create ore.frame
proxy objects.
Tip:
To conserve memory resources and save time, you should only add proxies for the tables that you want to use in your R session.With the schema
argument, you can specify the schema for which you want to create an R environment and proxy objects. Only one environment for a given database schema can exist at a time. With the use.keys
argument, you can specify whether you want to use primary keys in the table to order the ore.frame
object.
Tip:
Ordering is expensive in the database. Because most operations in R do not need ordering, you should generally setuse.keys
to FALSE
unless you need ordering for sampling data or some other purpose. For more information on ordering, see "Creating Ordered and Unordered ore.frame Objects".With the query
argument, you can specify a SQL SELECT
statement. This enables you to create an ore.frame
for a query without creating a view in the database. This can be useful when you not have the CREATE VIEW
system privilege for the current schema. You cannot use the schema
argument and the query
argument in the same ore.sync
invocation.
You can use the ore.ls
function to list the ore.frame
proxy objects that correspond to database tables in the environment for a schema. You can use the ore.exists
function to find out if an ore.frame
proxy object for a database table exists in an R environment. The function returns TRUE
if the proxy object exists or FALSE
if it does not. You can remove an ore.frame
proxy object from an R environment with the ore.rm
function.
Example 2-10 demonstrates the use of the ore.sync
function. The example first invokes the ore.exec
function to create some tables to represent tables existing in the database schema for the rquser. The example then invokes ore.sync
and specifies three tables of the schema. The ore.sync
invocation creates an R environment for the rquser schema and creates proxy ore.frame
objects for the specified tables in that schema. The example lists the ore.frame
proxy objects in the current environment. The TABLE3 table exists in the schema but does not have an ore.frame
proxy object because it was not included in the ore.sync
invocation.
The example next invokes ore.sync
with the query
argument to create ore.frame
objects for the specified SQL queries. The example lists the ore.frame
objects again.
The example then invokes ore.sync
again and creates an R environment for the SH schema and proxy objects in that environment for the specified tables in that schema. The example invokes the ore.exists
function to find out if the specified table exists in the current environment and then in the SH environment. The example lists the R objects in the SH environment.
The example next removes the ore.frame
objects QUERY1
, QUERY2
, and TABLE4
from the rquser environment. Finally, the example lists the proxy objects in the environment again.
Note:
Theore.rm
function invocation removes the ore.frame
that is a proxy for the TABLE4 table from the environment. It does not delete the table from the schema.Example 2-10 Using ore.sync to Add ore.frame Proxy Objects to an R Environment
# After connecting to a database as rquser, create some tables. ore.exec("CREATE TABLE TABLE1 AS SELECT * FROM dual") ore.exec("CREATE TABLE TABLE2 AS SELECT * FROM dual") ore.exec("CREATE TABLE TABLE3 AS SELECT * FROM dual") ore.exec("CREATE TABLE TABLE4 AS SELECT * FROM dual") # Create ore.frame objects for the specified tables. ore.sync(table = c("TABLE1", "TABLE3", "TABLE4")) # List the ore.frame proxy objects in the current environment. ore.ls() # Create ore.frame objects for the specified queries. ore.sync(query = c("QUERY1" = "SELECT 0 X, 1 Y FROM dual", "QUERY2" = "SELECT 1 X, 0 Y FROM dual")) ore.ls() # The rquser user has been granted SELECT permission on the tables in the # SH schema. ore.sync("SH", table = c("CUSTOMERS", "SALES")) # Find out if the CUSTOMERS ore.frame exists in the rquser environment. ore.exists("CUSTOMERS") # Find out if it exists in the SH environment. ore.exists("CUSTOMERS", schema = "SH") # List the ore.frame proxy objects in the SH environment. ore.ls("SH") # Remove the ore.frame objects for the specified objects. ore.rm(c("QUERY1", "QUERY2", "TABLE4")) # List the ore.frame proxy objects in the current environment again. ore.ls()
Listing for Example 2-10
R> # After connecting to a database as rquser, create some tables. R> ore.exec("CREATE TABLE TABLE1 AS SELECT * FROM dual") R> ore.exec("CREATE TABLE TABLE2 AS SELECT * FROM dual") R> ore.exec("CREATE TABLE TABLE3 AS SELECT * FROM dual") R> ore.exec("CREATE TABLE TABLE4 AS SELECT * FROM dual") R> # Create ore.frame objects for the specified tables. R> ore.sync(table = c("TABLE1", "TABLE3", "TABLE4")) R> # List the ore.frame proxy objects in the current environment. R> ore.ls() [1] "TABLE1" "TABLE3" "TABLE4" R> # Create ore.frame objects for the specified queries. R> ore.sync(query = c("QUERY1" = "SELECT 0 X, 1 Y FROM dual", + "QUERY2" = "SELECT 1 X, 0 Y FROM dual")) R> ore.ls() [1] "QUERY1" "QUERY2" "TABLE1" "TABLE3" "TABLE4" R> # The rquser user has been granted SELECT permission on the tables in the R> # SH schema. R> ore.sync("SH", table = c("CUSTOMERS", "SALES")) R> # Find out if the CUSTOMERS ore.frame exists in the rquser environment. R> ore.exists("CUSTOMERS") [1] FALSE R> # Find out if it exists in the SH environment. R> ore.exists("CUSTOMERS", schema = "SH") [1] TRUE R> # List the ore.frame proxy objects in the SH environment. R> ore.ls("SH") [1] "CUSTOMERS" "SALES" R> # Remove the ore.frame objects for the specified objects. R> ore.rm(c("QUERY1", "QUERY2", "TABLE4")) R> # List the ore.frame proxy objects in the current environment again. R> ore.ls() [1] "TABLE1" TABLE3"
After you have created an R environment and ore.frame
proxy objects with ore.sync
, you can get a proxy object by name with the ore.get
function, as shown in Example 2-11. The example invokes the ore.sync
function to create an ore.frame
object that is a proxy for the CUSTOMERS table in the SH schema. The example then gets the dimensions of the proxy object.
Example 2-11 Using ore.get to Get a Database Table
ore.sync(schema = "SH", table = "CUSTOMERS", use.keys = FALSE) dim(ore.get(name = "CUSTOMERS", schema = "SH"))
Listing for Example 2-11
R> ore.sync(schema = "SH", table = "CUSTOMERS", use.keys = FALSE) R> dim(ore.get(name = "CUSTOMERS", schema = "SH")) [1] 630 15
You can use ore.get
to get the proxy ore.frame
for a table and assign it to a variable in R, as in SH_CUST <- ore.get(name = "CUSTOMERS", schema = "SH")
. The ore.frame
exists in the R global environment, which can be referred to using .GlobalEnv
, and so it appears in the list returned by the ls
function. Also, because this object exists in the R global environment, as opposed an R environment that represents a database schema, it is not listed by the ore.ls
function.
With ore.attach
, you add an R environment for a database schema to the R search path. When you add the R environment, you have access to database tables by name through the proxy objects created by the ore.sync
function without needing to specify the schema environment.
The default schema is the one specified in creating the connection and the default position in the search path is 2. You can specify the schema and the position in the ore.attach function invocation.. You can also specify whether you want the ore.attach
function to indicate whether a naming conflict occurs when adding the environment. You can detach the environment for a schema from the R search path with the ore.detach
function.
Example 2-12 demonstrates the use of the ore.attach
function. Comments in the example explain the function invocations.
Example 2-12 Using ore.attach to Add an Environment for a Database Schema
# Connected as rquser. # Add the environment for the rquser schema to the R search path. ore.attach() # Create an unordered ore.frame proxy object in the SH environment for the # specifed table. ore.sync(schema = "SH", table = "CUSTOMERS", use.keys = FALSE) # Add the environment for the SH schema to the search path and warn if naming # conflicts exist. ore.attach("SH", 3, warn.conflicts = TRUE) # Display the number of rows and columns in the proxy object for the table. dim(CUSTOMERS) # Remove the environment for the SH schema from the search path. ore.detach("SH") # Invoke the dim function again. dim(CUSTOMERS)
Listing for Example 2-12
R> # Connected as rquser. R> # Add the environment for the rquser schema to the R search path. R> ore.attach() R> # Create an unordered ore.frame proxy object in the SH environment for the R> # specifed table. R> ore.sync(schema = "SH", table = "CUSTOMERS", use.keys = FALSE) R> # Add the environment for the SH schema to the search path and warn if naming R> # conflicts exist. R> ore.attach("SH", 3, warn.conflicts = TRUE) R> # Display the number of rows and columns in the proxy object for the table. R> dim(CUSTOMERS) [1] 630 15 R> # Remove the environment for the SH schema from the search path. R> ore.detach("SH") R> # Invoke the dim function again. R> dim(CUSTOMERS) Error: object 'CUSTOMERS' not found
Oracle R Enterprise provides the ability to create ordered or unordered ore.frame
objects. The following topics describe this feature.
R objects such as vector
and data.frame
have an implicit ordering of their elements. The data in an Oracle Database table is not necessarily ordered. For some R operations, ordering is useful whereas for other operations it is unnecessary. By ordering an ore.frame
, you are able to index the ore.frame
object by using either integer or character indexes.
Using an ordered ore.frame
object that is a proxy for a SQL query can be time-consuming for a large data set. Therefore, although Oracle R Enterprise attempts to create ordered ore.frame
objects by default, it also provides the means of creating an unordered ore.frame
object.
When you invoke the ore.sync
function to create an Oracle R Enterprise ore.frame
object as a proxy for a SQL query, you can use the use.keys
argument to specify whether the ore.frame
can be ordered or must be unordered.
An ore.frame
object can be ordered if one or more of the following conditions are true:
The value of the use.keys
argument of the ore.sync
function is TRUE
and a primary key is defined on the underlying table
The row names of the ore.frame
constitute a unique tuple
The ore.frame
object is produced by certain functions such as aggregate
and cbind
All of the ore.frame
objects that are input arguments to relevant Oracle R Enterprise functions are ordered
An ore.frame
object is unordered if one or more of the following conditions are true:
The value of the use.keys
argument of the ore.sync
function is FALSE
No primary key is defined on the underlying table and either the row names of the ore.frame
object are not specified or the row names of the ore.frame
object are set to NULL
One or more of the ore.frame
objects that are input arguments to relevant Oracle R Enterprise functions are unordered
An unordered ore.frame
object has null row names. You can determine whether an ore.frame
object is ordered by invoking is.null
on the row names of the objects, as shown in the last lines of Example 2-13. If the ore.frame
object is unordered, is.null
returns an error.
See Also:
"Indexing Data"Oracle R Enterprise has options that relate to the ordering of an ore.frame
object. The ore.warn.order
global option specifies whether you want Oracle R Enterprise to display a warning message if you use an unordered ore.frame
object in a function that requires ordering. If you know what to expect in an operation, then you might want to turn the warnings off so they do not appear in the output. For examples of the warning messages, see Example 2-13 and Example 2-14.
You can see what the current setting is, or turn the option on or off, as in the following example.
R> options("ore.warn.order") $ore.warn.order [1] TRUE R> options("ore.warn.order" = FALSE) R> options("ore.warn.order" = TRUE)
With the ore.sep
option, you can specify the separator between the row name values that you use for multi-column keys, as in the following example.
R> options("ore.sep") $ore.sep [1] "|" R> options("ore.sep" = "/") R> options("ore.sep" = "|")
You can use the primary key of a database table to order an ore.frame
object, as demonstrated in Example 2-13. The example loads the spam data set from the kernlab
package. It adds two columns to the data set.
The example invokes ore.drop
to drop the named tables if they exist. It then invokes ore.create
to create two tables from the data set. It invokes ore.exec
to make the USERID and TS columns a composite primary key of the SPAM_PK table, and invokes ore.sync
to synchronize the table with its ore.frame
proxy.
Note:
Theore.exec
function executes a SQL statement in the Oracle Database schema. The function is intended for database definition language (DDL) statements that have no return value.Example 2-13 then displays the first eight rows of each table.The proxy object for the SPAM_PK table is an ordered ore.frame
object. It has row names that are a combination of the TS and USERID column values separated by the "|" character. The proxy object for the SPAM_NOPK table is an unordered ore.frame
object that has the symbol SPAM_NOPK
. By default, SPAM_NOPK
has row names that are sequential numbers.
The example invokes the ore.exec
function to execute a SQL statement that makes the RID column the primary key of the database table.
Example 2-13 Ordering Using Keys
# Prepare the data. library(kernlab) data(spam) s <- spam # Create a column that has integer values. s$TS <- 1001:(1000 + nrow(s)) # Create a column that has integer values with each number repeated twice. s$USERID <- rep(351:400, each=2, len=nrow(s)) # Ensure that the database tables do not exist. ore.drop(table='SPAM_PK') ore.drop(table='SPAM_NOPK') # Create database tables. ore.create(s[,c(59:60,1:28)], table="SPAM_PK") ore.create(s[,c(59:60,1:28)], table="SPAM_NOPK") # Using a SQL statement, alter the SPAM_PK table to add a composite primary key. ore.exec("alter table SPAM_PK add constraint SPAM_PK primary key (\"USERID\",\"TS\")") # Synchronize the table to get the change to it. ore.sync(table = "SPAM_PK") # View the data in the tables. # The row names of the ordered SPAM_PK are the primary key column values. head(SPAM_PK[,1:8]) # The row names of the unordered SPAM_NOPK are sequential numbers. # The first warning results from the inner accessing of SPAM_NOPK to subset # the columns. The second warning is for the invocation of the head # function on that subset. head(SPAM_NOPK[,1:8]) # Verify that SPAM_NOPK is unordered. is.null(row.names(SPAM_NOPK))
Listing for Example 2-13
R> # Prepare the data. R> library(kernlab) R> data(spam) R> s <- spam R> # Create a column that has integer values. R> s$TS <- 1001:(1000 + nrow(s)) R> # Create a column that has integer values with each number repeated twice. R> s$USERID <- rep(351:400, each=2, len=nrow(s)) R> # Ensure that the database tables do not exist. R> ore.drop(table='SPAM_PK') R> ore.drop(table='SPAM_NOPK') R> # Create database tables. R> ore.create(s[,c(59:60,1:28)], table="SPAM_PK") R> ore.create(s[,c(59:60,1:28)], table="SPAM_NOPK") R> # Using a SQL statement, alter the SPAM_PK table to add a composite primary key. R> ore.exec("alter table SPAM_PK add constraint SPAM_PK primary key + (\"USERID\",\"TS\")") R> # Synchronize the table to get the change to it. R> ore.sync(table = "SPAM_PK") R> # View the data in the tables. R> # The row names of the ordered SPAM_PK are the primary key column values. R> head(SPAM_PK[,1:8]) TS USERID make address all num3d our over 1001|351 1001 351 0.00 0.64 0.64 0 0.32 0.00 1002|351 1002 351 0.21 0.28 0.50 0 0.14 0.28 1003|352 1003 352 0.06 0.00 0.71 0 1.23 0.19 1004|352 1004 352 0.00 0.00 0.00 0 0.63 0.00 1005|353 1005 353 0.00 0.00 0.00 0 0.63 0.00 1006|353 1006 353 0.00 0.00 0.00 0 1.85 0.00 R> # The row names of the unordered SPAM_NOPK are sequential numbers. R> # The first warning results from the inner accessing of SPAM_NOPK to subset R> # the columns. The second warning is for the invocation of the head R> # function on that subset. R> head(SPAM_NOPK[,1:8]) TS USERID make address all num3d our over 1 1001 351 0.00 0.64 0.64 0 0.32 0.00 2 1002 351 0.21 0.28 0.50 0 0.14 0.28 3 1003 352 0.06 0.00 0.71 0 1.23 0.19 4 1004 352 0.00 0.00 0.00 0 0.63 0.00 5 1005 353 0.00 0.00 0.00 0 0.63 0.00 6 1006 353 0.00 0.00 0.00 0 1.85 0.00 Warning messages: 1: ORE object has no unique key - using random order 2: ORE object has no unique key - using random order R> # Verify that SPAM_NOPK is unordered. R> is.null(row.names(SPAM_NOPK)) Error: ORE object has no unique key
You can use row names to order an ore.frame
object, as demonstrated in Example 2-14. The example creates a data.frame
object in the local R session memory and pushes it to the ore.frame
object with the symbol a
, which exists in the memory of the Oracle database to which the R session is connected. The example shows that the ore.frame
object has the default row names of the R data.frame
object. Because the ore.frame
object is ordered, invoking the row.names
function on it does not produce a warning message.
The example uses the ordered SPAM_PK
and unordered SPAM_NOPK
ore.frame
objects from Example 2-13 to show that invoking row.names
on the unordered SPAM_NOPK
produces a warning message but invoking it on the ordered SPAM_PK
does not.
The SPAM_PK
object is ordered by the row names, which are the combined values of the TS and USERID column values separated by the "|" character. The example shows that you can change the row names.
Example 2-14 Ordering Using Row Names
# Create an ordered ore.frame by default. a <- ore.push(data.frame(a=c(1:10,10:1), b=letters[c(1:10,10:1)])) # Display the values in the b column. Note that because the ore.frame is # ordered, no warnings appear. a$b # Display the default row names for the first six rows of the a column. row.names(head(a)) # SPAM_NOPK has no unique key, so row.names raises error messages. row.names(head(SPAM_NOPK)) # Row names consist of TS ’|' USERID. # For display on this page, only the first four row names are shown. row.names(head(SPAM_PK)) # Reassign the row names to the TS column only row.names(SPAM_PK) <- SPAM_PK$TS # The row names now correspond to the TS values only. row.names(head(SPAM_PK[,1:4])) head(SPAM_PK[,1:4])
Listing for Example 2-14
R> # Create an ordered ore.frame by default. R> a <- ore.push(data.frame(a=c(1:10,10:1), b=letters[c(1:10,10:1)])) R> # Display the values in the b column. Note that because the ore.frame is R> # ordered, no warnings appear. R> a$b [1] a b c d e f g h i j j i h g f e d c b aLevels: a b c d e f g h i j R> # Display the default row names for the first six rows of the a column. R> row.names(head(a)) [1] 1 2 3 4 5 6 R> # SPAM_NOPK has no unique key, so row.names raises error messages. R> row.names(head(SPAM_NOPK)) Error: ORE object has no unique key In addition: Warning message: ORE object has no unique key - using random order R> # Row names consist of TS ’|' USERID. R> # For display on this page, only the first four row names are shown. R> row.names(head(SPAM_PK)) 1001|351 1002|351 1003|352 1004|352 "1001|3.51E+002" "1002|3.51E+002" "1003|3.52E+002" "1004|3.52E+002" R> # Reassign the row names to the TS column only R> row.names(SPAM_PK) <- SPAM_PK$TS R> # The row names now correspond to the TS values only. R> row.names(head(SPAM_PK[,1:4])) [1] 1001 1002 1003 1004 1005 1006 R> head(SPAM_PK[,1:4]) TS USERID make address 1001 1001 351 0.00 0.64 1002 1002 351 0.21 0.28 1003 1003 352 0.06 0.00 1004 1004 352 0.00 0.00 1005 1005 353 0.00 0.00 1006 1006 353 0.00 0.00
Example 2-15 example uses the ordered SPAM_PK
and unordered SPAM_NOPK
ore.frame
objects from Example 2-13 to show the result of merging two ordered ore.frame
objects and two unordered ore.frame
objects.
Example 2-15 Merging Ordered and Unordered ore.frame Objects
# Create objects for merging data from unordered ore.frame objects. x <- SPAM_NOPK[,1:4] y <- SPAM_NOPK[,c(1,2,4,5)] m1 <- merge(x, y, by="USERID") # The merged result m1 produces a warning because it is not an ordered frame. head(m1,3) # Create objects for merging data from ordered ore.frame objects. x <- SPAM_PK[,1:4] y <- SPAM_PK[,c(1,2,4,5)] # The merged result m1 does not produce a warning now because it is an # ordered frame. m1 <- merge(x, y, by="USERID") head(m1,3)
Listing for Example 2-15
R> # Create objects for merging data from unordered ore.frame objects. R> x <- SPAM_NOPK[,1:4] R> y <- SPAM_NOPK[,c(1,2,4,5)] R> m1 <- merge(x, y, by="USERID") R> # The merged result m1 produces a warning because it is not an ordered frame. R> head(m1,3) USERID TS.x make address.x TS.y address.y all 1 351 5601 0.00 0 1001 0.64 0.64 2 351 5502 0.00 0 1001 0.64 0.64 3 351 5501 0.78 0 1001 0.64 0.64 Warning messages: 1: ORE object has no unique key - using random order 2: ORE object has no unique key - using random order R> # Create objects for merging data from ordered ore.frame objects. R> x <- SPAM_PK[,1:4] R> y <- SPAM_PK[,c(1,2,4,5)] R> # The merged result m1 does not produce a warning now because it is an R> # ordered frame. R> m1 <- merge(x, y, by="USERID") R> head(m1,3) USERID TS.x make address.x TS.y address.y all 1001|1001 351 1001 0 0.64 1001 0.64 0.64 1001|1002 351 1001 0 0.64 1002 0.28 0.50 1001|1101 351 1001 0 0.64 1101 0.00 0.00
You can create a temporary database table, and corresponding proxy ore.frame
object, from a local R object with the ore.push
function. You can create a local R object that contains a copy of data represented by an Oracle R Enterprise proxy object with the ore.pull
function.
The ore.push
function translates an R object into an Oracle R Enterprise object of the appropriate data type. The ore.pull
function takes an ore
class object and returns an R object. If the input object is an ore.list
, the ore.pull
function creates a data.frame
and translates each the data of each database column into the appropriate R representation.
Note:
You can pull data to a local Rdata.frame
only if the data can fit into the R session memory. Also, even if the data fits in memory but is still very large, you may not be able to perform many, or any, R functions in the client R session.Example 2-16 demonstrates pushing an R data.frame
object to the database as a temporary database table with an associated ore.frame
object, iris_of
, then creating another ore.frame
object, iris_of_setosa
, by selecting one column from iris_of
, and then pulling the iris_of_setosa
object into the local R session memory as a data.frame
object. The example displays the class of some of the objects.
Example 2-16 Using ore.push and ore.pull to Move Data
class(iris) # Push the iris data frame to the database. iris_of <- ore.push(iris) class(iris_of) # Display the data type of the Sepal.Length column in the data.frame. class(iris$Sepal.Length) # Display the data type of the Sepal.Length column in the ore.frame. class(iris_of$Sepal.Length) # Filter one column of the data set. iris_of_setosa <- iris_of[iris_of$Species == "setosa", ] class(iris_of_setosa) # Pull the selected column into the local R client memory. local_setosa = ore.pull(iris_of_setosa) class(local_setosa)
Listing for Example 2-16
R> class(iris) [1] "data.frame" R> # Push the iris data frame to the database. R> iris_of <- ore.push(iris) R> class(iris_of) [1] "ore.frame" attr(,"package") [1] "OREbase" R> # Display the data type of the Sepal.Length column in the data.frame. R> class(iris$Sepal.Length) [1] "numeric" R> # Display the data type of the Sepal.Length column in the ore.frame. R> class(iris_of$Sepal.Length) [1] "ore.numeric" attr(,"package") [1] "OREbase" R> # Filter one column of the data set. R> iris_of_setosa <- iris_of[iris_of$Species == "setosa", ] R> class(iris_of_setosa) [1] "ore.frame" attr(,"package") [1] "OREbase" R> # Pull the selected column into the local R client memory. R> local_setosa = ore.pull(iris_of_setosa) R> class(local_setosa) [1] "data.frame"
Unless you explicitly save them, the temporary database tables and their corresponding Oracle R Enterprise proxy objects that you create with the ore.push
function are discarded when you quit the R session.
See Also:
"Transparency Layer Support for R Data Types and Classes" for information on data type mappings
"Saving and Managing R Objects in the Database" for information on permanently saving the Oracle R Enterprise objects in the database
The push_pull.R
example script
You can use the ore.create
function to create a persistent table in an Oracle Database schema. Creating the table automatically creates an ore.frame
proxy object for the table in the R environment that represents your database schema. The proxy ore.frame
object has the same name as the table. You can delete the persistent table in an Oracle Database schema with the ore.drop
function.
Caution:
Only use theore.drop
function to delete a database table and its associated ore.frame
proxy object. Never use it to remove an ore.frame
object that is not associated with a permanent database table. To remove an ore.frame
object for a temporary database table, use the ore.rm
function.Example 2-17 creates tables in the database and drops some of them.
Example 2-17 Using ore.create and ore.drop to Create and Drop Tables
# Create the AIRQUALITY table from the data.frame for the airquality data set. ore.create(airquality, table = "AIRQUALITY") # Create data.frame objects. df1 <- data.frame(x1 = 1:5, y1 = letters[1:5]) df2 <- data.frame(x2 = 5:1, y2 = letters[11:15]) # Create the DF1 and DF2 tables from the data.frame objects. ore.create(df1, "DF1") ore.create(df2, "DF2") # Create the CARS93 table from the data.frame for the Cars93 data set. ore.create(Cars93, table = "CARS93") # List the Oracle R Enterprise proxy objects. ore.ls() # Drop the CARS93 object. ore.drop(table = "CARS93") # List the Oracle R Enterprise proxy objects again. ore.ls()
Listing for Example 2-17
R> # Create the AIRQUALITY table from the data.frame for the airquality data set. R> ore.create(airquality, table = "AIRQUALITY") R> # Create data.frame objects. R> df1 <- data.frame(x1 = 1:5, y1 = letters[1:5]) R> df2 <- data.frame(x2 = 5:1, y2 = letters[11:15]) R> # Create the DF1_TABLE and DF2_TABLE tables from the data.frame objects. R> ore.create(df1, "DF1") R> ore.create(df2, "DF2") R> # Create the CARS93 table from the data.frame for the Cars93 data set. R> ore.create(Cars93, table = "CARS93") R> # List the Oracle R Enterprise proxy objects. R> ore.ls() [1] "AIRQUALITY" "CARS93" "DF1" "DF2_" R> # Drop the CARS93 object. R> ore.drop(table = "CARS93") R> # List the Oracle R Enterprise proxy objects again. R> ore.ls() [1] "AIRQUALITY" "DF1_" "DF2"
Oracle R Enterprise provides datastores that you can use to save Oracle R Enterprise proxy objects, as well as any R object, in an Oracle database. You can restore the saved objects in another R session. The objects in a datastore are also accessible to embedded R execution through both the R and the SQL interfaces.
This section describes the Oracle R Enterprise functions that you can use to create and manage datastores. The section contains the following topics:
R objects, including Oracle R Enterprise proxy objects, exist for the duration of the current R session unless you explicitly save them. The standard R functions for saving and restoring R objects, save
and load
, serialize objects in R memory to store them in a file and deserialize them to restore them in memory. However, for Oracle R Enterprise proxy objects, those functions do not save the database objects associated with the proxy objects in an Oracle database; therefore the saved proxy objects do not behave properly in a different R session.
You can save Oracle R Enterprise proxy objects, as well as any R object, with the ore.save
function. The ore.save
function specifies an Oracle R Enterprise datastore. A datastore persists in the database when you end the R session. The datastore maintains the referential integrity of the objects it contains. Using the ore.load
function, you can restore in another R session the objects in the datastore.
Using a datastore, you can do the following:
Save Oracle R Enterprise and other R objects that you create in one R session and restore them in another R session.
Pass arguments to R functions for use in embedded R execution.
Pass objects for use in embedded R execution. You could, for example, use a function in the OREdm
package to build an Oracle Data Mining model and save it in a datastore. You could then use that model to score data in the database through embedded R execution. For an example of using a datastore in an embedded R execution function, see Example 6-10.
Table 2-1 lists the functions that manipulate datastores and provides brief descriptions of them.
Table 2-1 Functions that Manipulate Datastores
Function | Description |
---|---|
Saves R objects in a new or existing datastore. |
|
Restores objects from a datastore into an R environment. |
|
Lazily restores objects from a datastore into an R environment. |
|
Deletes a datastore from the current Oracle database schema. |
|
Lists information about a datastore in the current Oracle database schema. |
|
Provides detailed information about the specified datastore in the current Oracle database schema. |
See Also:
Chapter 6, "Using Oracle R Enterprise Embedded R Execution" for information on using the R and the SQL interfaces to embedded R executionEach database schema has a table that stores named Oracle R Enterprise datastores. A datastore can contain Oracle R Enterprise objects and standard R objects.
You create a datastore with the ore.save
function. When you create a datastore, you specify a name for it. You can save objects in one or more datastores.
As long as a datastore contains an Oracle R Enterprise proxy object for a database object, the database object persists between R sessions. For example, you could use the ore.odmNB
function in the OREdm
package to build an Oracle Data Mining Naive Bayes model. If you save the resulting ore.odmNB
object in a datastore and end the R session, then Oracle Database does not delete the Oracle Data Mining model. If no datastore contains the ore.odmNB
object and the R session ends, then the database automatically drops the model.
The ore.save
function saves one or more R objects in the specified datastore. By default, Oracle R Enterprise creates the datastore in the current user schema. With the arguments to ore.save
, you can provide the names of specific objects, or provide a list of objects. You can specify a particular R environment to search for the objects you would like to save. The overwrite
and append
arguments are mutually exclusive. If you set the overwrite
argument to TRUE
, then you can replace an existing datastore with another datastore of the same name. If you set the append
argument to TRUE
, then you can add objects to an existing datastore. With the description
argument, you can provide some descriptive text that appears when you get information about the datastore. The description
argument has no effect when used with the append
argument.
Example 2-18 demonstrates creating datastores using different combinations of arguments.
Example 2-18 Saving Objects and Creating a Datastore
# Create some R objects. df1 <- data.frame(x1 = 1:5, y1 = letters[1:5]) df2 <- data.frame(x2 = 5:1, y2 = letters[11:15]) iris_of <- ore.push(iris) # Create a database table and an Oracle R Enterprise proxy object for the table. ore.drop("AIRQUALITY") ore.create(airquality, table = "AIRQUALITY") # List the R objects. ls() # List the Oracle R Enterprise proxy objects. ore.ls() # Save the proxy object and all objects in the current workspace environment # to the datastore named ds1 and supply a description. ore.save(AIRQUALITY, list = ls(), name = "ds1", description = "My datastore") # Create some more objects. x <- stats::runif(20) # x is an object of type numeric. y <- list(a = 1, b = TRUE, c = "hoopsa") z <- ore.push(x) # z is an object of type ore.numeric. # Create another datastore. ore.save(x, y, name = "ds2", description = "x and y") # Overwrite the contents of datastore ds2. ore.save(x, name = "ds2", overwrite = TRUE, description = "only x") # Append object z to datastore ds2. ore.save(z, name = "ds2", append = TRUE)
Listing for Example 2-18
R> # Create some R objects. R> df1 <- data.frame(x1 = 1:5, y1 = letters[1:5]) R> df2 <- data.frame(x2 = 5:1, y2 = letters[11:15]) R> iris_of <- ore.push(iris) R> R> # Create a database table and an Oracle R Enterprise proxy object for the table. R> ore.drop("AIRQUALITY") R> ore.create(airquality, table = "AIRQUALITY") R> R> # List the R objects. R> ls() [1] "df1" "df2" "iris_of" R> R> # List the Oracle R Enterprise proxy objects. R> ore.ls() [1] "AIRQUALITY" R> R> # Save the proxy object and all objects in the current workspace environment R> # to the datastore named ds1 and supply a description. R> ore.save(AIRQUALITY, list = ls(), name = "ds1", description = "My datastore") R> R> # Create some more objects. R> x <- stats::runif(20) # x is an object of type numeric. R> y <- list(a = 1, b = TRUE, c = "hoopsa") R> z <- ore.push(x) # z is an object of type ore.numeric. R> R> # Create another datastore. R> ore.save(x, y, name = "ds2", description = "x and y") R> R> # Overwrite the contents of datastore ds2. R> ore.save(x, name = "ds2", overwrite = TRUE, description = "only x") R> R> # Append object z to datastore ds2. R> ore.save(z, name = "ds2", append = TRUE)
See Also:
You can get information about a datastore in the current user schema by using the ore.datastore
and ore.datastoreSummary
functions.
Using the ore.datastore
function, you can list basic information about datastores. The function returns a data.frame
object with columns that correspond to the datastore name, the number of objects in the datastore, the datastore size, the creation date, and a description. Rows are sorted by column datastore.name
in alphabetical order. You can search for a datastore by name or by using a regular expression pattern.
Example 2-19 demonstrates using the ore.datastore
function. The example uses some of the R objects created in Example 2-18.
Example 2-19 Using the ore.datastore Function
# The datastore objects ds1 and ds2 and objects data.frame objects df1 and df2 # were created in Example 2-18. ore.save(df1, df2, name = "dfobj", description = "df objects" ore.save(x, y, z, name = "another_ds", description = "For pattern matching") # List all of the datastore objects. ore.datastore() # List the specified datastore. ore.datastore("ds1") # List the datastore objects with names that include "ds". ore.datastore(pattern = "ds")
Listing for Example 2-19
R> # The datastore objects ds1 and ds2 and objects data.frame objects df1 and df2 R> # were created in Example 2-18. R> ore.save(df1, df2, name = "dfobj", description = "df objects" R> ore.save(x, y, z, name = "another_ds", description = "For pattern matching") R> R> # List all of the datastore objects. R> ore.datastore() datastore.name object.count size creation.date description 1 another_ds 3 1243 2014-07-24 13:31:56 For pattern mattching 2 dfobj 2 656 2014-07-24 13:31:46 df objects 3 ds1 4 3162 2014-07-24 13:25:17 My datastore 4 ds2 2 1111 2014-07-24 13:27:26 only x R> # List the specified datastore. R> ore.datastore("ds1") datastore.name object.count size creation.date description 1 ds1 4 2908 2013-11-08 10:41:09 My datastore R> R> # List the datastore objects with names that include "ds". R> ore.datastore(pattern = "ds") datastore.name object.count size creation.date description 1 another_ds 3 1243 2014-07-24 13:31:56 For pattern mattching 2 ds1 4 3162 2014-07-24 13:25:17 My datastore 3 ds2 2 1111 2014-07-24 13:27:26 only x
The ore.datastoreSummary
function returns information about the R objects saved within a datastore in the user schema in the connected database. The function returns a data.frame
with columns that correspond to object name, object class, object size, and either the length of the object, if it is a vector
, or the number of rows and columns, if it is a data.frame
object. It takes one argument, the name of a datastore.
Example 2-20 demonstrates using the ore.datastoreSummary
function. The example uses the datastores created in Example 2-18.
Example 2-20 Using the ore.datastoreSummary Function
ore.datastoreSummary("ds1") ore.datastoreSummary("ds2")
Listing for Example 2-20
R> ore.datastoreSummary("ds1") object.name class size length row.count col.count 1 AIRQUALITY ore.frame 1077 6 153 6 2 df1 data.frame 328 2 5 2 3 df2 data.frame 328 2 5 2 4 iris_of ore.frame 1429 5 150 5 R> ore.datastoreSummary("ds2") object.name class size length row.count col.count 1 x numeric 182 20 NA NA 2 z ore.numeric 929 20 NA NA
The ore.load
function restores R objects saved in a datastore to the R global environment, .GlobalEnv
. The function returns a character vector that contains the names of the restored objects.
You can load all of the saved objects or you can use the list
argument to specify the objects to load. With the envir
argument, you can specify an environment in which to load objects.
Example 2-21 demonstrates using the ore.load
function to restore objects from datastores created in Example 2-19.
Example 2-21 Using the ore.load Function to Restore Objects from a Datastore
# We are in the same R session as Example 2-19. List the R objects. ls() # List the datastores. ore.datastore() # Delete the x and z objects. rm(x, z) ls() # Restore all of the objects in datastore ds2. ore.load("ds2") ls() # After ending the R session and starting another session. ls() # The datastore objects persist between sessions. ore.datastore() # Restore some of the objects from datastore ds1. ore.load("ds1", list = c("df1", "df2", "iris_of")) ls()
Listing for Example 2-21
R> # We are in the same R session as Example 2-19. List the R objects. R> ls() [1] "df1" "df2" "iris_of" "x" "y" "z" R> R> # List the datastores. R> ore.datastore() datastore.name object.count size creation.date description 1 another_ds 3 1243 2014-07-24 13:31:56 For pattern mattching 2 dfobj 2 656 2014-07-24 13:31:46 df objects 3 ds1 4 3162 2014-07-24 13:25:17 My datastore 4 ds2 2 1111 2014-07-24 13:27:26 only x R> R> # Delete the x and z objects. R> rm(x, z) R> ls() [1] "df1" "df2" "iris_of" "y" R> R> # Restore all of the objects in datastore ds2. R> ore.load("ds2") [1] "x" "z" R> R> ls() [1] "df1" "df2" "iris_of" "x" "y" "z" R> R> # After ending the R session and starting another session. R> ls() character(0) R> # The datastore objects persist between sessions. R> ore.datastore() datastore.name object.count size creation.date description 1 another_ds 3 1243 2014-07-24 13:31:56 For pattern mattching 2 dfobj 2 656 2014-07-24 13:31:46 df objects 3 ds1 4 3162 2014-07-24 13:25:17 My datastore 4 ds2 2 1111 2014-07-24 13:27:26 only x R> # Restore some of the objects from datastore ds1. R> ore.load("ds1", list = c("df1", "df2", "iris_of")) [1] "df1" "df2" "iris_of" R> ls() [1] "df1" "df2" "iris_of"
With the ore.delete
function, you can delete objects from an Oracle R Enterprise datastore or you can delete the datastore itself. To delete a datastore, you specify the name of it. To delete one or more objects from the datastore, you specify the list
argument. The ore.delete
function returns the name of the deleted objects or datastore.
Example 2-22 demonstrates using ore.delete
to delete an object from a datastore and then to delete the entire datastore. The example uses objects created in Example 2-18.
Example 2-22 Using the ore.delete Function
# Delete the df2 object from the ds1 . ore.delete("ds1", list = "df2") # Delete the datastore named ds1. ore.delete("ds1")
Listing for Example 2-22
R> # Delete the the df2 object from the ds1 datastore. R> ore.delete("ds1", list = "df2")[1] "df2" R> # Delete the datastore named ds1. R> ore.delete("ds1") [1] "ds1"
When you delete a datastore, Oracle R Enterprise discards all temporary database objects that were referenced by R objects in the deleted datastore. If you have saved an R object in more than one datastore, then Oracle R Enterprise discards a temporary database object only when no object in a datastore references the temporary database object.
Saving objects in a datastore makes it very easy to pass arguments to, and reference R objects with, embedded R execution functions. You can save objects that you create in one R session in a single datastore in the database. You can pass the name of this datastore to an embedded R function as an argument for loading within that function. You can use a datastore to easily pass one object or multiple objects.
See Also:
Chapter 6, "Using Oracle R Enterprise Embedded R Execution" for information on using the R and the SQL interfaces to embedded R execution