3 Get Started with Oracle Machine Learning for Python
Learn how to use OML4Py in Oracle Machine Learning Notebooks and how to move data between the local Python session and the database.
These actions are described in the following topics.
- Use OML4Py with Oracle Autonomous Database
OML4Py is available through the Python interpreter in Oracle Machine Learning Notebooks in Oracle Autonomous Database. - Move Data Between the Database and a Python Session
With OML4Py functions, you can interact with data structures in a database schema. - Save Python Objects in the Database
You can save Python objects in OML4Py datastores, which persist in the database.
3.1 Use OML4Py with Oracle Autonomous Database
OML4Py is available through the Python interpreter in Oracle Machine Learning Notebooks in Oracle Autonomous Database.
For more information, see Get Started with Notebooks for Data Analysis and Data Visualization in Using Oracle Machine Learning Notebooks.
Parent topic: Get Started with Oracle Machine Learning for Python
3.2 Move Data Between the Database and a Python Session
With OML4Py functions, you can interact with data structures in a database schema.
In your Python session, you can move data to and from the database and create temporary or persistent database tables. The OML4Py functions that perform these actions are described in the following topics.
- About Moving Data Between the Database and a Python Session
Using the functions described in this topic, you can move data between the your local Python session and an Oracle database schema. - Push Local Python Data to the Database
Use theoml.push
function to push data from your local Python session to a temporary table in your Oracle database schema. - Pull Data from the Database to a Local Python Session
Use thepull
method of anoml
proxy object to create a Python object in your local Python session. - Create a Python Proxy Object for a Database Object
Use theoml.sync
function to create a Python object as a proxy for a database table, view, or SQL statement. - Create a Persistent Database Table from a Python Data Set
Use theoml.create
function to create a persistent table in your database schema from data in your Python session.
Parent topic: Get Started with Oracle Machine Learning for Python
3.2.1 About Moving Data Between the Database and a Python Session
Using the functions described in this topic, you can move data between the your local Python session and an Oracle database schema.
The following functions create proxy oml
Python objects from database objects, create database tables from Python objects, list the objects in the workspace, and drop tables and views.
Function | Definition |
---|---|
oml.create |
Creates a persistent database table from a Python data set. |
oml.cursor |
Returns a cx_Oracle cursor object for the current OML4Py database connection.
|
oml.dir |
Returns the names of the |
oml.drop |
Drops a persistent database table or view. |
oml_object.pull |
Creates a local Python object that contains a copy of the database data referenced by the |
oml.push |
Pushes data from the OML Notebooks Python session memory into a temporary table in the database. |
oml.sync |
Creates an |
With the pull
method of an oml
object, you can create a local Python object that contains a copy of the database data represented by an oml
proxy object.
The oml.push
function implicitly coerces Python data types to oml
data types and the pull
method on oml
objects coerces oml
data types to Python data types.
With the oml.create
function, you can create a persistent database table and a corresponding oml.DataFrame
proxy object from a Python data set.
With the oml.sync
function, you can synchronize the metadata of a database table or view with the oml
object representing the database object.
With the oml.cursor
function, you can create a cx_Oracle
cursor
object for the current database connection. You can user the cursor
to run queries against the database, as shown in Example 3-6.
Parent topic: Move Data Between the Database and a Python Session
3.2.2 Push Local Python Data to the Database
Use the oml.push
function to push data from your local Python session to a temporary table in your Oracle database schema.
The oml.push
function creates a temporary table in the user’s database schema and inserts data into the table. It also creates and returns a corresponding proxy oml.DataFrame
object that references the table in the Python session. The table exists as long as an oml
object exists that references it, either in the Python session memory or in an OML4Py datastore.
The syntax of the oml.push
function is the following:
oml.push(x, oranumber=True, dbtypes=None)
The x
argument may be a pandas.DataFrame
or a list of tuples of equal size that contain the data for the table. For a list of tuples, each tuple represents a row in the table and the column names are set to COL1, COL2, and so on.
The SQL data types of the columns are determined by the following:
-
OML4Py determines default column types by looking at 20 random rows sampled from the table. For tables with less than 20 rows, it uses all rows in determining the column type.
If the values in a column are all
None
, or if a column has inconsistent data types that are notNone
in the sampled rows, then a default column type cannot be determined and aValueError
is raised unless a SQL type for the column is specified by thedbtypes
argument. -
For numeric columns, the
oranumber
argument, which is abool
, determines the SQL data type. IfTrue
(the default), then the SQL data type isNUMBER
. IfFalse
, then the data type isBINARY_DOUBLE
.If the data in
x
containsNaN
values, then you should setoranumber
toFalse
. -
For string columns, the default type is
VARCHAR2(4000)
. -
For bytes columns, the default type is
BLOB
.
With the dbtypes
argument, you can specify the SQL data types for the table columns. The values of dbtypes
may be either a dict
that maps str
to str
values or a list of str
values. For a dict
, the keys are the names of the columns.
Example 3-1 Pushing Data to a Database Table
This example creates pd_df
, a pandas.core.frame.DataFrame
object with columns of various data types. It pushes pd_df
to a temporary database table, which creates the oml_df
object, which references the table. It then pulls the data from the oml_df
object to the df
object in local memory.
import oml
import pandas as pd
pd_df = pd.DataFrame({'numeric': [1, 1.4, -4, 3.145, 5, None],
'string' : [None, None, 'a', 'a', 'a', 'b'],
'bytes' : [b'a', b'b', b'c', b'c', b'd', b'e']})
# Push the data set to a database table with the specified dbtypes
# for each column.
oml_df = oml.push(pd_df, dbtypes = {'numeric': 'BINARY_DOUBLE',
'string':'CHAR(1)',
'bytes':'RAW(1)'})
# Display the data type of oml_df.
type(oml_df)
# Pull the data from oml_df into local memory.
df = oml_df.pull()
# Display the data type of df.
type(df)
# Create a list of tuples.
lst = [(1, None, b'a'), (1.4, None, b'b'), (-4, 'a', b'c'),
(3.145, 'a', b'c'), (5, 'a', b'd'), (None, 'b', b'e')]
# Create an oml.DataFrame using the list.
oml_df2 = oml.push(lst, dbtypes = ['BINARY_DOUBLE','CHAR(1)','RAW(1)'])
type(oml_df2)
Listing for This Example
>>> import oml
>>> import pandas as pd
>>>
>>> pd_df = pd.DataFrame({'numeric': [1, 1.4, -4, 3.145, 5, None],
... 'string' : [None, None, 'a', 'a', 'a', 'b'],
... 'bytes' : [b'a', b'b', b'c', b'c', b'd', b'e']})
>>>
>>> # Push the data set to a database table with the specified dbtypes
... # for each column.
... oml_df = oml.push(pd_df, dbtypes = {'numeric': 'BINARY_DOUBLE',
... 'string':'CHAR(1)',
... 'bytes':'RAW(1)'})
>>>
>>> # Display the data type of oml_df.
... type(oml_df)
<class 'oml.core.frame.DataFrame'>
>>>
>>> # Pull the data from oml_df into local memory.
... df = oml_df.pull()
>>>
>>> # Display the data type of df.
... type(df)
<class 'pandas.core.frame.DataFrame'>
>>>
>>> # Create a list of tuples.
... lst = [(1, None, b'a'), (1.4, None, b'b'), (-4, 'a', b'c'),
... (3.145, 'a', b'c'), (5, 'a', b'd'), (None, 'b', b'e')]
>>>
>>> # Create an oml.DataFrame using the list.
... oml_df2 = oml.push(lst, dbtypes = ['BINARY_DOUBLE','CHAR(1)','RAW(1)'])
>>>
>>> type(oml_df2)
<class 'oml.core.frame.DataFrame'>
Parent topic: Move Data Between the Database and a Python Session
3.2.3 Pull Data from the Database to a Local Python Session
Use the pull
method of an oml
proxy object to create a Python object in your local Python session.
Note:
You can pull data to a localpandas.DataFrame
only if the data can fit into the local Python session memory. Also, even if the data fits in memory but is still very large, you may not be able to perform many, or any, Python functions in the local Python session.
Example 3-2 Pulling Data into Local Memory
This example loads the iris data set and creates the IRIS database table and the oml_iris
proxy object that references that table. It displays the type of the oml_iris
object, then pulls the data from it to the iris
object in local memory and displays its type.
import oml
from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris()
x = pd.DataFrame(iris.data, columns = [‘SEPAL_LENGTH’,‘SEPAL_WIDTH’, ‘PETAL_LENGTH’,‘PETAL_WIDTH’])
y = pd.DataFrame(list(map(lambda x: {0: ‘setosa’, 1: ‘versicolor’, 2:‘virginica’}[x], iris.target)), columns = [‘SPECIES’])
iris_df = pd.concat([x, y], axis=1)
oml_iris = oml.create(iris_df, table = ‘IRIS’)
# Display the data type of oml_iris.
type(oml_iris)
# Pull the data from oml_iris into local memory.
iris = oml_iris.pull()
# Display the data type of iris.
type(iris)
# Drop the IRIS database table.
oml.drop('IRIS')
Listing for This Example
>>> import oml
>>> from sklearn.datasets import load_iris
>>> import pandas as pd
>>>
>>> # Load the iris data set and create a pandas.DataFrame for it.
>>> iris = datasets.load_iris()
>>> iris = load_iris()
>>> x = pd.DataFrame(iris.data, columns = [‘SEPAL_LENGTH’,‘SEPAL_WIDTH’, ‘PETAL_LENGTH’,‘PETAL_WIDTH’])
>>> y = pd.DataFrame(list(map(lambda x: {0: ‘setosa’, 1: ‘versicolor’, 2:‘virginica’}[x], iris.target)), columns = [‘SPECIES’])
>>> iris_df = pd.concat([x, y], axis=1)
>>> oml_iris = oml.create(iris_df, table = ‘IRIS’)
>>>
>>> # Display the data type of oml_iris.
... type(oml_iris)
<class 'oml.core.frame.DataFrame'>
>>>
>>> # Pull the data from oml_iris into local memory.
... iris = oml_iris.pull()
>>>
>>> # Display the data type of iris.
... type(iris)
<class 'pandas.core.frame.DataFrame'>
>>>
>>> # Drop the IRIS database table.
... oml.drop('IRIS')
Parent topic: Move Data Between the Database and a Python Session
3.2.4 Create a Python Proxy Object for a Database Object
Use the oml.sync
function to create a Python object as a proxy for a database table, view, or SQL statement.
The oml.sync
function returns an oml.DataFrame
object or a dictionary of oml.DataFrame
objects. The oml.DataFrame
object returned by oml.sync
is a proxy for the database object.
You can use the proxy oml.DataFrame
object to select data from the table. When you run a Python function that selects data from the table, the function returns the current data from the database object. However, if some application has added a column to the table, or has otherwise changed the metadata of the database object, the oml.DataFrame
proxy object does not reflect such a change until you again invoke oml.sync
for the database object.
Tip:
To conserve memory resources and save time, you should only create proxies for the tables that you want to use in your Python session.
You can use the oml.dir
function to list the oml.DataFrame
proxy objects in the environment for a schema.
The syntax of the oml.sync
function is the following:
oml.sync(schema=None, regex_match=False, table=None, view=None, query=None)
The schema
argument in oml.sync
specifies the
name of the schema where the database object exists. If schema=None
, which
is the default, then the current schema is used.
To create an oml.DataFrame
object for a table, use the table
parameter. To create one for a view, use the view
parameter. To create one for a SQL SELECT
statement, use the query
parameter. You can only specify one of these parameters in an oml.sync
invocation: the argument for one of the parameters must be a string and the argument for each of the other two parameters must be None
.
Creating a proxy object for a query enables you to create an oml.DataFrame
object without creating a view in the database. This can be useful when you do not have the CREATE VIEW
system privilege for the current schema. You cannot use the schema
parameter and the query
parameter in the same ore.sync
invocation.
With the regex_match
argument, you can specify whether the value of the table
or view
argument is a regular expression. If regex_match=True
, then oml.sync
creates oml.DataFrame
objects for each database object that matches the pattern. The matched tables or views are returned in a dict
with the table or view names as keys.
Example 3-3 Creating a Python Object for a Database Table
This example creates an oml.DataFrame
Python object as a proxy for a database table. For this example, the table COFFEE exists in the user's schema.
import oml
# Create the Python object oml_coffee as a proxy for the
# database table COFFEE.
oml_coffee = oml.sync(table = 'COFFEE')
type(oml_coffee)
# List the proxy objects in the schema.
oml.dir()
oml_coffee.head()
Listing for This Example
>>> import oml
>>>
>>> # Create the Python object oml_coffee as a proxy for the
... # database table COFFEE.
... oml_coffee = oml.sync(table = 'COFFEE')
>>> type(oml_coffee)
<class 'oml.core.frame.DataFrame'>
>>>
>>> # List the proxy objects in the schema.
... oml.dir()
['oml_coffee']
>>>
>>> oml_coffee.head()
ID COFFEE WINDOW
0 1 esp w
1 2 cap d
2 3 cap w
3 4 kon w
4 5 ice w
Example 3-4 Using the regex_match Argument
This example uses the regex_match
argument in creating a dict
object that contains oml.DataFrame
proxy objects for tables whose names start with C. For this example, the COFFEE and COLOR tables exist in the user's schema and are the only tables whose names start with C.
# Create a dict of oml.DataFrame proxy objects for tables
# whose names start with 'C'.
oml_cdat = oml.sync(table="^C", regex_match=True)
oml_cdat.keys()
oml_cdat['COFFEE'].columns
oml_cdat['COLOR'].columns
Listing for This Example
>>> # Create a dict of oml.DataFrame proxy objects for tables
... # whose names start with 'C'.
... oml_cdat = oml.sync(table="^C", regex_match=True)
>>>
>>> oml_cdat.keys()
dict_keys(['COFFEE', 'COLOR']
>>> oml_cdat['COFFEE'].columns
['ID', 'COFFEE', 'WINDOW']
>>> oml_cdat['COLOR'].columns
['REGION', 'EYES', 'HAIR', 'COUNT']
Example 3-5 Synchronizing an Updated Table
This example uses oml.sync
to create an oml.DataFrame
for the database table COFFEE. For the example, the new column BREW has been added to the database table by some other database process after the first invocation of oml.sync
. Invoking oml.sync
again synchronizes the metadata of the oml.DataFrame
with those of the table.
oml_coffee = oml.sync(table = "COFFEE")
oml_coffee.columns
# After a new column has been inserted into the table.
oml_coffee = oml.sync(table = "COFFEE")
oml_coffee.columns
Listing for This Example
>>> oml_coffee = oml.sync(table = "COFFEE")
>>> oml_coffee.columns
['ID', 'COFFEE', 'WINDOW']
>>>
>>> # After a new column has been inserted into the table.
... oml_coffee = oml.sync(table = "COFFEE")
>>> oml_coffee.columns
['ID', 'COFFEE', 'WINDOW', 'BREW']
Parent topic: Move Data Between the Database and a Python Session
3.2.5 Create a Persistent Database Table from a Python Data Set
Use the oml.create
function to create a persistent table in your database schema from data in your Python session.
The oml.create
function creates a table in the database schema and returns an oml.DataFrame
object that is a proxy for the table. The proxy oml.DataFrame
object has the same name as the table.
Note:
When creating a table in Oracle Machine Learning for Python, if you use lowercase or mixed case for the name of the table, then you must use the same lowercase or mixed case name in double quotation marks when using the table in a SQL query or function. If, instead, you use an all uppercase name when creating the table, then the table name is case-insensitive: you can use uppercase, lowercase, or mixed case when using the table without using double quotation marks. The same is true for naming columns in a table.You can delete the persistent table in a database schema with the oml.drop
function.
Caution:
Use theoml.drop
function to delete a persistent database table. Use the del
statement to remove an oml.DataFrame
proxy object and its associated temporary table; del
does not delete a persistent table.
The syntax of the oml.create
function is the following:
oml.create(x, table, oranumber=True, dbtypes=None, append=False)
The x
argument is a pandas.DataFrame
or a list of tuples of equal size that contain the data for the table. For a list of tuples, each tuple represents a row in the table and the column names are set to COL1, COL2, and so on. The table
argument is a string that specifies a name for the table.
The SQL data types of the columns are determined by the following:
-
OML4Py determines default column types by looking at 20 random rows sampled from the table. For tables with less than 20 rows, it uses all rows in determining the column type.
If the values in a column are all
None
, or if a column has inconsistent data types that are notNone
in the sampled rows, then a default column type cannot be determined and aValueError
is raised unless a SQL type for the column is specified by thedbtypes
argument. -
For numeric columns, the
oranumber
argument, which is abool
, determines the SQL data type. IfTrue
(the default), then the SQL data type is NUMBER. IfFalse
, then the data type is BINARY DOUBLE.If the data in
x
contains NaN values, then you should setoranumber
toFalse
. -
For string columns, the default type is VARCHAR2(4000).
-
For bytes columns, the default type is BLOB.
With the dbtypes
parameter, you can specify the SQL data types for the table columns. The values of dbtypes
may be either a dict
that maps str
to str
values or a list of str
values. For a dict
, the keys are the names of the columns. The dbtypes
parameter is ignored if the append
argument is True
.
The append
argument is a bool
that specifies whether to append the x
data to an existing table.
Example 3-6 Creating Database Tables from a Python Data Set
This example creates a cursor
object for the database connection, creates a pandas.core.frame.DataFrame
with columns of various data types, then creates a series of tables using different oml.create
parameters and shows the SQL data types of the table columns.
import oml
# Create a cursor object for the current OML4Py database
# connection to run queries and get information from the database.
cr = oml.cursor()
import pandas as pd
df = pd.DataFrame({'numeric': [1, 1.4, -4, 3.145, 5, 2],
'string' : [None, None, 'a', 'a', 'a', 'b'],
'bytes' : [b'a', b'b', b'c', b'c', b'd', b'e']})
# Get the order of the columns
df.columns
# Create a table with the default parameters.
oml_df1 = oml.create(df, table = 'tbl1')
# Show the default SQL data types of the columns.
_ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl1'")
cr.fetchall()
# Create a table with oranumber set to False.
oml_df2 = oml.create(df, table = 'tbl2', oranumber = False)
# Show the SQL data typea of the columns.
_ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl2'")
cr.fetchall()
# Create a table with dbtypes specified as a dict mapping column names
# to SQL data types.
oml_df3 = oml.create(df, table = 'tbl3',
dbtypes = {'numeric': 'BINARY_DOUBLE',
'bytes':'RAW(1)'})
# Show the SQL data types of the columns.
_ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl3'")
cr.fetchall()
# Create a table with dbtypes specified as a list of SQL data types
# matching the order of the columns.
oml_df4 = oml.create(df, table = 'tbl4',
dbtypes = ['BINARY_DOUBLE','VARCHAR2','RAW(1)'])
# Show the SQL data type of the columns.
_ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl4'")
cr.fetchall()
# Create a table from a list of tuples.
lst = [(1, None, b'a'), (1.4, None, b'b'), (-4, 'a', b'c'),
(3.145, 'a', b'c'), (5, 'a', b'd'), (None, 'b', b'e')]
oml_df5 = oml.create(lst, table = 'tbl5',
dbtypes = ['BINARY_DOUBLE','CHAR(1)','RAW(1)'])
# Close the cursor
cr.close()
# Drop the tables.
oml.drop('tbl1')
oml.drop('tbl2')
oml.drop('tbl3')
oml.drop('tbl4')
oml.drop('tbl5')
Listing for This Example
>>> import oml
>>>
>>> # Create a cursor object for the current OML4Py database
... # connection to run queries and get information from the database.
... cr = oml.cursor()
>>>
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'numeric': [1, 1.4, -4, 3.145, 5, 2],
... 'string' : [None, None, 'a', 'a', 'a', 'b'],
... 'bytes' : [b'a', b'b', b'c', b'c', b'd', b'e']})
>>>
>>> # Get the order of the columns.
... df.columns
Index(['numeric', 'string', 'bytes'], dtype='object')
>>>
>>> # Create a table with the default parameters.
... oml_df1 = oml.create(df, table = 'tbl1')
>>>
>>> # Show the default SQL data types of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl1'")
>>> cr.fetchall()
[('NUMBER',), ('VARCHAR2',), ('BLOB',)]
>>>
>>> # Create a table with oranumber set to False.
... oml_df2 = oml.create(df, table = 'tbl2', oranumber = False)
>>>
>>> # Show the SQL data types of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl2'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('VARCHAR2',), ('BLOB',)]
>>>
>>> # Create a table with dbtypes specified as a dict mapping column names
... # to SQL data types.
... oml_df3 = oml.create(df, table = 'tbl3',
... dbtypes = {'numeric': 'BINARY_DOUBLE',
... 'bytes':'RAW(1)'})
>>>
>>> # Show the SQL data type of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl3'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('VARCHAR2',), ('RAW',)]
>>>
>>> # Create a table with dbtypes specified as a list of SQL data types
... # matching the order of the columns.
... oml_df4 = oml.create(df, table = 'tbl4',
... dbtypes = ['BINARY_DOUBLE','CHAR(1)', 'RAW(1)'])
>>>
>>> # Show the SQL data type of the columns
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl4'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('CHAR',), ('RAW',)]
>>>
>>> # Create a table from a list of tuples.
... lst = [(1, None, b'a'), (1.4, None, b'b'), (-4, 'a', b'c'),
... (3.145, 'a', b'c'), (5, 'a', b'd'), (None, 'b', b'e')]
>>> oml_df5 = oml.create(lst, table ='tbl5',
... dbtypes = ['BINARY_DOUBLE','CHAR(1)','RAW(1)'])
>>>
>>> # Show the SQL data type of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl5'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('CHAR',), ('RAW',)]
>>>
>>> # Close the cursor.
... cr.close()
>>>
>>> # Drop the tables
... oml.drop('tbl1')
>>> oml.drop('tbl2')
>>> oml.drop('tbl3')
>>> oml.drop('tbl4')
>>> oml.drop('tbl5')
Parent topic: Move Data Between the Database and a Python Session
3.3 Save Python Objects in the Database
You can save Python objects in OML4Py datastores, which persist in the database.
You can grant or revoke read privilege access to a datastore or its objects to one or more users. You can restore the saved objects in another Python session.
The following topics describe the OML4Py functions for creating and managing datastores:
- About OML4Py Datastores
In an OML4Py datastore, you can store Python objects, which you can then use in subsequent Python sessions; you can also make them available to other users or programs. - Save Objects to a Datastore
Theoml.ds.save
function saves one or more Python objects to a datastore. - Load Saved Objects From a Datastore
Theoml.ds.load
function loads one or more Python objects from a datastore into a Python session. - Get Information About Datastores
Theoml.ds.dir
function provides information about datastores. - Get Information About Datastore Objects
Theoml.ds.describe
function provides information about the objects in a datastore. - Delete Datastore Objects
Theoml.ds.delete
function deletes datastores or objects in a datastore. - Manage Access to Stored Objects
Theoml.grant
andoml.revoke
functions grant or revoke the read privilege to datastores or to user-defined Python functions in the script repository.
Parent topic: Get Started with Oracle Machine Learning for Python
3.3.1 About OML4Py Datastores
In an OML4Py datastore, you can store Python objects, which you can then use in subsequent Python sessions; you can also make them available to other users or programs.
Python objects, including OML4Py proxy objects, exist only for the duration of the current Python session unless you explicitly save them. You can save a Python object, including oml
proxy objects, to a named datastore and then load that object in a later Python session, including an Embedded Python Execution session. OML4Py creates the datastore in the user’s database schema. A datastore, and the objects it contains, persist in the database until you delete them.
You can grant or revoke read privilege permission to another user to a datastore that you created or to objects in a datastore.
OML4Py has Python functions for managing objects in a datastore. It also has PL/SQL procedures for granting or revoking the read privilege and database views for listing available datastores and their contents.
Using a datastore, you can do the following:
-
Save OML4Py and other Python objects that you create in one Python session and load them in another Python session.
-
Pass arguments to Python functions for use in Embedded Python Execution.
-
Pass objects for use in Embedded Python Execution. You could, for example, use the
oml.glm
class to build an Oracle Machine Learning model and save it in a datastore. You could then use that model to score data in the database through Embedded Python Execution.
Python Interface for Datastores
The following table lists the Python functions for saving and managing objects in a datastore.
Function | Description |
---|---|
oml.ds.delete |
Deletes one or more datastores or Python objects from a datastore. |
oml.ds.dir |
Lists the datastores available to the current user. |
oml.ds.load |
Loads Python objects from a datastore into the user’s session. |
oml.ds.save |
Saves Python objects to a named datastore in the user’s database schema. |
The following table lists the Python functions for managing access to datastores and datastore objects.
Function | Description |
---|---|
oml.grant |
Grants read privilege permission to another user to a datastore or a user-defined Python function in the script repository owned by the current user. |
oml.revoke |
Revokes the read privilege permission that was granted to another user to a datastore or a user-defined Python function in the script repository owned by the current user. |
Parent topic: Save Python Objects in the Database
3.3.2 Save Objects to a Datastore
The oml.ds.save
function saves one or more Python objects to a datastore.
OML4Py creates the datastore in the current user’s schema.
The syntax of oml.ds.save
is the following:
oml.ds.save(objs, name, description=' ', grantable=None,
overwrite=False, append=False, compression=False)
The objs
argument is a dict
that contains the name and object pairs to save to the datastore specified by the name
argument.
With the description
argument, you can provide some descriptive text that appears when you get information about the datastore. The description
parameter has no effect when used with the append
parameter.
With the grantable
argument, you can specify whether the read privilege to the datastore may be granted to other users.
If you set the overwrite
argument to TRUE
, then you can replace an existing datastore with another datastore of the same name.
If you set the append
argument to TRUE
, then you can add objects to an existing datastore. The overwrite
and append
arguments are mutually exclusive.
If you set compression to True
, then the serialized Python objects are compressed in the datastore.
Example 3-7 Saving Python Objects to a Datastore
This example demonstrates creating datastores.
import oml
from sklearn import datasets
from sklearn import linear_model
import pandas as pd
# Load three data sets and create oml.DataFrame objects for them.
wine = datasets.load_wine()
x = pd.DataFrame(wine.data, columns = wine.feature_names)
y = pd.DataFrame(wine.target, columns = ['Class'])
# Create the database table WINE.
oml_wine = oml.create(pd.concat([x, y], axis=1), table = 'WINE')
oml_wine.columns
diabetes = datasets.load_diabetes()
x = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y = pd.DataFrame(diabetes.target, columns=['disease_progression'])
oml_diabetes = oml.create(pd.concat([x, y], axis=1),
table = "DIABETES")
oml_diabetes.columns
boston = datasets.load_boston()
x = pd.DataFrame(boston.data, columns = boston.feature_names.tolist())
y = pd.DataFrame(boston.target, columns = ['Value'])
oml_boston = oml.create(pd.concat([x, y], axis=1), table = "BOSTON")
oml_boston.columns
# Save the wine Bunch object to the datastore directly,
# along with the oml.DataFrame proxy object for the BOSTON table.
oml.ds.save(objs={'wine':wine, 'oml_boston':oml_boston},
name="ds_pydata", description = "python datasets")
# Save the oml_diabetes proxy object to an existing datastore.
oml.ds.save(objs={'oml_diabetes':oml_diabetes},
name="ds_pydata", append=True)
# Save the oml_wine proxy object to another datastore.
oml.ds.save(objs={'oml_wine':oml_wine},
name="ds_wine_data", description = "wine dataset")
# Create regression models using sklearn and oml.
# The regr1 linear model is a native Python object.
regr1 = linear_model.LinearRegression()
regr1.fit(boston.data, boston.target)
# The regr2 GLM model is an oml object.
regr2 = oml.glm("regression")
X = oml_boston.drop('Value')
y = oml_boston['Value']
regr2 = regr2.fit(X, y)
# Save the native Python object and the oml proxy object to a datastore
# and allow the read privilege to be granted to them.
oml.ds.save(objs={'regr1':regr1, 'regr2':regr2},
name="ds_pymodel", grantable=True)
# Grant the read privilege to the datastore to every user.
oml.grant(name="ds_pymodel", typ="datastore", user=None)
# List the datastores to which the read privilege has been granted.
oml.ds.dir(dstype="grant")
Listing for This Example
>>> import oml
>>> from sklearn import datasets
>>> from sklearn import linear_model
>>> import pandas as pd
>>>
>>> # Load three data sets and create oml.DataFrame objects for them.
>>> wine = datasets.load_wine()
>>> x = pd.DataFrame(wine.data, columns = wine.feature_names)
>>> y = pd.DataFrame(wine.target, columns = ['Class'])
>>>
>>> # Create the database table WINE.
... oml_wine = oml.create(pd.concat([x, y], axis=1), table = 'WINE')
>>> oml_wine.columns
['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline', 'Class']
>>>
>>> diabetes = datasets.load_diabetes()
>>> x = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
>>> y = pd.DataFrame(diabetes.target, columns=['disease_progression'])
>>> oml_diabetes = oml.create(pd.concat([x, y], axis=1),
... table = "DIABETES")
>>> oml_diabetes.columns
['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6', 'disease_progression']
>>>
>>> boston = datasets.load_boston()
>>> x = pd.DataFrame(boston.data, columns = boston.feature_names.tolist())
>>> y = pd.DataFrame(boston.target, columns = ['Value'])
>>> oml_boston = oml.create(pd.concat([x, y], axis=1), table = "BOSTON")
>>> oml_boston.columns
['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'Value']
>>>
>>> # Save the wine Bunch object to the datastore directly,
... # along with the oml.DataFrame proxy object for the BOSTON table.
... oml.ds.save(objs={'wine':wine, 'oml_boston':oml_boston},
... name="ds_pydata", description = "python datasets")
>>>
>>> # Save the oml_diabetes proxy object to an existing datastore.
... oml.ds.save(objs={'oml_diabetes':oml_diabetes},
... name="ds_pydata", append=True)
>>>
>>> # Save the oml_wine proxy object to another datastore.
... oml.ds.save(objs={'oml_wine':oml_wine},
... name="ds_wine_data", description = "wine dataset")
>>>
>>> # Create regression models using sklearn and oml.
... # The regr1 linear model is a native Python object.
... regr1 = linear_model.LinearRegression()
>>> regr1.fit(boston.data, boston.target)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
>>> # The regr2 GLM model is an oml proxy object.
... regr2 = oml.glm("regression")
>>> X = oml_boston.drop('Value')
>>> y = oml_boston['Value']
>>> regr2 = regr2.fit(X, y)
>>>
>>> # Save the native Python object and the oml proxy object to a datastore
... # and allow the read privilege to be granted to them.
... oml.ds.save(objs={'regr1':regr1, 'regr2':regr2},
... name="ds_pymodel", grantable=True)
>>>
>>> # Grant the read privilege to the ds_pymodel datastore to every user.
... oml.grant(name="ds_pymodel", typ="datastore", user=None)
>>>
>>> # List the datastores to which the read privilege has been granted.
... oml.ds.dir(dstype="grant")
datastore_name grantee
0 ds_pymodel PUBLIC
Parent topic: Save Python Objects in the Database
3.3.3 Load Saved Objects From a Datastore
The oml.ds.load
function loads one or more Python objects from a datastore into a Python session.
The syntax of oml.ds.load
is the following:
oml.ds.load(name, objs=None, owner=None, to_globals=True)
The name
argument specifies the datastore that contains the objects to load.
With the objs
argument, you identify a specific object or a list of objects to load.
With the boolean to_globals
parameter, you can specify whether the objects are loaded to a global workspace or to a dictionary object. If the argument to to_globals
is True
, then oml.ds.load
function loads the objects into the global workspace. If the argument is False
, then the function returns a dict
object that contains pairs of object names and values.
The oml.ds.load
function raises a ValueError
if the name
argument is an empty string or if the owner of the datastore is not the current user and the read privilege for the datastore has not been granted to the current user.
Example 3-8 Loading Objects from Datastores
This example loads objects from datastores. For the creation of the datastores used in this example, see Example 3-7.
import oml
# Load all Python objects from a datastore to the global workspace.
sorted(oml.ds.load(name="ds_pydata"))
# Load the named Python object from the datastore to the global workspace.
oml.ds.load(name="ds_pymodel", objs=["regr2"])
# Load the named Python object from the datastore to the user's workspace.
oml.ds.load(name="ds_pymodel", objs=["regr1"], to_globals=False)
Listing for This Example
>>> import oml
>>>
>>> # Load all Python objects from a datastore to the current workspace.
... sorted(oml.ds.load(name="ds_pydata"))
['oml_boston', 'oml_diabetes', 'wine']
>>>
>>> # Load the named Python object from the datastore to the global workspace.
... oml.ds.load(name="ds_pymodel", objs=["regr2"])
['regr2']
>>>
>>> # Load the named Python object from the datastore to the user's workspace.
... oml.ds.load(name="ds_pymodel", objs=["regr1"], to_globals=False)
{'regr1': LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)}
Parent topic: Save Python Objects in the Database
3.3.4 Get Information About Datastores
The oml.ds.dir
function provides information about datastores.
The syntax of oml.ds.dir
is the following:
oml.ds.dir(name=None, regex_match=False, dstype=’user’)
Use the name
parameter to get information about a specific datastore.
Optionally, you can use the regex_match
and dstype
parameters to get information about datastores with certain characteristics. The valid arguments for dstype
are the following:
Argument | Description |
---|---|
all |
Lists all of the datastores to which the current user has the read privilege. |
grant |
Lists the datastores for which the current user has granted read privilege to other users. |
granted |
Lists the datastores for which other users have granted read privilege to the current user. |
grantable |
Lists the datastores that the current user can grant the read privilege to. |
user |
Lists the datastores created by current user. |
private |
Lists the datastores that the current user cannot grant the read privileges to. |
The oml.ds.dir
function returns a pandas.DataFrame
object that contains different columns depending on which dstype
argument you use. The following table lists the arguments and the columns returned for the values supplied.
dstype Argument | Columns in the DataFrame Returned |
---|---|
|
DSNAME, which contains the datastore name NOBJ, which contains the number of objects in the datastore DSIZE, which contains the size in bytes of each object in the datastore CDATE, which contains the creation date of the datastore DESCRIPTION, which contains the optional description of the datastore |
|
All of the columns returned by the DSOWNER, which contains the owner of the datastore |
|
DSNAME, which contains the datastore name GRANTEE, which contains the name of the user to which the read privilege to the datastore has been granted by the current session user |
Example 3-9 Getting Information About Datastores
This example demonstrates using different combinations of arguments to the oml.ds.dir
function. It demonstrates using oml.dir
to list some or all of the datastores. For the creation of the datastores used in this example, see Example 3-7.
import oml
# Show all saved datastores.
oml.ds.dir(dstype="all")[['owner', 'datastore_name', 'object_count']]
# Show datastores to which other users have been granted the read
# privilege.
oml.ds.dir(dstype="grant")
# Show datastores whose names match a pattern.
oml.ds.dir(name='pydata', regex_match=True)\
[['datastore_name', 'object_count']]
Listing for This Example
>>> import oml
>>>
>>> # Show all saved datastores.
... oml.ds.dir(dstype="all")[['owner', 'datastore_name', 'object_count']]
owner datastore_name object_count
0 OML_USER ds_pydata 3
1 OML_USER ds_pymodel 2
2 OML_USER ds_wine_data 1
>>>
>>> # Show datastores to which other users have been granted the read
>>> # privilege.
... oml.ds.dir(dstype="grant")
datastore_name grantee
0 ds_pymodel PUBLIC
>>>
>>> oml.ds.dir(name='pydata', regex_match=True)\
... [['datastore_name', 'object_count']]
datastore_name object_count
0 ds_pydata 3
Parent topic: Save Python Objects in the Database
3.3.5 Get Information About Datastore Objects
The oml.ds.describe
function provides information about the objects in a datastore.
The syntax of oml.ds.describe
is the following:
oml.ds.describe(name, owner=None))
The name
argument is a string that specifies the name of a datastore.
The owner
argument is a string that specifies the owner of the datastore or None
(the default). If you do not specify the owner, then the function returns information about the datastore if it is owned by the current user.
The oml.ds.describe
function returns a pandas.DataFrame
object, each row of which represents an object in the datastore. The columns of the DataFrame
are the following:
-
object_name, which specifies the name of the object
-
class, which specifies the class of the object
-
size, which specifies the size of the object in bytes
-
length, which specifies the length of the object
-
row_count, which specifies the rows of the object
-
col_count, which specifies the columns of the object
This function raises a ValueError
if the following occur:
-
The current user is not the owner of the datastore and has not been granted read privilege for the datastore.
-
The datastore does not exist.
Example 3-10 Getting Information About Datastore Objects
This example demonstrates the using the oml.ds.describe
function. For the creation of the datastore used in this example, see Example 3-7.
import oml
# Describe the contents of the ds_pydata datastore.
oml.ds.describe(name='ds_pydata')
oml.ds.describe(name="ds_pydata")[['object_name', 'class']]
Listing for This Example
>>> import oml
>>>
>>> # Describe the contents of the ds_pydata datastore.
... oml.ds.describe(name='ds_pydata')
object_name class size length row_count col_count
0 oml_boston oml.DataFrame 1073 506 506 14
1 oml_diabetes oml.DataFrame 964 442 442 11
2 wine Bunch 24177 5 1 5
>>> oml.ds.describe(name="ds_pydata")[['object_name', 'class']]
object_name class
0 oml_boston oml.DataFrame
1 oml_diabetes oml.DataFrame
2 wine Bunch
Parent topic: Save Python Objects in the Database
3.3.6 Delete Datastore Objects
The oml.ds.delete
function deletes datastores or objects in a datastore.
Use the oml.ds.delete
function to delete one or more datastores in your database schema or to delete objects in a datastore.
The syntax of oml.ds.delete
is the following:
oml.ds.delete(name, objs=None, regex_match=False)
The argument to the name
parameter may be one of the following:
-
A string that specifies the name of the datastore to modify or delete, or a regular expression that matches the datastores to delete.
-
A list of
str
objects that name the datastores from which to delete objects.
The objs
parameter specifies the objects to delete from a datastore. The argument to the objs
parameter may be one of the following:
-
A string that specifies the object to delete from one or more datastores, or a regular expression that matches the objects to delete.
-
None
(the default), which deletes the entire datastore or datastores.
The regex_match
parameter is a bool
that indicates whether the name
or objs
arguments are regular expressions. The default value is False
. The regex_match
parameter operates as follows:
-
If
regex_match=False
and ifname
is notNone
, and:-
If
objs=None
, thenoml.ds.delete
deletes the datastore or datastores specified in thename
argument. -
If you specify one or more datastores with the
name
argument and one or more datastore objects with theobjs
argument, thenoml.ds.delete
deletes the specified Python objects from the datastores.
-
-
If
regex_match=True
and:-
If
objs=None
, thenoml.ds.delete
deletes the datastores you specified in thename
argument. -
If the
name
argument is a string and you specify one or more datastore objects with theobjs
argument, thenoml.ds.delete
deletes from the datastore the objects whose names match the regular expression specified in theobjs
argument. -
If the
name
argument is a list ofstr
objects, then theobjs
argument must be a list ofstr
objects of the same length asname
, andoml.ds.delete
deletes from the datastores the objects whose names match the regular expressions specified inobjs
.
-
This function raises an error if the following occur:
-
A specified datastore does not exist.
-
Argument
regex_match
isFalse
and argumentname
is a list ofstr
objects larger than 1 and argumentobjs
is notNone
. -
Argument
regex_match
isTrue
and argumentsname
andobjs
are lists that are not the same length.
Example 3-11 Deleting Datastore Objects
This example demonstrates the using the oml.ds.delete
function. For the creation of the datastores used in this example, see Example 3-7.
import oml
# Show the existing datastores.
oml.ds.dir()
# Show the Python objects in the ds_pydata datastore.
oml.ds.describe(name='ds_pydata')
# Delete some objects from the datastore.
oml.ds.delete(name="ds_pydata", objs=["wine", "oml_boston"])
# Delete a datastore.
oml.ds.delete(name="ds_pydata")
# Delete all datastores whose names match a pattern.
oml.ds.delete(name="_pymodel", regex_match=True)
# Show the existing datastores again.
oml.ds.dir()
Listing for This Example
>>> import oml
>>>
>>> # Show the existing datastores.
... oml.ds.dir()
datastore_name object_count size date description
0 ds_pydata 3 26214 2019-05-18 21:04:06 python datasets
1 ds_pymodel 2 6370 2019-05-18 21:08:18 None
2 ds_wine_data 1 1410 2019-05-18 21:06:53 wine dataset
>>>
>>> # Show the Python objects in the ds_pydata datastore.
... oml.ds.describe(name='ds_pydata')
object_name class size length row_count col_count
0 oml_boston oml.DataFrame 1073 506 506 14
1 oml_diabetes oml.DataFrame 964 442 442 11
2 wine Bunch 24177 5 1 5
>>>
>>> # Delete some objects from a datastore.
... oml.ds.delete(name="ds_pydata", objs=["wine", "oml_boston"])
{'wine', 'oml_boston'}
>>>
>>> # Delete a datastore.
... oml.ds.delete(name="ds_pydata")
'ds_pydata'
>>>
>>> # Delete all datastores whose names match a pattern.
... oml.ds.delete(name="_pymodel", regex_match=True)
{'ds_pymodel'}
>>>
>>> # Show the existing datastores again.
... oml.ds.dir()
datastore_name object_count size date description
0 ds_wine_data 1 1410 2019-05-18 21:06:53 wine dataset
Parent topic: Save Python Objects in the Database
3.3.7 Manage Access to Stored Objects
The oml.grant
and oml.revoke
functions grant or revoke the read privilege to datastores or to user-defined Python functions in the script repository.
The oml.grant
function grants the read privilege to another user to a datastore or to a user-defined Python function in the OML4Py script repository. The oml.revoke
function revokes that privilege.
The syntax of these functions is the following:
oml.grant(name, typ='datastore', user=None)
oml.revoke(name, typ='datastore', user=None)
The name
argument is a string that specifies the name of the user-defined Python function in the script repository or the name of a datastore.
The typ
parameter must be specified. The argument is a string that is either ‘datastore’
or ‘pyqscript’
.
The user
argument is a string that specifies the user to whom read privilege to the named datastore or user-defined Python function is granted or from whom it is revoked, or None
(the default). If you specify None
, then the read privilege is granted to or revoked from all users.
Example 3-12 Granting and Revoking Access to Datastores
This example displays the datastores to which the read privilege has been granted to all users. It revokes read privilege from the ds_pymodel
datastore and displays the datastores with public read privilege again. It next grants the read privilege to the user SH and finally displays once more the datastores to which read privilege has been granted. For the creation of the datastores used in this example, see Example 3-7.
import oml
# Show datastores to which other users have been granted read privilege.
oml.ds.dir(dstype="grant")
# Revoke the read privilege from every user.
oml.revoke(name="ds_pymodel", typ="datastore", user=None)
# Again show datastores to which read privilege has been granted.
oml.ds.dir(dstype="grant")
# Grant the read privilege to the user SH.
oml.grant(name="ds_pymodel", typ="datastore", user="SH")
oml.ds.dir(dstype="grant")
Listing for This Example
>>> import oml
>>>
>>> # Show datastores to which other users have been granted read privilege.
... oml.ds.dir(dstype="grant")
datastore_name grantee
0 ds_pymodel PUBLIC
>>>
>>> # Revoke the read privilege from every user.
... oml.revoke(name="ds_pymodel", typ="datastore", user=None)
>>>
>>> # Again show datastores to which read privilege has been granted to other users.
... oml.ds.dir(dstype="grant")
Empty DataFrame
Columns: [datastore_name, grantee]
Index: []
>>>
>>> # Grant the read privilege to the user SH.
... oml.grant(name="ds_pymodel", typ="datastore", user="SH")
>>>
>>> oml.ds.dir(dstype="grant")
datastore_name grantee
0 ds_pymodel SH
Example 3-13 Granting and Revoking Access to User-Defined Python Functions
This example grants the read privilege to the MYLM user-defined Python function to the user SH and then revokes that privilege. For the creation of the user-defined Python functions used in this example, see Example 7-11.
# List the user-defined Python functions available only to the current user.
oml.script.dir(sctype='user')
# Grant the read privilege to the MYLM user-defined Python function to the user SH.
oml.grant(name="MYLM", typ="pyqscript", user="SH")
# List the user-defined Python functions to which read privilege has been granted.
oml.script.dir(sctype="grant")
# Revoke the read privilege to the MYLM user-defined Python function from the user SH.
oml.revoke(name="MYLM", typ="pyqscript", user="SH")
# List the granted user-defined Python functions again to see if the revocation was successful.
oml.script.dir(sctype="grant")
Listing for This Example
>>> # List the user-defined Python functions available only to the current user.
oml.script.dir(sctype='user')
name script
0 MYLM def build_lm1(dat):\n from sklearn import lin...
>>>
>>># Grant the read privilege to the MYLM user-defined Python function to the user SH.
...oml.grant(name="MYLM", typ="pyqscript", user="SH")
>>>
>>> # List the user-defined Python functions to which read privilege has been granted.
... oml.script.dir(sctype="grant")
name grantee
0 MYLM SH
>>>
>>> # Revoke the read privilege to the MYLM user-defined Python function from the user SH.
... oml.revoke(name="MYLM", typ="pyqscript", user="SH")
>>>
>>> # List the granted user-defined Python functions again to see if the revocation was successful.
... oml.script.dir(sctype="grant")
Empty DataFrame
Columns: [name, grantee]
Index: []
Parent topic: Save Python Objects in the Database