3 Get Started with Oracle Machine Learning for Python

Learn how to use OML4Py in Oracle Machine Learning Notebooks and how to move data between the local Python session and the database.

These actions are described in the following topics.

3.1 Use OML4Py with Oracle Autonomous Database

OML4Py is available through the Python interpreter in Oracle Machine Learning Notebooks in Oracle Autonomous Database.

For more information, see Get Started with Notebooks for Data Analysis and Data Visualization in Using Oracle Machine Learning Notebooks.

3.2 Move Data Between the Database and a Python Session

With OML4Py functions, you can interact with data structures in a database schema.

In your Python session, you can move data to and from the database and create temporary or persistent database tables. The OML4Py functions that perform these actions are described in the following topics.

3.2.1 About Moving Data Between the Database and a Python Session

Using the functions described in this topic, you can move data between the your local Python session and an Oracle database schema.

The following functions create proxy oml Python objects from database objects, create database tables from Python objects, list the objects in the workspace, and drop tables and views.

Function Definition
oml.create

Creates a persistent database table from a Python data set.

oml.cursor Returns a cx_Oracle cursor object for the current OML4Py database connection.
oml.dir

Returns the names of the oml objects in the workspace.

oml.drop

Drops a persistent database table or view.

oml_object.pull

Creates a local Python object that contains a copy of the database data referenced by the oml object.

oml.push

Pushes data from the OML Notebooks Python session memory into a temporary table in the database.

oml.sync

Creates an oml.DataFrame proxy object in Python that represents a database table, view, or query.

With the pull method of an oml object, you can create a local Python object that contains a copy of the database data represented by an oml proxy object.

The oml.push function implicitly coerces Python data types to oml data types and the pull method on oml objects coerces oml data types to Python data types.

With the oml.create function, you can create a persistent database table and a corresponding oml.DataFrame proxy object from a Python data set.

With the oml.sync function, you can synchronize the metadata of a database table or view with the oml object representing the database object.

With the oml.cursor function, you can create a cx_Oracle cursor object for the current database connection. You can user the cursor to run queries against the database, as shown in Example 3-6.

3.2.2 Push Local Python Data to the Database

Use the oml.push function to push data from your local Python session to a temporary table in your Oracle database schema.

The oml.push function creates a temporary table in the user’s database schema and inserts data into the table. It also creates and returns a corresponding proxy oml.DataFrame object that references the table in the Python session. The table exists as long as an oml object exists that references it, either in the Python session memory or in an OML4Py datastore.

The syntax of the oml.push function is the following:

oml.push(x, oranumber=True, dbtypes=None)

The x argument may be a pandas.DataFrame or a list of tuples of equal size that contain the data for the table. For a list of tuples, each tuple represents a row in the table and the column names are set to COL1, COL2, and so on.

The SQL data types of the columns are determined by the following:

  • OML4Py determines default column types by looking at 20 random rows sampled from the table. For tables with less than 20 rows, it uses all rows in determining the column type.

    If the values in a column are all None, or if a column has inconsistent data types that are not None in the sampled rows, then a default column type cannot be determined and a ValueError is raised unless a SQL type for the column is specified by the dbtypes argument.

  • For numeric columns, the oranumber argument, which is a bool, determines the SQL data type. If True (the default), then the SQL data type is NUMBER. If False, then the data type is BINARY_DOUBLE.

    If the data in x contains NaN values, then you should set oranumber to False.

  • For string columns, the default type is VARCHAR2(4000).

  • For bytes columns, the default type is BLOB.

With the dbtypes argument, you can specify the SQL data types for the table columns. The values of dbtypes may be either a dict that maps str to str values or a list of str values. For a dict, the keys are the names of the columns.

Example 3-1 Pushing Data to a Database Table

This example creates pd_df, a pandas.core.frame.DataFrame object with columns of various data types. It pushes pd_df to a temporary database table, which creates the oml_df object, which references the table. It then pulls the data from the oml_df object to the df object in local memory.

import oml
import pandas as pd

pd_df = pd.DataFrame({'numeric': [1, 1.4, -4, 3.145, 5, None],
                      'string' : [None, None, 'a', 'a', 'a', 'b'],
                      'bytes' : [b'a', b'b', b'c', b'c', b'd', b'e']})

# Push the data set to a database table with the specified dbtypes 
# for each column.
oml_df = oml.push(pd_df, dbtypes = {'numeric': 'BINARY_DOUBLE', 
                                    'string':'CHAR(1)', 
                                    'bytes':'RAW(1)'})

# Display the data type of oml_df.
type(oml_df)

# Pull the data from oml_df into local memory.
df = oml_df.pull()

# Display the data type of df.
type(df)

# Create a list of tuples. 
lst = [(1, None, b'a'), (1.4, None, b'b'), (-4, 'a', b'c'),
       (3.145, 'a', b'c'), (5, 'a', b'd'), (None, 'b', b'e')]
       
# Create an oml.DataFrame using the list.       
oml_df2 = oml.push(lst, dbtypes = ['BINARY_DOUBLE','CHAR(1)','RAW(1)'])

type(oml_df2)

Listing for This Example

>>> import oml
>>> import pandas as pd
>>>
>>> pd_df = pd.DataFrame({'numeric': [1, 1.4, -4, 3.145, 5, None],
...                       'string' : [None, None, 'a', 'a', 'a', 'b'],
...                       'bytes' : [b'a', b'b', b'c', b'c', b'd', b'e']})
>>> 
>>> # Push the data set to a database table with the specified dbtypes
... # for each column.
... oml_df = oml.push(pd_df, dbtypes = {'numeric': 'BINARY_DOUBLE',
...                                     'string':'CHAR(1)', 
...                                     'bytes':'RAW(1)'})
>>> 
>>> # Display the data type of oml_df.
... type(oml_df)
<class 'oml.core.frame.DataFrame'>
>>> 
>>> # Pull the data from oml_df into local memory.
... df = oml_df.pull()
>>> 
>>> # Display the data type of df.
... type(df)
<class 'pandas.core.frame.DataFrame'>
>>>
>>> # Create a list of tuples. 
... lst = [(1, None, b'a'), (1.4, None, b'b'), (-4, 'a', b'c'),
...        (3.145, 'a', b'c'), (5, 'a', b'd'), (None, 'b', b'e')]
>>> 
>>> # Create an oml.DataFrame using the list.       
... oml_df2 = oml.push(lst, dbtypes = ['BINARY_DOUBLE','CHAR(1)','RAW(1)'])
>>> 
>>> type(oml_df2)
<class 'oml.core.frame.DataFrame'>

3.2.3 Pull Data from the Database to a Local Python Session

Use the pull method of an oml proxy object to create a Python object in your local Python session.

Note:

You can pull data to a local pandas.DataFrame only if the data can fit into the local Python session memory. Also, even if the data fits in memory but is still very large, you may not be able to perform many, or any, Python functions in the local Python session.

Example 3-2 Pulling Data into Local Memory

This example loads the iris data set and creates the IRIS database table and the oml_iris proxy object that references that table. It displays the type of the oml_iris object, then pulls the data from it to the iris object in local memory and displays its type.


import oml
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
x = pd.DataFrame(iris.data, columns = [‘SEPAL_LENGTH’,‘SEPAL_WIDTH’, ‘PETAL_LENGTH’,‘PETAL_WIDTH’])
y = pd.DataFrame(list(map(lambda x: {0: ‘setosa’, 1: ‘versicolor’, 2:‘virginica’}[x], iris.target)), columns = [‘SPECIES’])
iris_df = pd.concat([x, y], axis=1)

oml_iris = oml.create(iris_df, table = ‘IRIS’)

# Display the data type of oml_iris.
type(oml_iris)

# Pull the data from oml_iris into local memory.
iris = oml_iris.pull()

# Display the data type of iris.
type(iris)

# Drop the IRIS database table.
oml.drop('IRIS')

Listing for This Example

>>> import oml
>>> from sklearn.datasets import load_iris
>>> import pandas as pd
>>>
>>> # Load the iris data set and create a pandas.DataFrame for it.
>>> iris = datasets.load_iris()

>>> iris = load_iris()
>>> x = pd.DataFrame(iris.data, columns = [‘SEPAL_LENGTH’,‘SEPAL_WIDTH’, ‘PETAL_LENGTH’,‘PETAL_WIDTH’])
>>> y = pd.DataFrame(list(map(lambda x: {0: ‘setosa’, 1: ‘versicolor’, 2:‘virginica’}[x], iris.target)), columns = [‘SPECIES’])
>>> iris_df = pd.concat([x, y], axis=1)

>>> oml_iris = oml.create(iris_df, table = ‘IRIS’)

>>>
>>> # Display the data type of oml_iris.
... type(oml_iris)
<class 'oml.core.frame.DataFrame'>
>>>
>>> # Pull the data from oml_iris into local memory.
... iris = oml_iris.pull()
>>>
>>> # Display the data type of iris.
... type(iris)
<class 'pandas.core.frame.DataFrame'>
>>>
>>> # Drop the IRIS database table.
... oml.drop('IRIS')

3.2.4 Create a Python Proxy Object for a Database Object

Use the oml.sync function to create a Python object as a proxy for a database table, view, or SQL statement.

The oml.sync function returns an oml.DataFrame object or a dictionary of oml.DataFrame objects. The oml.DataFrame object returned by oml.sync is a proxy for the database object.

You can use the proxy oml.DataFrame object to select data from the table. When you run a Python function that selects data from the table, the function returns the current data from the database object. However, if some application has added a column to the table, or has otherwise changed the metadata of the database object, the oml.DataFrame proxy object does not reflect such a change until you again invoke oml.sync for the database object.

Tip:

To conserve memory resources and save time, you should only create proxies for the tables that you want to use in your Python session.

You can use the oml.dir function to list the oml.DataFrame proxy objects in the environment for a schema.

The syntax of the oml.sync function is the following:

oml.sync(schema=None, regex_match=False, table=None, view=None, query=None)

The schema argument in oml.sync specifies the name of the schema where the database object exists. If schema=None, which is the default, then the current schema is used.

To create an oml.DataFrame object for a table, use the table parameter. To create one for a view, use the view parameter. To create one for a SQL SELECT statement, use the query parameter. You can only specify one of these parameters in an oml.sync invocation: the argument for one of the parameters must be a string and the argument for each of the other two parameters must be None.

Creating a proxy object for a query enables you to create an oml.DataFrame object without creating a view in the database. This can be useful when you do not have the CREATE VIEW system privilege for the current schema. You cannot use the schema parameter and the query parameter in the same ore.sync invocation.

With the regex_match argument, you can specify whether the value of the table or view argument is a regular expression. If regex_match=True, then oml.sync creates oml.DataFrame objects for each database object that matches the pattern. The matched tables or views are returned in a dict with the table or view names as keys.

Example 3-3 Creating a Python Object for a Database Table

This example creates an oml.DataFrame Python object as a proxy for a database table. For this example, the table COFFEE exists in the user's schema.

import oml

# Create the Python object oml_coffee as a proxy for the 
# database table COFFEE.
oml_coffee = oml.sync(table = 'COFFEE')
type(oml_coffee)

# List the proxy objects in the schema.
oml.dir()

oml_coffee.head()

Listing for This Example

>>> import oml
>>>
>>> # Create the Python object oml_coffee as a proxy for the 
... # database table COFFEE.
... oml_coffee = oml.sync(table = 'COFFEE')
>>> type(oml_coffee)
<class 'oml.core.frame.DataFrame'>
>>>
>>> # List the proxy objects in the schema.
... oml.dir()
['oml_coffee']
>>>
>>> oml_coffee.head()
   ID COFFEE WINDOW
0   1    esp      w
1   2    cap      d
2   3    cap      w
3   4    kon      w
4   5    ice      w

Example 3-4 Using the regex_match Argument

This example uses the regex_match argument in creating a dict object that contains oml.DataFrame proxy objects for tables whose names start with C. For this example, the COFFEE and COLOR tables exist in the user's schema and are the only tables whose names start with C.

# Create a dict of oml.DataFrame proxy objects for tables
# whose names start with 'C'.
oml_cdat = oml.sync(table="^C", regex_match=True)

oml_cdat.keys()
oml_cdat['COFFEE'].columns
oml_cdat['COLOR'].columns

Listing for This Example

>>> # Create a dict of oml.DataFrame proxy objects for tables
... # whose names start with 'C'.
... oml_cdat = oml.sync(table="^C", regex_match=True)
>>>
>>> oml_cdat.keys()           
dict_keys(['COFFEE', 'COLOR']
>>> oml_cdat['COFFEE'].columns
['ID', 'COFFEE', 'WINDOW']
>>> oml_cdat['COLOR'].columns
['REGION', 'EYES', 'HAIR', 'COUNT']

Example 3-5 Synchronizing an Updated Table

This example uses oml.sync to create an oml.DataFrame for the database table COFFEE. For the example, the new column BREW has been added to the database table by some other database process after the first invocation of oml.sync. Invoking oml.sync again synchronizes the metadata of the oml.DataFrame with those of the table.

oml_coffee =  oml.sync(table = "COFFEE")
oml_coffee.columns

# After a new column has been inserted into the table.
oml_coffee =  oml.sync(table = "COFFEE")
oml_coffee.columns

Listing for This Example

>>> oml_coffee =  oml.sync(table = "COFFEE")
>>> oml_coffee.columns
['ID', 'COFFEE', 'WINDOW']
>>>
>>> # After a new column has been inserted into the table.
... oml_coffee =  oml.sync(table = "COFFEE")
>>> oml_coffee.columns
['ID', 'COFFEE', 'WINDOW', 'BREW']

3.2.5 Create a Persistent Database Table from a Python Data Set

Use the oml.create function to create a persistent table in your database schema from data in your Python session.

The oml.create function creates a table in the database schema and returns an oml.DataFrame object that is a proxy for the table. The proxy oml.DataFrame object has the same name as the table.

Note:

When creating a table in Oracle Machine Learning for Python, if you use lowercase or mixed case for the name of the table, then you must use the same lowercase or mixed case name in double quotation marks when using the table in a SQL query or function. If, instead, you use an all uppercase name when creating the table, then the table name is case-insensitive: you can use uppercase, lowercase, or mixed case when using the table without using double quotation marks. The same is true for naming columns in a table.

You can delete the persistent table in a database schema with the oml.drop function.

Caution:

Use the oml.drop function to delete a persistent database table. Use the del statement to remove an oml.DataFrame proxy object and its associated temporary table; del does not delete a persistent table.

The syntax of the oml.create function is the following:

oml.create(x, table, oranumber=True, dbtypes=None, append=False)

The x argument is a pandas.DataFrame or a list of tuples of equal size that contain the data for the table. For a list of tuples, each tuple represents a row in the table and the column names are set to COL1, COL2, and so on. The table argument is a string that specifies a name for the table.

The SQL data types of the columns are determined by the following:

  • OML4Py determines default column types by looking at 20 random rows sampled from the table. For tables with less than 20 rows, it uses all rows in determining the column type.

    If the values in a column are all None, or if a column has inconsistent data types that are not None in the sampled rows, then a default column type cannot be determined and a ValueError is raised unless a SQL type for the column is specified by the dbtypes argument.

  • For numeric columns, the oranumber argument, which is a bool, determines the SQL data type. If True (the default), then the SQL data type is NUMBER. If False, then the data type is BINARY DOUBLE.

    If the data in x contains NaN values, then you should set oranumber to False.

  • For string columns, the default type is VARCHAR2(4000).

  • For bytes columns, the default type is BLOB.

With the dbtypes parameter, you can specify the SQL data types for the table columns. The values of dbtypes may be either a dict that maps str to str values or a list of str values. For a dict, the keys are the names of the columns. The dbtypes parameter is ignored if the append argument is True.

The append argument is a bool that specifies whether to append the x data to an existing table.

Example 3-6 Creating Database Tables from a Python Data Set

This example creates a cursor object for the database connection, creates a pandas.core.frame.DataFrame with columns of various data types, then creates a series of tables using different oml.create parameters and shows the SQL data types of the table columns.

import oml

# Create a cursor object for the current OML4Py database 
# connection to run queries and get information from the database.
cr = oml.cursor()

import pandas as pd

df = pd.DataFrame({'numeric': [1, 1.4, -4, 3.145, 5, 2],
                   'string' : [None, None, 'a', 'a', 'a', 'b'],
                   'bytes' : [b'a', b'b', b'c', b'c', b'd', b'e']})

# Get the order of the columns
df.columns

# Create a table with the default parameters.
oml_df1 = oml.create(df, table = 'tbl1')

# Show the default SQL data types of the columns.
_ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl1'")
cr.fetchall()

# Create a table with oranumber set to False.
oml_df2 = oml.create(df, table = 'tbl2', oranumber = False)

# Show the SQL data typea of the columns.
_ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl2'")
cr.fetchall()

# Create a table with dbtypes specified as a dict mapping column names 
# to SQL data types.
oml_df3 = oml.create(df, table = 'tbl3', 
                     dbtypes = {'numeric': 'BINARY_DOUBLE',
                                'bytes':'RAW(1)'})

# Show the SQL data types of the columns.
_ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl3'")
cr.fetchall()

# Create a table with dbtypes specified as a list of SQL data types  
# matching the order of the columns.
oml_df4 = oml.create(df, table = 'tbl4', 
                     dbtypes = ['BINARY_DOUBLE','VARCHAR2','RAW(1)'])

# Show the SQL data type of the columns.
_ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl4'")
cr.fetchall()

# Create a table from a list of tuples.
lst = [(1, None, b'a'), (1.4, None, b'b'), (-4, 'a', b'c'), 
       (3.145, 'a', b'c'), (5, 'a', b'd'), (None, 'b', b'e')]
oml_df5 = oml.create(lst, table = 'tbl5',
                     dbtypes = ['BINARY_DOUBLE','CHAR(1)','RAW(1)'])

# Close the cursor
cr.close()

# Drop the tables.
oml.drop('tbl1')
oml.drop('tbl2')
oml.drop('tbl3')
oml.drop('tbl4')
oml.drop('tbl5')

Listing for This Example

>>> import oml
>>>
>>> # Create a cursor object for the current OML4Py database
... # connection to run queries and get information from the database.
... cr = oml.cursor()
>>> 
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'numeric': [1, 1.4, -4, 3.145, 5, 2],
...                    'string' : [None, None, 'a', 'a', 'a', 'b'],
...                    'bytes' : [b'a', b'b', b'c', b'c', b'd', b'e']})
>>>
>>> # Get the order of the columns.
... df.columns
Index(['numeric', 'string', 'bytes'], dtype='object')
>>> 
>>> # Create a table with the default parameters.
... oml_df1 = oml.create(df, table = 'tbl1')
>>>
>>> # Show the default SQL data types of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl1'")
>>> cr.fetchall()
[('NUMBER',), ('VARCHAR2',), ('BLOB',)]
>>> 
>>> # Create a table with oranumber set to False.
... oml_df2 = oml.create(df, table = 'tbl2', oranumber = False)
>>>
>>> # Show the SQL data types of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl2'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('VARCHAR2',), ('BLOB',)]
>>> 
>>> # Create a table with dbtypes specified as a dict mapping column names 
... # to SQL data types.
... oml_df3 = oml.create(df, table = 'tbl3', 
...                      dbtypes = {'numeric': 'BINARY_DOUBLE',
...                                 'bytes':'RAW(1)'})
>>>
>>> # Show the SQL data type of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl3'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('VARCHAR2',), ('RAW',)]
>>> 
>>> # Create a table with dbtypes specified as a list of SQL data types 
... # matching the order of the columns.
... oml_df4 = oml.create(df, table = 'tbl4', 
...                      dbtypes = ['BINARY_DOUBLE','CHAR(1)', 'RAW(1)'])
>>>
>>> # Show the SQL data type of the columns
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl4'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('CHAR',), ('RAW',)]
>>> 
>>> # Create a table from a list of tuples.
... lst = [(1, None, b'a'), (1.4, None, b'b'), (-4, 'a', b'c'), 
...        (3.145, 'a', b'c'), (5, 'a', b'd'), (None, 'b', b'e')]
>>> oml_df5 = oml.create(lst, table ='tbl5', 
...                      dbtypes = ['BINARY_DOUBLE','CHAR(1)','RAW(1)'])
>>> 
>>> # Show the SQL data type of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl5'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('CHAR',), ('RAW',)]
>>>
>>> # Close the cursor.
... cr.close()
>>>
>>> # Drop the tables
... oml.drop('tbl1')
>>> oml.drop('tbl2')
>>> oml.drop('tbl3')
>>> oml.drop('tbl4')
>>> oml.drop('tbl5')

3.3 Save Python Objects in the Database

You can save Python objects in OML4Py datastores, which persist in the database.

You can grant or revoke read privilege access to a datastore or its objects to one or more users. You can restore the saved objects in another Python session.

The following topics describe the OML4Py functions for creating and managing datastores:

3.3.1 About OML4Py Datastores

In an OML4Py datastore, you can store Python objects, which you can then use in subsequent Python sessions; you can also make them available to other users or programs.

Python objects, including OML4Py proxy objects, exist only for the duration of the current Python session unless you explicitly save them. You can save a Python object, including oml proxy objects, to a named datastore and then load that object in a later Python session, including an Embedded Python Execution session. OML4Py creates the datastore in the user’s database schema. A datastore, and the objects it contains, persist in the database until you delete them.

You can grant or revoke read privilege permission to another user to a datastore that you created or to objects in a datastore.

OML4Py has Python functions for managing objects in a datastore. It also has PL/SQL procedures for granting or revoking the read privilege and database views for listing available datastores and their contents.

Using a datastore, you can do the following:

  • Save OML4Py and other Python objects that you create in one Python session and load them in another Python session.

  • Pass arguments to Python functions for use in Embedded Python Execution.

  • Pass objects for use in Embedded Python Execution. You could, for example, use the oml.glm class to build an Oracle Machine Learning model and save it in a datastore. You could then use that model to score data in the database through Embedded Python Execution.

Python Interface for Datastores

The following table lists the Python functions for saving and managing objects in a datastore.

Function Description
oml.ds.delete

Deletes one or more datastores or Python objects from a datastore.

oml.ds.dir

Lists the datastores available to the current user.

oml.ds.load

Loads Python objects from a datastore into the user’s session.

oml.ds.save

Saves Python objects to a named datastore in the user’s database schema.

The following table lists the Python functions for managing access to datastores and datastore objects.

Function Description
oml.grant

Grants read privilege permission to another user to a datastore or a user-defined Python function in the script repository owned by the current user.

oml.revoke

Revokes the read privilege permission that was granted to another user to a datastore or a user-defined Python function in the script repository owned by the current user.

3.3.2 Save Objects to a Datastore

The oml.ds.save function saves one or more Python objects to a datastore.

OML4Py creates the datastore in the current user’s schema.

The syntax of oml.ds.save is the following:

oml.ds.save(objs, name, description=' ', grantable=None,
                      overwrite=False, append=False, compression=False)

The objs argument is a dict that contains the name and object pairs to save to the datastore specified by the name argument.

With the description argument, you can provide some descriptive text that appears when you get information about the datastore. The description parameter has no effect when used with the append parameter.

With the grantable argument, you can specify whether the read privilege to the datastore may be granted to other users.

If you set the overwrite argument to TRUE, then you can replace an existing datastore with another datastore of the same name.

If you set the append argument to TRUE, then you can add objects to an existing datastore. The overwrite and append arguments are mutually exclusive.

If you set compression to True, then the serialized Python objects are compressed in the datastore.

Example 3-7 Saving Python Objects to a Datastore

This example demonstrates creating datastores.

import oml
from sklearn import datasets
from sklearn import linear_model
import pandas as pd

# Load three data sets and create oml.DataFrame objects for them.
wine = datasets.load_wine()
x = pd.DataFrame(wine.data, columns = wine.feature_names)
y = pd.DataFrame(wine.target, columns = ['Class'])

# Create the database table WINE.
oml_wine = oml.create(pd.concat([x, y], axis=1), table = 'WINE')
oml_wine.columns

diabetes = datasets.load_diabetes()
x = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y = pd.DataFrame(diabetes.target, columns=['disease_progression'])
oml_diabetes = oml.create(pd.concat([x, y], axis=1), 
                                    table = "DIABETES")
oml_diabetes.columns

boston = datasets.load_boston()
x = pd.DataFrame(boston.data, columns = boston.feature_names.tolist())
y = pd.DataFrame(boston.target, columns = ['Value'])
oml_boston = oml.create(pd.concat([x, y], axis=1), table = "BOSTON")
oml_boston.columns

# Save the wine Bunch object to the datastore directly, 
# along with the oml.DataFrame proxy object for the BOSTON table.
oml.ds.save(objs={'wine':wine, 'oml_boston':oml_boston},
            name="ds_pydata", description = "python datasets")

# Save the oml_diabetes proxy object to an existing datastore.
oml.ds.save(objs={'oml_diabetes':oml_diabetes},
                  name="ds_pydata", append=True)

# Save the oml_wine proxy object to another datastore.
oml.ds.save(objs={'oml_wine':oml_wine},
            name="ds_wine_data", description = "wine dataset")

# Create regression models using sklearn and oml.
# The regr1 linear model is a native Python object.
regr1 = linear_model.LinearRegression()
regr1.fit(boston.data, boston.target)
# The regr2 GLM model is an oml object.
regr2 = oml.glm("regression")
X = oml_boston.drop('Value')
y = oml_boston['Value']
regr2 = regr2.fit(X, y)

# Save the native Python object and the oml proxy object to a datastore
# and allow the read privilege to be granted to them.
oml.ds.save(objs={'regr1':regr1, 'regr2':regr2},
            name="ds_pymodel", grantable=True)

# Grant the read privilege to the datastore to every user.
oml.grant(name="ds_pymodel", typ="datastore", user=None)

# List the datastores to which the read privilege has been granted.
oml.ds.dir(dstype="grant")

Listing for This Example

>>> import oml
>>> from sklearn import datasets
>>> from sklearn import linear_model
>>> import pandas as pd
>>>
>>> # Load three data sets and create oml.DataFrame objects for them.
>>> wine = datasets.load_wine()
>>> x = pd.DataFrame(wine.data, columns = wine.feature_names)
>>> y = pd.DataFrame(wine.target, columns = ['Class'])
>>> 
>>> # Create the database table WINE.
... oml_wine = oml.create(pd.concat([x, y], axis=1), table = 'WINE')
>>> oml_wine.columns
['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline', 'Class']
>>>
>>> diabetes = datasets.load_diabetes()
>>> x = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
>>> y = pd.DataFrame(diabetes.target, columns=['disease_progression'])
>>> oml_diabetes = oml.create(pd.concat([x, y], axis=1), 
...                           table = "DIABETES")
>>> oml_diabetes.columns
['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6', 'disease_progression']
>>>
>>> boston = datasets.load_boston()
>>> x = pd.DataFrame(boston.data, columns = boston.feature_names.tolist())
>>> y = pd.DataFrame(boston.target, columns = ['Value'])
>>> oml_boston = oml.create(pd.concat([x, y], axis=1), table = "BOSTON")
>>> oml_boston.columns
['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'Value']
>>>
>>> # Save the wine Bunch object to the datastore directly, 
... # along with the oml.DataFrame proxy object for the BOSTON table.
... oml.ds.save(objs={'wine':wine, 'oml_boston':oml_boston},
...             name="ds_pydata", description = "python datasets")
>>>
>>> # Save the oml_diabetes proxy object to an existing datastore.                           
... oml.ds.save(objs={'oml_diabetes':oml_diabetes},
...                    name="ds_pydata", append=True)
>>>
>>> # Save the oml_wine proxy object to another datastore.
... oml.ds.save(objs={'oml_wine':oml_wine},
...             name="ds_wine_data", description = "wine dataset")
>>>
>>> # Create regression models using sklearn and oml.
... # The regr1 linear model is a native Python object.
... regr1 = linear_model.LinearRegression()          
>>> regr1.fit(boston.data, boston.target)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
>>> # The regr2 GLM model is an oml proxy object.
... regr2 = oml.glm("regression")
>>> X = oml_boston.drop('Value')
>>> y = oml_boston['Value']
>>> regr2 = regr2.fit(X, y)
>>>
>>> # Save the native Python object and the oml proxy object to a datastore
... # and allow the read privilege to be granted to them.
... oml.ds.save(objs={'regr1':regr1, 'regr2':regr2},
...             name="ds_pymodel", grantable=True)
>>>
>>> # Grant the read privilege to the ds_pymodel datastore to every user.
... oml.grant(name="ds_pymodel", typ="datastore", user=None)
>>>
>>> # List the datastores to which the read privilege has been granted.
... oml.ds.dir(dstype="grant")
  datastore_name grantee
0     ds_pymodel  PUBLIC

3.3.3 Load Saved Objects From a Datastore

The oml.ds.load function loads one or more Python objects from a datastore into a Python session.

The syntax of oml.ds.load is the following:

oml.ds.load(name, objs=None, owner=None, to_globals=True)

The name argument specifies the datastore that contains the objects to load.

With the objs argument, you identify a specific object or a list of objects to load.

With the boolean to_globals parameter, you can specify whether the objects are loaded to a global workspace or to a dictionary object. If the argument to to_globals is True, then oml.ds.load function loads the objects into the global workspace. If the argument is False, then the function returns a dict object that contains pairs of object names and values.

The oml.ds.load function raises a ValueError if the name argument is an empty string or if the owner of the datastore is not the current user and the read privilege for the datastore has not been granted to the current user.

Example 3-8 Loading Objects from Datastores

This example loads objects from datastores. For the creation of the datastores used in this example, see Example 3-7.

import oml

# Load all Python objects from a datastore to the global workspace.
sorted(oml.ds.load(name="ds_pydata"))

# Load the named Python object from the datastore to the global workspace.
oml.ds.load(name="ds_pymodel", objs=["regr2"])

# Load the named Python object from the datastore to the user's workspace.
oml.ds.load(name="ds_pymodel", objs=["regr1"], to_globals=False)

Listing for This Example

>>> import oml
>>>
>>> # Load all Python objects from a datastore to the current workspace.
... sorted(oml.ds.load(name="ds_pydata"))
['oml_boston', 'oml_diabetes', 'wine']
>>>
>>> # Load the named Python object from the datastore to the global workspace.
... oml.ds.load(name="ds_pymodel", objs=["regr2"])
['regr2']
>>> 
>>> # Load the named Python object from the datastore to the user's workspace.
... oml.ds.load(name="ds_pymodel", objs=["regr1"], to_globals=False)
{'regr1': LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)}

3.3.4 Get Information About Datastores

The oml.ds.dir function provides information about datastores.

The syntax of oml.ds.dir is the following:

oml.ds.dir(name=None, regex_match=False, dstype=’user’)

Use the name parameter to get information about a specific datastore.

Optionally, you can use the regex_match and dstype parameters to get information about datastores with certain characteristics. The valid arguments for dstype are the following:

Argument Description
all

Lists all of the datastores to which the current user has the read privilege.

grant

Lists the datastores for which the current user has granted read privilege to other users.

granted

Lists the datastores for which other users have granted read privilege to the current user.

grantable

Lists the datastores that the current user can grant the read privilege to.

user

Lists the datastores created by current user.

private

Lists the datastores that the current user cannot grant the read privileges to.

The oml.ds.dir function returns a pandas.DataFrame object that contains different columns depending on which dstype argument you use. The following table lists the arguments and the columns returned for the values supplied.

dstype Argument Columns in the DataFrame Returned

user

private

grantable

DSNAME, which contains the datastore name

NOBJ, which contains the number of objects in the datastore

DSIZE, which contains the size in bytes of each object in the datastore

CDATE, which contains the creation date of the datastore

DESCRIPTION, which contains the optional description of the datastore

all

granted

All of the columns returned by the user, private, and grantable values, plus this additional column:

DSOWNER, which contains the owner of the datastore

grant

DSNAME, which contains the datastore name

GRANTEE, which contains the name of the user to which the read privilege to the datastore has been granted by the current session user

Example 3-9 Getting Information About Datastores

This example demonstrates using different combinations of arguments to the oml.ds.dir function. It demonstrates using oml.dir to list some or all of the datastores. For the creation of the datastores used in this example, see Example 3-7.

import oml

# Show all saved datastores.
oml.ds.dir(dstype="all")[['owner', 'datastore_name', 'object_count']]

# Show datastores to which other users have been granted the read 
# privilege.
oml.ds.dir(dstype="grant")

# Show datastores whose names match a pattern.
oml.ds.dir(name='pydata', regex_match=True)\
          [['datastore_name', 'object_count']]

Listing for This Example

>>> import oml
>>>
>>> # Show all saved datastores.
... oml.ds.dir(dstype="all")[['owner', 'datastore_name', 'object_count']]
     owner  datastore_name  object_count
0  OML_USER      ds_pydata             3
1  OML_USER     ds_pymodel             2
2  OML_USER   ds_wine_data             1
>>>
>>> # Show datastores to which other users have been granted the read 
>>> # privilege.
... oml.ds.dir(dstype="grant")
  datastore_name grantee
0     ds_pymodel  PUBLIC
>>>
>>> oml.ds.dir(name='pydata', regex_match=True)\
...           [['datastore_name', 'object_count']]
  datastore_name  object_count
0      ds_pydata             3

3.3.5 Get Information About Datastore Objects

The oml.ds.describe function provides information about the objects in a datastore.

The syntax of oml.ds.describe is the following:

oml.ds.describe(name, owner=None))

The name argument is a string that specifies the name of a datastore.

The owner argument is a string that specifies the owner of the datastore or None (the default). If you do not specify the owner, then the function returns information about the datastore if it is owned by the current user.

The oml.ds.describe function returns a pandas.DataFrame object, each row of which represents an object in the datastore. The columns of the DataFrame are the following:

  • object_name, which specifies the name of the object

  • class, which specifies the class of the object

  • size, which specifies the size of the object in bytes

  • length, which specifies the length of the object

  • row_count, which specifies the rows of the object

  • col_count, which specifies the columns of the object

This function raises a ValueError if the following occur:

  • The current user is not the owner of the datastore and has not been granted read privilege for the datastore.

  • The datastore does not exist.

Example 3-10 Getting Information About Datastore Objects

This example demonstrates the using the oml.ds.describe function. For the creation of the datastore used in this example, see Example 3-7.

import oml

# Describe the contents of the ds_pydata datastore.
oml.ds.describe(name='ds_pydata')
oml.ds.describe(name="ds_pydata")[['object_name', 'class']]

Listing for This Example

>>> import oml
>>>
>>> # Describe the contents of the ds_pydata datastore.
... oml.ds.describe(name='ds_pydata')
    object_name          class   size  length  row_count  col_count
0    oml_boston  oml.DataFrame   1073     506        506         14
1  oml_diabetes  oml.DataFrame    964     442        442         11
2          wine          Bunch  24177       5          1          5
>>> oml.ds.describe(name="ds_pydata")[['object_name', 'class']]
    object_name          class
0    oml_boston  oml.DataFrame
1  oml_diabetes  oml.DataFrame
2          wine          Bunch

3.3.6 Delete Datastore Objects

The oml.ds.delete function deletes datastores or objects in a datastore.

Use the oml.ds.delete function to delete one or more datastores in your database schema or to delete objects in a datastore.

The syntax of oml.ds.delete is the following:

oml.ds.delete(name, objs=None, regex_match=False)

The argument to the name parameter may be one of the following:

  • A string that specifies the name of the datastore to modify or delete, or a regular expression that matches the datastores to delete.

  • A list of str objects that name the datastores from which to delete objects.

The objs parameter specifies the objects to delete from a datastore. The argument to the objs parameter may be one of the following:

  • A string that specifies the object to delete from one or more datastores, or a regular expression that matches the objects to delete.

  • None (the default), which deletes the entire datastore or datastores.

The regex_match parameter is a bool that indicates whether the name or objs arguments are regular expressions. The default value is False. The regex_match parameter operates as follows:

  • If regex_match=False and if name is not None, and:

    • If objs=None, then oml.ds.delete deletes the datastore or datastores specified in the name argument.

    • If you specify one or more datastores with the name argument and one or more datastore objects with the objs argument, then oml.ds.delete deletes the specified Python objects from the datastores.

  • If regex_match=True and:

    • If objs=None, then oml.ds.delete deletes the datastores you specified in the name argument.

    • If the name argument is a string and you specify one or more datastore objects with the objs argument, then oml.ds.delete deletes from the datastore the objects whose names match the regular expression specified in the objs argument.

    • If the name argument is a list of str objects, then the objs argument must be a list of str objects of the same length as name, and oml.ds.delete deletes from the datastores the objects whose names match the regular expressions specified in objs.

This function raises an error if the following occur:

  • A specified datastore does not exist.

  • Argument regex_match is False and argument name is a list of str objects larger than 1 and argument objs is not None.

  • Argument regex_match is True and arguments name and objs are lists that are not the same length.

Example 3-11 Deleting Datastore Objects

This example demonstrates the using the oml.ds.delete function. For the creation of the datastores used in this example, see Example 3-7.

import oml

# Show the existing datastores.
oml.ds.dir()

# Show the Python objects in the ds_pydata datastore.
oml.ds.describe(name='ds_pydata')

# Delete some objects from the datastore.
oml.ds.delete(name="ds_pydata", objs=["wine", "oml_boston"])

# Delete a datastore.
oml.ds.delete(name="ds_pydata")

# Delete all datastores whose names match a pattern.
oml.ds.delete(name="_pymodel", regex_match=True)

# Show the existing datastores again.
oml.ds.dir()

Listing for This Example

>>> import oml
>>> 
>>> # Show the existing datastores.
... oml.ds.dir()
  datastore_name  object_count  size                date      description
0      ds_pydata             3  26214 2019-05-18 21:04:06  python datasets
1     ds_pymodel             2   6370 2019-05-18 21:08:18             None
2   ds_wine_data             1   1410 2019-05-18 21:06:53     wine dataset
>>>
>>> # Show the Python objects in the ds_pydata datastore.
... oml.ds.describe(name='ds_pydata')
    object_name          class  size  length  row_count  col_count
0    oml_boston  oml.DataFrame   1073     506        506         14
1  oml_diabetes  oml.DataFrame    964     442        442         11
2          wine          Bunch  24177       5          1          5
>>> 
>>> # Delete some objects from a datastore.
... oml.ds.delete(name="ds_pydata", objs=["wine", "oml_boston"])
{'wine', 'oml_boston'}
>>>
>>> # Delete a datastore.
... oml.ds.delete(name="ds_pydata")
'ds_pydata'
>>> 
>>> # Delete all datastores whose names match a pattern.
... oml.ds.delete(name="_pymodel", regex_match=True)
{'ds_pymodel'}
>>> 
>>> # Show the existing datastores again.
... oml.ds.dir()
  datastore_name  object_count  size                date   description
0   ds_wine_data             1  1410 2019-05-18 21:06:53  wine dataset

3.3.7 Manage Access to Stored Objects

The oml.grant and oml.revoke functions grant or revoke the read privilege to datastores or to user-defined Python functions in the script repository.

The oml.grant function grants the read privilege to another user to a datastore or to a user-defined Python function in the OML4Py script repository. The oml.revoke function revokes that privilege.

The syntax of these functions is the following:

oml.grant(name, typ='datastore', user=None)
oml.revoke(name, typ='datastore', user=None)

The name argument is a string that specifies the name of the user-defined Python function in the script repository or the name of a datastore.

The typ parameter must be specified. The argument is a string that is either ‘datastore’ or ‘pyqscript’.

The user argument is a string that specifies the user to whom read privilege to the named datastore or user-defined Python function is granted or from whom it is revoked, or None (the default). If you specify None, then the read privilege is granted to or revoked from all users.

Example 3-12 Granting and Revoking Access to Datastores

This example displays the datastores to which the read privilege has been granted to all users. It revokes read privilege from the ds_pymodel datastore and displays the datastores with public read privilege again. It next grants the read privilege to the user SH and finally displays once more the datastores to which read privilege has been granted. For the creation of the datastores used in this example, see Example 3-7.

import oml

# Show datastores to which other users have been granted read privilege.
oml.ds.dir(dstype="grant")

# Revoke the read privilege from every user.
oml.revoke(name="ds_pymodel", typ="datastore", user=None)

# Again show datastores to which read privilege has been granted.
oml.ds.dir(dstype="grant")

# Grant the read privilege to the user SH.
oml.grant(name="ds_pymodel", typ="datastore", user="SH")

oml.ds.dir(dstype="grant")

Listing for This Example

>>> import oml
>>> 
>>> # Show datastores to which other users have been granted read privilege.
... oml.ds.dir(dstype="grant")
  datastore_name grantee
0     ds_pymodel  PUBLIC
>>>
>>> # Revoke the read privilege from every user.
... oml.revoke(name="ds_pymodel", typ="datastore", user=None)
>>>
>>> # Again show datastores to which read privilege has been granted to other users.
... oml.ds.dir(dstype="grant")
Empty DataFrame
Columns: [datastore_name, grantee]
Index: []
>>>
>>> # Grant the read privilege to the user SH.
... oml.grant(name="ds_pymodel", typ="datastore", user="SH")
>>> 
>>> oml.ds.dir(dstype="grant")
  datastore_name grantee
0     ds_pymodel      SH

Example 3-13 Granting and Revoking Access to User-Defined Python Functions

This example grants the read privilege to the MYLM user-defined Python function to the user SH and then revokes that privilege. For the creation of the user-defined Python functions used in this example, see Example 7-11.

# List the user-defined Python functions available only to the current user.
oml.script.dir(sctype='user')

# Grant the read privilege to the MYLM user-defined Python function to the user SH.
oml.grant(name="MYLM", typ="pyqscript", user="SH")

# List the user-defined Python functions to which read privilege has been granted.
oml.script.dir(sctype="grant")

# Revoke the read privilege to the MYLM user-defined Python function from the user SH.
oml.revoke(name="MYLM", typ="pyqscript", user="SH")

# List the granted user-defined Python functions again to see if the revocation was successful.
oml.script.dir(sctype="grant")

Listing for This Example

>>> # List the user-defined Python functions available only to the current user.
oml.script.dir(sctype='user')
   name                                             script
0  MYLM  def build_lm1(dat):\n  from sklearn import lin...
>>>
>>># Grant the read privilege to the MYLM user-defined Python function to the user SH.
...oml.grant(name="MYLM", typ="pyqscript", user="SH")
>>>
>>> # List the user-defined Python functions to which read privilege has been granted.
... oml.script.dir(sctype="grant")
   name grantee
0  MYLM      SH
>>>
>>> # Revoke the read privilege to the MYLM user-defined Python function from the user SH.
... oml.revoke(name="MYLM", typ="pyqscript", user="SH")
>>>
>>> # List the granted user-defined Python functions again to see if the revocation was successful.
... oml.script.dir(sctype="grant")
Empty DataFrame
Columns: [name, grantee]
Index: []