10.4.3 Run a User-Defined Python Function on the Specified Data

Use the oml.table_apply function to run a Python function on data that you specify with the data parameter.

The oml.table_apply function runs a user-defined Python function in a Python engine spawned and managed by the database environment. With the func parameter, you can supply a Python function or you can specify the name of a user-defined Python function in the OML4Py script repository.

The syntax of the function is the following:

oml.table_apply(data, func, func_owner=None, graphics=False, **kwargs)

The data argument is an oml.DataFrame that contains the data that the func function operates on.

The func argument is the function to run. It may be one of the following:

A Python function
A string that is the name of a user-defined Python function in the OML4Py script repository
A string that defines a Python function
An oml.script.script.Callable object returned by the oml.script.load function
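The first three forms of func can be pictured with a small local resolver. This is a minimal sketch, not the OML4Py implementation. The resolve_func helper, the plain dictionary standing in for the script repository, the "contains 'def '" test for inline source, and the rule that a source string defines exactly one function are all hypothetical assumptions. The oml.script.script.Callable form is not modeled separately.

```python
import types
from typing import Callable, Dict, Union


def resolve_func(func: Union[Callable, str],
                 repository: Dict[str, str]) -> Callable:
    """Hypothetical helper: turn an accepted form of func into a callable."""
    # Form 1: an ordinary Python function is used as-is.
    if callable(func):
        return func
    if not isinstance(func, str):
        raise TypeError("func must be a callable or a string")
    if "def " in func:
        # Form 3: the string itself defines a Python function.
        source = func
    else:
        # Form 2: the string names a stored script (KeyError if missing).
        source = repository[func]
    namespace: dict = {}
    exec(source, namespace)
    functions = [v for v in namespace.values()
                 if isinstance(v, types.FunctionType)]
    if len(functions) != 1:
        raise ValueError("expected the source to define exactly one function")
    return functions[0]


# Usage with a hypothetical one-entry repository.
repo = {"add_one": "def add_one(x):\n    return x + 1\n"}
print(resolve_func(lambda x: x * 2, repo)(3))                # 6
print(resolve_func("add_one", repo)(3))                      # 4
print(resolve_func("def neg(x):\n    return -x\n", repo)(3))  # -3
```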

The optional func_owner argument is a string or None (the default). When the func argument is the name of a registered user-defined Python function, func_owner specifies the owner of that function.

The graphics argument is a boolean that specifies whether to look for images. The default value is False.

With the **kwargs parameter, you can pass additional arguments to the func function. Special control arguments, which start with oml_, are not passed to the function specified by func, but instead control what happens before or after the execution of the function.
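The split between ordinary keyword arguments and oml_ control arguments can be sketched locally as follows. This is an illustration under stated assumptions, not OML4Py code: local_apply and split_control_args are hypothetical helpers, and oml_example_flag is a made-up control argument name. The sketch only sets control arguments aside, whereas OML4Py acts on them before or after func runs.

```python
from typing import Any, Callable, Dict, Tuple


def split_control_args(
        kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate oml_-prefixed control arguments from arguments for func."""
    control = {k: v for k, v in kwargs.items() if k.startswith("oml_")}
    user = {k: v for k, v in kwargs.items() if not k.startswith("oml_")}
    return control, user


def local_apply(data: Any, func: Callable, **kwargs: Any) -> Any:
    """Hypothetical local stand-in for the argument handling only."""
    control, user = split_control_args(kwargs)
    # Only the non-oml_ arguments are forwarded to func.
    return func(data, **user)


# Usage: factor reaches func; the hypothetical oml_example_flag does not.
def scale(dat, factor):
    return [v * factor for v in dat]


print(local_apply([1, 2, 3], scale, factor=10, oml_example_flag=True))
# [10, 20, 30]
```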

The oml.table_apply function returns a Python object or an oml.embed.data_image._DataImage. If no image is rendered in the user-defined Python function, oml.table_apply returns whatever Python object is returned by the function. Otherwise, it returns an oml.embed.data_image._DataImage object.

See Also: About Output

Example 10-7 Using the oml.table_apply Function

This example builds a linear regression model on in-memory data that predicts Sepal_Length, and then uses the oml.table_apply function to apply the model to the first 10 rows of the IRIS table.

import oml
import pandas as pd
from sklearn import datasets 
from sklearn import linear_model

# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()

x = pd.DataFrame(iris.data, 
                 columns = ['Sepal_Length','Sepal_Width',
                            'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x: 
                           {0: 'setosa', 1: 'versicolor', 
                            2:'virginica'}[x], iris.target)), 
                 columns = ['Species'])

# Drop the IRIS database table if it exists.
try:
    oml.drop('IRIS')
except: 
    pass

# Create the IRIS database table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')

# Build a regression model using in-memory data.
iris = oml_iris.pull()
regr = linear_model.LinearRegression()
regr.fit(iris[['Sepal_Width', 'Petal_Length', 'Petal_Width']], 
         iris[['Sepal_Length']])
regr.coef_

# Use oml.table_apply to predict using the model on the first 10 
# rows of the IRIS table.
def predict(dat, regr):
    import pandas as pd
    pred = regr.predict(dat[['Sepal_Width', 'Petal_Length', 
                             'Petal_Width']])
    return pd.concat([dat,pd.DataFrame(pred)], axis=1)

res = oml.table_apply(data=oml_iris.head(n=10), 
                      func=predict, regr=regr)
res
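Because func receives a pandas.DataFrame in this example, it can be checked locally before it is sent to oml.table_apply. The sketch below is a slightly more defensive variant of predict, offered as an assumption rather than a required change. It names the prediction column (the example's output shows the default label 0) and reuses the input index: if dat's index were not 0..n-1, pd.DataFrame(pred) would get a fresh 0-based index and pd.concat(..., axis=1) would misalign the rows and pad them with NaN. MeanModel is a hypothetical stand-in for the fitted scikit-learn model, and Pred_Sepal_Length is an illustrative column name.

```python
import numpy as np
import pandas as pd


class MeanModel:
    """Hypothetical stand-in for a fitted scikit-learn regressor."""

    def __init__(self, value: float):
        self.value = value

    def predict(self, X: pd.DataFrame) -> np.ndarray:
        # Shape (n, 1), like LinearRegression fitted on a one-column target.
        return np.full((len(X), 1), self.value)


def predict(dat: pd.DataFrame, regr) -> pd.DataFrame:
    # Imported inside the function, as in the example, because func
    # runs in a separate Python engine.
    import pandas as pd
    features = ['Sepal_Width', 'Petal_Length', 'Petal_Width']
    pred = regr.predict(dat[features])
    # Align on dat's index and give the prediction a readable name.
    pred_df = pd.DataFrame(pred, index=dat.index,
                           columns=['Pred_Sepal_Length'])
    return pd.concat([dat, pred_df], axis=1)


# Local check with a non-default index (values are from the iris data set).
sample = pd.DataFrame({'Sepal_Length': [5.1, 4.9],
                       'Sepal_Width': [3.5, 3.0],
                       'Petal_Length': [1.4, 1.4],
                       'Petal_Width': [0.2, 0.2],
                       'Species': ['setosa', 'setosa']},
                      index=[10, 11])
out = predict(sample, MeanModel(5.0))
print(list(out.index))                       # [10, 11]
print(out['Pred_Sepal_Length'].tolist())     # [5.0, 5.0]
```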

Listing for This Example

>>> import oml
>>> import pandas as pd
>>> from sklearn import datasets 
>>> from sklearn import linear_model
>>> 
>>> # Load the iris data set and create a pandas.DataFrame for it.
... iris = datasets.load_iris()
>>>
>>> x = pd.DataFrame(iris.data, 
...                  columns = ['Sepal_Length','Sepal_Width',
...                             'Petal_Length','Petal_Width'])
>>> y = pd.DataFrame(list(map(lambda x: 
...                            {0: 'setosa', 1: 'versicolor', 
...                             2:'virginica'}[x], iris.target)), 
...                  columns = ['Species'])
>>>
>>> # Drop the IRIS database table if it exists.
... try:
...     oml.drop('IRIS')
... except: 
...     pass
>>>
>>> # Create the IRIS database table.
... oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
>>>
>>> # Build a regression model using in-memory data.
... iris = oml_iris.pull()
>>> regr = linear_model.LinearRegression()
>>> regr.fit(iris[['Sepal_Width', 'Petal_Length', 'Petal_Width']], 
...          iris[['Sepal_Length']])
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, 
         normalize=False)
>>> regr.coef_
array([[ 0.65083716,  0.70913196, -0.55648266]])
>>>
>>> # Use oml.table_apply to predict using the model on the first 10
... # rows of the IRIS table.
... def predict(dat, regr):
...     import pandas as pd
...     pred = regr.predict(dat[['Sepal_Width', 'Petal_Length', 
...                              'Petal_Width']])
...     return pd.concat([dat,pd.DataFrame(pred)], axis=1)
...
>>> res = oml.table_apply(data=oml_iris.head(n=10), 
...                       func=predict, regr=regr)
>>> res
   Sepal_Length  Sepal_Width  Petal_Length  Petal_Width
0           4.6          3.6             1          0.2
1           5.1          2.5             3          1.1
2           6.0          2.2             4          1.0
3           5.8          2.6             4          1.2
4           5.5          2.3             4          1.3
5           5.5          2.5             4          1.3
6           6.1          2.8             4          1.3
7           5.7          2.5             5          2.0
8           6.0          2.2             5          1.5
9           6.3          2.5             5          1.9

      Species         0
0      setosa  4.796847
1  versicolor  4.998355
2  versicolor  5.567884
3  versicolor  5.716923
4  versicolor  5.466023
5  versicolor  5.596191
6   virginica  5.791442
7   virginica  5.915785
8   virginica  5.998775
9   virginica  5.971433
