10.6.2.4 pyqRowEval Function (Autonomous Database)

The function pyqRowEval when used in Oracle Autonomous Database, chunks data into sets of rows and then runs a user-defined Python function on each chunk.

The function pyqRowEval passes the data specified by the INP_NAM parameter to the user-defined Python function specified by the SCR_NAME parameter. The PAR_LST parameter specifies the special control argument oml_graphics_flag to capture images rendered in the script, and the oml_parallel_flag and oml_service_level flags enable parallelism using the MEDIUM service level. See also: Special Control Arguments (Autonomous Database).

The ROW_NUM parameter specifies the maximum number of rows to pass to each invocation of the user-defined Python function. The last set of rows may have fewer rows than the number specified.

The user-defined Python function can return a boolean, a dict, a float, an int, a list, a str, a tuple or a pandas.DataFrame object. You can define the form of the returned value with the OUT_FMT parameter.

Syntax

FUNCTION PYQSYS.pyqRowEval(
    INP_NAM    VARCHAR2,
    PAR_LST    VARCHAR2,
    OUT_FMT    VARCHAR2,
    ROW_NUM    NUMBER,
    SCR_NAME   VARCHAR2,
    SCR_OWNER  VARCHAR2 DEFAULT NULL,
    ENV_NAME   VARCHAR2 DEFAULT NULL 
    )
    RETURN SYS.AnyDataSet

Parameters

Parameter Description

INP_NAM

The name of a table or view that specifies the data to pass to the Python function specified by the SCR_NAME parameter. If using a table or view owned by another user, use the format <owner name>.<table/view name>. You must have read access to the specified table or view.

PAR_LST

A JSON string that contains additional parameters to pass to the user-defined Python function specified by the SCR_NAME parameter. Special control arguments, which start with oml_, are not passed to the function specified by SCR_NAME, but instead control what happens before or after the invocation of the function.

For example, to specify the input data type as pandas.DataFrame, use:

'{"oml_input_type":"pandas.DataFrame"}'

See also: Special Control Arguments (Autonomous Database).

OUT_FMT

The format of the output returned by the function. It can be one of the following:

  • A JSON string that specifies the column names and data types of the table returned by the function. Any image data is discarded. The Python function must return a pandas.DataFrame, a numpy.ndarray, a tuple, or a list of tuples.
  • The string 'JSON', which specifies that the table returned contains a CLOB that is a JSON string.

  • The string 'XML', which specifies that the table returned contains a CLOB that is an XML string. The XML can contain both structured data and images, with structured or semi-structured Python objects first, followed by the image or images generated by the Python function.
  • The string 'PNG', which specifies that the table returned contains a BLOB that has the image or images generated by the Python function. Images are returned as a base 64 encoding of the PNG representation.

See also: Output Formats (Autonomous Database).

ROW_NUM

The number of rows in a chunk. The Python script is executed in each chunk.

SCR_NAME

The name of a user-defined Python function in the OML4Py script repository.

SCR_OWNER

The owner of the registered Python script. The default value is NULL. If NULL, will search for the Python script in the user’s script repository.

ENV_NAME

The name of the conda environment that should be used when running the named user-defined Python function.

Example

This example loads the Python model linregr to predict row chunks of sample iris data. The model is created and saved in the datastore pymodel, which is shown in the example for pyqTableEval Function (Autonomous Database).

The example defines a Python function and stores it in the OML4Py script repository. It uses the user-defined Python function to create a database table as the result of the pyqEval function. It defines a Python function that runs a prediction function on a model loaded from the OML4Py datastore. It then invokes the pyqTableEval function to invoke the function on chunks of rows from the database table.

In a PL/SQL block, define the function sample_iris_table and store it in the script repository. The function loads the iris data set, creates two pandas.DataFrame objects, and then returns a sample of the concatenation of those objects.

BEGIN
    sys.pyqScriptCreate('sample_iris_table',
        'def sample_iris_table(size):
            from sklearn.datasets import load_iris
            import pandas as pd
            iris = load_iris()
            x = pd.DataFrame(iris.data, columns = ["Sepal_Length",\
                            "Sepal_Width","Petal_Length","Petal_Width"])
            y = pd.DataFrame(list(map(lambda x: {0:"setosa", 1: "versicolor",\
                                    2: "virginica"}[x], iris.target)),\
                            columns = ["Species"])
            return pd.concat([y, x], axis=1).sample(int(size))',
        FALSE, TRUE); -- V_GLOBAL, V_OVERWRITE
END;
/

Create the SAMPLE_IRIS table in the database as the result of a SELECT statement, which invokes the pyqEval function on the sample_iris_table user-defined Python function saved in the script repository with the same name. The sample_iris_table function returns an iris data sample of size size.

CREATE TABLE sample_iris AS
SELECT *
    FROM TABLE(pyqEval(
                '{"size":20}',
                '{"Species":"varchar2(10)","Sepal_Length":"number",
                "Sepal_Width":"number","Petal_Length":"number",
                "Petal_Width":"number"}',
                'sample_iris_table'));

Define the Python function predict_model and store it with the name linregrPredict in the script repository. The function predicts the data in dat with the Python model specified by the modelName argument, which is loaded from the datastore specified by the datastoreName argument. The function also plots the actual petal width values with the predicted values. The predictions are finally concatenated and returned with dat as the object that the function returns.

BEGIN
    sys.pyqScriptCreate('linregrPredict',
        'def predict_model(dat, modelName, datastoreName):
            import oml
            import pandas as pd
            objs = oml.ds.load(name=datastoreName, to_globals=False)
            pred = objs[modelName].predict(dat[["Sepal_Length",\
            "Sepal_Width","Petal_Length"]])
            return pd.concat([dat, pd.DataFrame(pred, \
            columns=["Pred_Petal_Width"])], axis=1)',
        FALSE, TRUE); -- V_GLOBAL, V_OVERWRITE
END;
/

Run a SELECT statement that invokes the pyqRowEval function, which runs the specified Python function on each chunk of rows in the specified data set.

The INP_NAM argument specifies the data in the SAMPLE_IRIS table to pass to the Python function.

The PAR_LST argument specifies passing the input data as a pandas. DataFrame with the special control argument oml_input_type, along with values for the function arguments modelName and datastoreName.

In the OUT_FMT argument, the JSON string specifies the column names and data types of the structured table output.

The ROW_NUM argument specifies that five rows are included in each invocation of the function specified by SCR_NAME.

The SCR_NAME parameter specifies linregrPredict, which is the name in the script repository of the user-defined Python function to invoke.

SELECT *
    FROM table(pyqRowEval(
        inp_nam => 'SAMPLE_IRIS',
        par_lst => '{"oml_input_type":"pandas.DataFrame",
                     "modelName":"linregr", "datastoreName":"pymodel"}',
        out_fmt => '{"Species":"varchar2(12)", "Petal_Length":"number", "Pred_Petal_Width":"number"}',
        row_num => 5,
        scr_name => 'linregrPredict'));

The output is the following.

Species      Petal_Length   Pred_Petal_Width   
setosa                  1.2       0.0653133202 
versicolor              4.5        1.632087234 
setosa                  1.3       0.2420812759 
setosa                  1.9       0.5181904241 
setosa                  1.4       0.2162518989 
setosa                  1.4       0.1732424372 
setosa                  1.5       0.2510460971 
setosa                  1.3       0.1907951829 
versicolor              3.9       1.1999981051 
versicolor              4.2       1.4017887483 
versicolor                4       1.2332360562 
versicolor              4.8        1.765473067 
virginica               5.6       2.0095892178 
versicolor              4.7       1.5824801232 

Species      Petal_Length   Pred_Petal_Width   
virginica               5.4       2.0623088225 
versicolor              4.7       1.6524411804 
virginica               5.6       1.9919751044 
virginica               5.8       2.1206308288 
virginica               5.1       1.7983383572 
versicolor              4.4       1.3677441077 


20 rows selected. 

Run a SELECT statement that invokes the pyqRowEval function and return the XML output. Each invocation of script linregrPredict is applied to 10 rows of data in the SAMPLE_IRIS table. The XML output is a CLOB; you can call set long [length] to get more output.

set long 300
SELECT *
    FROM table(pyqRowEval(
        inp_nam => 'SAMPLE_IRIS',
        par_lst => '{"oml_input_type":"pandas.DataFrame",
                     "modelName":"linregr", "datastoreName":"pymodel", "oml_parallel_flag":true", "oml_service_level":"MEDIUM"}',
        out_fmt => 'XML',
        row_num => 10,
        scr_name => 'linregrPredict'));

The output is the following:

NAME VALUE
       <root><pandas_dataFrame><ROW-pandas_dataFrame><Species>setosa</Species><Sepal_Length>5</Sepal_Length><Sepal_Width>3.2</Sepal_Width><Petal_Length>1.2</Petal_Length><Petal_Width>0.2</Petal_Width><Pred_Petal_Width>0.0653133201897007</Pred_Petal_Width></ROW-pandas_dataFrame><ROW-pandas_dataFrame><Species>