10.5.5 pyqGroupEval Function (On-Premises Database)

This topic describes the pyqGroupEval function when used in an on-premises Oracle Database. The pyqGroupEval function groups data by one or more columns and runs a user-defined Python function on each group.

The pyqGroupEval function runs the user-defined Python function specified by the EXP_NAM parameter. Pass data to the Python function with the INP_NAM parameter. Pass arguments to the Python function with the PAR_QRY parameter. Specify one or more grouping columns with the GRP_COL parameter.

The Python function can return a boolean, a dict, a float, an int, a list, a str, a tuple or a pandas.DataFrame object. Define the form of the returned value with the OUT_QRY parameter.

Syntax

pyqGroupEval (
    INP_NAM     VARCHAR2       IN
    PAR_QRY     VARCHAR2       IN
    OUT_QRY     VARCHAR2       IN
    GRP_COL     VARCHAR2       IN
    EXP_NAM     VARCHAR2       IN)

Parameters

Parameter Description

INP_NAM

The name of a table or view that specifies the data to pass to the Python function specified by the EXP_NAM parameter. If using a table or view owned by another user, use the format <owner name>.<table/view name>. You must have read access to the specified table or view.

PAR_QRY

A JSON string that contains additional parameters to pass to the user-defined Python function specified by the EXP_NAM parameter. Special control arguments, which start with oml_, are not passed to the function specified by EXP_NAM, but instead control what happens before or after the invocation of the function.

For example, to specify the input data type as pandas.DataFrame, use:

'{"oml_input_type":"pandas.DataFrame"}'

OUT_QRY

The format of the output returned by the function. It can be one of the following:

  • A JSON string that specifies the column names and data types of the table returned by the function. Any image data is discarded.
  • The name of a table or view to use as a prototype. If using a table or view owned by another user, use the format <owner name>.<table/view name>. You must have read access to the specified table or view.
  • The string 'XML', which specifies that the table returned contains a CLOB that is an XML string. The XML can contain both structured data and images, with structured or semi-structured Python objects first, followed by the image or images generated by the Python function.
  • The string 'PNG', which specifies that the table returned contains a BLOB that has the image or images generated by the Python function. Images are returned as a base 64 encoding of the PNG representation.
GRP_COL

The names of the grouping columns by which to partition the data. Use commas to separate multiple columns. For example, to group by GENDER and YEAR:

"GENDER,YEAR"

EXP_NAM

The name of a user-defined Python function in the OML4Py script repository.

Returns

Function pyqGroupEval returns a table that has the structure specified by the OUT_QRY parameter value.

Example 10-18 Using the pyqGroupEval Function

This example defines the Python function create_iris_table and stores it with the name create_iris_table in the OML4Py script repository. It then invokes pyqEval, which invokes the user-definded Python function and creates the IRIS database table. The example creates the package irisPkg and uses that package in specifying the data cursor to pass to the irisGroupEval function, which is a user-defined pyqGroupEval function. It defines another Python function, group_count and stores it in the script repository with the name mygroupcount. The example then invokes the irisGroupEval function and passes it the Python function saved with the name mygroupcount.

In a PL/SQL block, define the Python function create_iris_table and store in the script repository with the name create_iris_table.

BEGIN
  sys.pyqScriptCreate('create_iris_table',
    'def create_iris_table():
       from sklearn.datasets import load_iris
       import pandas as pd
       iris = load_iris()
       x = pd.DataFrame(iris.data, columns = ["Sepal_Length",\
                        "Sepal_Width","Petal_Length","Petal_Width"])
       y = pd.DataFrame(list(map(lambda x: {0:"setosa", 1: "versicolor",\
                                 2: "virginica"}[x], iris.target)),\
                        columns = ["Species"])
       return pd.concat([y, x], axis=1)');
END;
/

Invoke the pyqEval function to create the database table IRIS, using the Python function stored with the name create_iris_table in the script repository.

CREATE TABLE IRIS AS
(SELECT * FROM pyqEval(
      NULL,
      '{"Species":"VARCHAR2(10)","Sepal_Length":"number",
        "Sepal_Width":"number","Petal_Length":"number",
        "Petal_Width":"number"}',
      'create_iris_table'
));

Define the Python function group_count and store it with the name mygroupcount in the script repository. The function returns a pandas.DataFrame generated on each group of data dat.

BEGIN
  sys.pyqScriptCreate('mygroupcount',
    'def group_count(dat):
    import pandas as pd
    return pd.DataFrame([(dat["Species"][0], dat.shape[0])],\
    columns = ["Species", "CNT"]) ');
END;
/

Issue a query that invokes the pyqGroupEval function. In the function, the INP_NAM argument specifies the data in the IRIS table to pass to the function.

The PAR_QRY argument specifies the special control argument oml_input_type.

The OUT_QRY argument specifies a JSON string that contains the column names and data types of the table returned by pyqGroupEval.

The GRP_COL parameter specifies the column to group by.

The EXP_NAM parameter specifies the user-defined Python function stored with the name mygroupcount in the script repository.

SELECT *
   FROM table(
    pyqGroupEval(
      'IRIS',
      '{"oml_input_type":"pandas.DataFrame"}',
      '{"Species":"varchar2(10)", "CNT":"number"}',     
      'Species',
      'mygroupcount'));

The output is the following.

Species           CNT
---------- ----------
setosa             50
versicolor         50
virginica          50