10.3.1 About Embedded Python Execution

You may choose to run your functions in a data-parallel or task-parallel manner in one or more Python engines. In data-parallel processing, you partition the data and invoke the same user-defined Python function on each data subset using one or more Python engines. In task-parallel processing, you invoke a user-defined function multiple times in one or more Python engines with a unique index passed in as an argument; for example, you may use task parallelism for Monte Carlo simulations in which you use the index to set a random seed.
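The Monte Carlo case can be sketched locally as follows. This is a minimal illustration that uses only the Python standard library, not OML4Py itself: run_tasks is a hypothetical stand-in for a task-parallel call, estimate_pi is an illustrative user-defined function, and the 1-based index is an assumption made for the sketch.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def estimate_pi(index, n_samples=10_000):
    """One Monte Carlo task; the invocation index seeds its own generator."""
    rng = random.Random(index)  # same index -> same random stream
    inside = sum(
        1
        for _ in range(n_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_samples


def run_tasks(func, times, parallel=2):
    """Hypothetical local stand-in: call func(1), ..., func(times) on a worker pool."""
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        # pool.map keeps results in index order even if tasks finish out of order
        return list(pool.map(func, range(1, times + 1)))


estimates = run_tasks(estimate_pi, times=4)
mean_estimate = sum(estimates) / len(estimates)
```

Because each task derives its seed from its index, rerunning the same set of tasks reproduces the same estimates regardless of which worker handles which index.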

The following table lists the Python functions for Embedded Python Execution.

Function          Description
oml.do_eval       Runs a user-defined Python function in a Python engine spawned and managed by the database environment.
oml.group_apply   Partitions a database table by the values in one or more columns and runs the provided user-defined Python function on each partition.
oml.index_apply   Runs a Python function multiple times, passing in a unique index of the invocation to the user-defined function.
oml.row_apply     Partitions a database table into sets of rows and runs the provided user-defined Python function on the data in each set.
oml.table_apply   Runs a Python function on data in the database as a single pandas.DataFrame in a single Python engine.
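The three data-oriented functions differ mainly in how the input is split before the user-defined function is called. The following sketch imitates that splitting locally on a small list of rows. It does not call OML4Py; partition_by_column, partition_by_rows, and the sample data are hypothetical names and values chosen for illustration.

```python
from collections import defaultdict

# Illustrative sample data; each dict stands in for one table row.
rows = [
    {"species": "setosa", "petal_length": 1.0},
    {"species": "versicolor", "petal_length": 4.0},
    {"species": "setosa", "petal_length": 2.0},
    {"species": "virginica", "petal_length": 6.0},
    {"species": "versicolor", "petal_length": 5.0},
]


def mean_petal_length(part):
    """The same user-defined function is applied to every partition."""
    return sum(r["petal_length"] for r in part) / len(part)


def partition_by_column(data, column):
    """Group-style split: one partition per distinct value in the column."""
    groups = defaultdict(list)
    for row in data:
        groups[row[column]].append(row)
    return dict(groups)


def partition_by_rows(data, rows_per_chunk):
    """Row-style split: consecutive sets of at most rows_per_chunk rows."""
    return [data[i:i + rows_per_chunk] for i in range(0, len(data), rows_per_chunk)]


by_group = {k: mean_petal_length(p) for k, p in partition_by_column(rows, "species").items()}
by_chunk = [mean_petal_length(c) for c in partition_by_rows(rows, 2)]
whole_table = mean_petal_length(rows)  # table-style: all rows in a single call
```

Here the group-style split yields one result per species, the row-style split yields one result per set of two rows, and the table-style call yields a single result for all rows.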

About Special Control Arguments

Special control arguments control what happens before or after the running of the function that you pass to an Embedded Python Execution function. You specify a special control argument with the **kwargs parameter of a function such as oml.do_eval. The control arguments are not passed to the function specified by the func argument of that function.

Table 10-1 Special Control Arguments

Argument          Description
oml_input_type    Identifies the type of input data object that you are supplying to the func argument.

                  The input types are the following:

                    • pandas.DataFrame
                    • numpy.recarray
                    • 'default' (the default value)

                  If all columns are numeric, then the default type is a 2-dimensional numpy.ndarray of type numpy.float64. Otherwise, the default type is a pandas.DataFrame.

oml_na_omit       Controls the handling of missing values in the input data. If you specify oml_na_omit = True, then rows that contain missing values are removed from the input data. If all of the rows contain missing values, then the input data is an empty oml.DataFrame. The default value is False.
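The following sketch illustrates two of the points above: control arguments are consumed before the user-defined function is called rather than forwarded to it, and oml_na_omit removes rows that contain missing values. It is a local illustration only. run_with_controls and count_rows are hypothetical names, rows are modeled as plain dictionaries, and None or NaN is assumed to mark a missing value.

```python
import math

CONTROL_ARGS = {"oml_input_type", "oml_na_omit"}  # names from Table 10-1


def has_missing(row):
    """Assumption for this sketch: None or NaN marks a missing value."""
    return any(
        v is None or (isinstance(v, float) and math.isnan(v))
        for v in row.values()
    )


def run_with_controls(data, func, **kwargs):
    """Hypothetical local stand-in for an Embedded Python Execution call."""
    controls = {k: v for k, v in kwargs.items() if k in CONTROL_ARGS}
    user_args = {k: v for k, v in kwargs.items() if k not in CONTROL_ARGS}
    if controls.get("oml_na_omit", False):  # the default value is False
        data = [row for row in data if not has_missing(row)]
    return func(data, **user_args)  # control arguments are not forwarded


def count_rows(dat, scale=1):
    """User-defined function; it does not accept any control argument."""
    return len(dat) * scale


rows = [
    {"x": 1.0, "y": 2.0},
    {"x": None, "y": 3.0},
    {"x": 4.0, "y": float("nan")},
]
kept_all = run_with_controls(rows, count_rows, scale=10)
kept_complete = run_with_controls(rows, count_rows, scale=10, oml_na_omit=True)
```

In this sketch count_rows would raise a TypeError if oml_na_omit reached it, so the successful call shows that the control argument was handled by the wrapper while scale was passed through.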

About Output

By default, an Embedded Python Execution function returns the Python objects that the user-defined Python function returns. OML4Py also captures all matplotlib.figure.Figure objects created by the user-defined Python function and converts them into PNG format.

If graphics = True, the Embedded Python Execution functions return oml.embed.data_image._DataImage objects. The oml.embed.data_image._DataImage class contains Python objects and PNG images. Calling the method __repr__() displays the PNG images and prints out the Python object. By default, .dat returns the Python object that the user-defined Python function returned; .img returns a list containing PNG image data for each figure.

About the Script Repository

Embedded Python Execution includes the ability to create and store user-defined Python functions in the OML4Py script repository, grant or revoke the read privilege to a user-defined Python function, list the available user-defined Python functions, load user-defined Python functions into the Python environment, or drop a user-defined Python function from the script repository.

Along with whatever other actions a user-defined Python function performs, it can also create, retrieve, and modify Python objects that are stored in OML4Py datastores.

In Embedded Python Execution, a user-defined Python function runs in one or more Python engines that the database environment dynamically starts and manages. From the same user-defined Python function, you can get both structured data and PNG images.

You can make the user-defined Python function either private or global. A global function is available to any user. A private function is available only to the owner or to users to whom the owner of the function has granted the read privilege.
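The visibility rules for private and global functions can be modeled with a small in-memory class. This is a toy model for illustration only, not the OML4Py script repository API. The ScriptRepository class, its method names, and the user names are all hypothetical.

```python
class ScriptRepository:
    """Hypothetical in-memory model of the access rules; not the OML4Py API."""

    def __init__(self):
        self._scripts = {}  # name -> owner, function, visibility, granted readers

    def create(self, owner, name, func, is_global=False):
        self._scripts[name] = {
            "owner": owner,
            "func": func,
            "is_global": is_global,
            "readers": set(),
        }

    def _owned(self, owner, name):
        entry = self._scripts[name]
        if entry["owner"] != owner:
            raise PermissionError(f"{owner} does not own {name}")
        return entry

    def grant(self, owner, name, user):
        self._owned(owner, name)["readers"].add(user)

    def revoke(self, owner, name, user):
        self._owned(owner, name)["readers"].discard(user)

    def can_read(self, user, name):
        entry = self._scripts[name]
        # Global: any user. Private: the owner or a user granted read privilege.
        return entry["is_global"] or user == entry["owner"] or user in entry["readers"]

    def available(self, user):
        return sorted(n for n in self._scripts if self.can_read(user, n))

    def load(self, user, name):
        if not self.can_read(user, name):
            raise PermissionError(f"{user} cannot read {name}")
        return self._scripts[name]["func"]

    def drop(self, owner, name):
        self._owned(owner, name)
        del self._scripts[name]


repo = ScriptRepository()
repo.create("alice", "square", lambda x: x * x)                   # private
repo.create("alice", "double", lambda x: 2 * x, is_global=True)   # global

before_grant = repo.available("bob")
repo.grant("alice", "square", "bob")
after_grant = repo.available("bob")
result = repo.load("bob", "square")(3)
```

Before the grant, the second user sees only the global function; after the grant, the private function becomes visible and loadable as well, and revoking the privilege hides it again.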