1 About Oracle Machine Learning for Python

The following topics describe Oracle Machine Learning for Python (OML4Py) and its advantages for the Python user.

What Is Oracle Machine Learning for Python?

Oracle Machine Learning for Python (OML4Py) enables you to run Python commands for data transformations and for statistical, machine learning, and graphical analysis on data stored in or accessible through an Oracle database using a Python API.

OML4Py is a Python module that enables Python users to manipulate data in database tables and views using Python syntax. OML4Py functions and methods transparently translate a select set of Python functions into SQL for in-database execution.

OML4Py is available in the following Oracle database environments:

  • The Python interpreter in Oracle Machine Learning Notebooks in your Oracle Autonomous Database. For more information, see Get Started with Notebooks for Data Analysis and Data Visualization in Using Oracle Machine Learning Notebooks.

    This environment includes all the required components: Python, the required Python libraries, and the Python interpreter in Notebooks.

  • An OML4Py client connection to OML4Py in an on-premises Oracle Database instance.

    For this environment, you must install Python, the required Python libraries, and the OML4Py server components in the database, and you must install the OML4Py client. See Install OML4Py for On-Premises Databases.

Designed for problems involving both large and small volumes of data, OML4Py integrates Python with the database. With OML4Py, you can do the following:

  • Develop, refine, and deploy user-defined Python functions and machine learning models that leverage the parallelism and scalability of the database to automate data preparation and machine learning.

  • Run overloaded Python functions and use native Python syntax to manipulate in-database data, without having to learn SQL.

  • Use Automated Machine Learning (AutoML) to enhance user productivity and machine learning results through automated algorithm and feature selection, as well as model tuning and selection.

  • Use Embedded Python Execution to run user-defined Python functions in Python engines spawned and managed by the database environment. The user-defined functions and data are automatically loaded into the engines as required, including when data-parallel or task-parallel execution is enabled.

Advantages of Oracle Machine Learning for Python

Using OML4Py to prepare and analyze data in or accessible to an Oracle database has many advantages for a Python user.

With OML4Py, you can do the following:

  • Operate on database data without using SQL

    OML4Py transparently translates many standard Python functions into SQL. With OML4Py, you can create Python proxy objects that access, analyze, and manipulate data that resides in the database. OML4Py can automatically optimize the SQL by taking advantage of column indexes, query optimization, table partitioning, and database parallelism.

    OML4Py overloaded functions are available for many commonly used Python functions, including those on Pandas data frames for in-database execution.

    See Also: Transparently Convert Python to SQL

  • Automate common machine learning tasks

    With Oracle’s Automated Machine Learning (AutoML) technology, both data scientists and beginner machine learning users can automate common modeling tasks: algorithm selection, feature selection, and model tuning and selection. All of these tasks leverage the parallel processing and scalability of the database.

    See Also: About Automated Machine Learning

  • Minimize data movement

    By keeping the data in the database whenever possible, you eliminate the time involved in transferring the data to your client Python engine and the need to store the data locally. You also eliminate the need to manage the locally stored data, which includes tasks such as distributing the data files to the appropriate locations, synchronizing the data with changes that are made in the production database, and so on.

    See Also: About Moving Data Between the Database and a Python Session

  • Keep data secure

    By keeping the data in the database, you have the security, scalability, reliability, and backup features of the database for managing the data.

  • Use the power of the database

    By operating directly on data in the database, you can use the memory and processing power of the database and avoid the memory constraints of your client Python engine.

  • Use current data

    As data is refreshed in the database, you have immediate access to current data.

  • Save Python objects to a datastore in the database

    You can save Python objects to an OML4Py datastore for future use and for use by others.

    See Also: About OML4Py Datastores

  • Build and store models in the database

    Using Embedded Python Execution, you can build native Python models and store and manage them in an OML4Py datastore.

    You can also build in-database models with an oml class such as the Decision Tree class oml.dt. These in-database models have proxy objects that reference the actual models. In keeping with normal Python behavior, when the Python engine terminates, all in-memory objects, including models, are lost. To prevent an in-database model created using OML4Py from being deleted when the database connection is terminated, you must store its proxy object in a datastore.

    See Also: About Machine Learning Classes and Algorithms

  • Score data

    For most of the OML4Py machine learning classes, you can use the predict and predict_proba methods of the model object to score new data.

    For these OML4Py in-database models, you can also use the SQL PREDICTION function on the model proxy objects, which scores directly in the database. You can use in-database models directly from SQL if you prepare the data properly. For open source models, you can use Embedded Python Execution and enable data-parallel execution for performance and scalability.

  • Run user-defined Python functions in embedded Python engines

    Using OML4Py Embedded Python Execution, you can store user-defined Python functions in the OML4Py script repository, and run those functions in Python engines spawned by the database environment. When a user-defined Python function runs, the database starts, controls, and manages one or more Python engines that can run in parallel. With the Embedded Python Execution functionality, you can do the following:

    • Use a select set of Python packages in user-defined functions that run in embedded Python engines

    • Use other Python packages in user-defined Python functions that run in embedded Python engines

    • Operationalize user-defined Python functions for use in production applications and eliminate porting Python code and models into other languages; avoid reinventing code to integrate Python results into existing applications

    • Seamlessly leverage your Oracle database as a high-performance computing environment for user-defined Python functions, providing data parallelism and resource management

    • Perform parallel simulations, for example, Monte Carlo analysis, using the oml.index_apply function

    • Generate PNG images and XML representations of both structured and image data, which can be used by Python clients and SQL-based applications. PNG images and structured data can also be used by Python clients and by applications that use REST APIs.

    See Also: About Embedded Python Execution
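As an illustration of the parallel simulation case mentioned above, the following minimal sketch defines a user-defined function for one Monte Carlo run and calls it once per index. The names estimate_pi, run_locally, and run_in_database are hypothetical. The local loop is only a stand-in for the calling pattern. The oml.index_apply call assumes the times and func parameters and a run index that starts at 1, as described in the OML4Py API reference, and it is not invoked here because it needs an active database connection.

```python
def estimate_pi(index, samples=10_000):
    """One simulation run; `index` is the run number supplied by the caller."""
    # Imported inside the function so the function stays self-contained
    # when it runs in a separate Python engine.
    import random

    rng = random.Random(index)  # one seed per run keeps runs reproducible
    inside = sum(
        1
        for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / samples


def run_locally(times, func, **kwargs):
    """Local stand-in for the calling pattern: func(1), ..., func(times)."""
    return [func(i, **kwargs) for i in range(1, times + 1)]


def run_in_database(oml, times, func, **kwargs):
    """Assumed OML4Py call; requires a connected `oml` module."""
    return oml.index_apply(times=times, func=func, **kwargs)


estimates = run_locally(8, estimate_pi, samples=20_000)
pi_hat = sum(estimates) / len(estimates)
print(f"mean estimate over {len(estimates)} runs: {pi_hat:.3f}")
```

Because each run depends only on its index, the runs are independent of one another, which is what makes this kind of workload a candidate for parallel execution in database-spawned engines.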

Transparently Convert Python to SQL

With the transparency layer classes, you can convert select Python objects to Oracle database objects and also invoke a range of familiar Python functions that are overloaded to invoke the corresponding SQL on tables in the database.

The OML4Py transparency layer does the following:

  • Contains functions that convert Python pandas.DataFrame objects to database tables

  • Overloads Python functions, translating their functionality into SQL

  • Leverages proxy objects for database data

  • Uses familiar Python syntax to manipulate database data
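The following minimal sketch illustrates the general idea of overloading Python syntax so that operations on a proxy object become SQL that runs in the database. It is not the OML4Py implementation. It uses the standard-library sqlite3 module as a stand-in database, and the Condition, ColumnProxy, and TableProxy classes are hypothetical names introduced only for this illustration.

```python
import sqlite3


class Condition:
    """A SQL predicate produced by a Python comparison."""

    def __init__(self, sql, params):
        self.sql = sql
        self.params = list(params)


class ColumnProxy:
    """Stands in for one column; comparisons build SQL instead of results."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, value):
        return Condition(f'"{self.name}" > ?', [value])

    def __lt__(self, value):
        return Condition(f'"{self.name}" < ?', [value])


class TableProxy:
    """Stands in for a table; no rows are fetched until pull() is called."""

    def __init__(self, conn, table, where=None):
        self.conn, self.table, self.where = conn, table, where

    def __getitem__(self, key):
        if isinstance(key, str):
            return ColumnProxy(key)
        if isinstance(key, Condition):
            if self.where is not None:  # combine with an earlier filter
                key = Condition(
                    f"({self.where.sql}) AND ({key.sql})",
                    self.where.params + key.params,
                )
            return TableProxy(self.conn, self.table, key)
        raise TypeError("index with a column name or a Condition")

    def _query(self, select):
        sql = f'SELECT {select} FROM "{self.table}"'
        if self.where is None:
            return sql, []
        return f"{sql} WHERE {self.where.sql}", self.where.params

    def __len__(self):
        # The count is computed by the database, not in Python.
        return self.conn.execute(*self._query("COUNT(*)")).fetchone()[0]

    def pull(self):
        # Only now is data copied into local Python memory.
        return self.conn.execute(*self._query("*")).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iris (species TEXT, petal_length REAL)")
conn.executemany(
    "INSERT INTO iris VALUES (?, ?)",
    [("setosa", 1.4), ("versicolor", 4.7), ("virginica", 6.0)],
)

iris = TableProxy(conn, "iris")
long_petals = iris[iris["petal_length"] > 4.0]  # builds SQL, fetches nothing
n_long = len(long_petals)                       # runs SELECT COUNT(*) ... WHERE
rows = long_petals.pull()                       # runs SELECT * ... WHERE
```

In this sketch the filter expression is ordinary Python indexing and comparison syntax, yet the filtering and counting happen inside the database engine. The same principle underlies the proxy objects and overloaded functions described above.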

The following table lists the transparency layer functions.

Table 1-1 Transparency Layer Functions

Function          Description
oml.create        Creates a table in the database schema from a Python data set.
oml_object.pull   Creates a local Python object that contains a copy of data referenced by the oml object.
oml.push          Pushes data from a Python session into an object in a database schema.
oml.sync          Creates a DataFrame proxy object in Python that represents a database table or view.
oml.dir           Returns the names of oml objects in the Python session workspace.
oml.drop          Drops a persistent database table or view.
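The following sketch shows how the functions in Table 1-1 might be combined in one round trip. It assumes the keyword arguments shown (table= for oml.create, oml.sync, and oml.drop) match the OML4Py API reference, that the oml module passed in is already connected to a database, and that data is a pandas.DataFrame. The function name transparency_roundtrip and the table name OML_DEMO_TMP are hypothetical.

```python
def transparency_roundtrip(oml, data, table_name="OML_DEMO_TMP"):
    """Create a table, read it back through a proxy, then drop it.

    `oml` is the connected OML4Py module, passed in so that this sketch
    can be imported on a machine where OML4Py is not installed.
    """
    # oml.create writes the Python data to a new table in the schema.
    oml.create(data, table=table_name)
    try:
        # oml.sync returns a proxy object for the existing table.
        proxy = oml.sync(table=table_name)
        # pull() copies the data that the proxy references into local memory.
        local_copy = proxy.pull()
        # oml.push sends the data to the database and returns a proxy;
        # pulling it back returns a local copy again.
        pushed_copy = oml.push(data).pull()
        # oml.dir reports the names of oml objects in the session workspace.
        names = oml.dir()
        return local_copy, pushed_copy, names
    finally:
        # oml.drop removes the persistent table even if a step above failed.
        oml.drop(table=table_name)
```

Keeping the oml.drop call in a finally clause is a design choice of this sketch so that the demonstration table does not linger in the schema after an error.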

Transparency layer proxy classes map SQL data types or objects to corresponding Python types. The classes provide Python functions and operators that are the same as those on the mapped Python types. The following table lists the transparency layer data type classes.

Table 1-2 Transparency Layer Data Type Classes

Class           Description
oml.Boolean     A boolean series data class that represents a single column of 0, 1, and NULL values in database data.
oml.Bytes       A binary series data class that represents a single column of RAW or BLOB database data types.
oml.Float       A numeric series data class that represents a single column of NUMBER, BINARY_DOUBLE, or BINARY_FLOAT database data types.
oml.String      A character series data class that represents a single column of VARCHAR2, CHAR, or CLOB database data types.
oml.DataFrame   A tabular DataFrame class that represents multiple columns of oml.Boolean, oml.Bytes, oml.Float, and oml.String data.

The following table lists the mappings of OML4Py data types for both the reading and writing of data between Python and the database.

Table 1-3 Python and SQL Data Type Equivalencies

Database Read                         Python Data Types   Database Write
N/A                                   Boolean             If oranumber == True, then NUMBER (the default), else BINARY_DOUBLE.
BLOB, RAW                             bytes               BLOB, RAW
BINARY_DOUBLE, BINARY_FLOAT, NUMBER   float               If oranumber == True, then NUMBER (the default), else BINARY_DOUBLE.
CHAR, CLOB, VARCHAR2                  str                 CHAR, CLOB, VARCHAR2
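The Database Write column of Table 1-3 can be read as a small decision rule. The following sketch encodes that rule as a lookup. The function database_write_types is a hypothetical helper written for this illustration and is not part of OML4Py. It treats the Boolean row as the Python bool type, which is an assumption.

```python
def database_write_types(value, oranumber=True):
    """Return the column types that Table 1-3 lists for writing `value`."""
    # Boolean and float values follow the oranumber setting.
    if isinstance(value, (bool, float)):
        return ("NUMBER",) if oranumber else ("BINARY_DOUBLE",)
    # For bytes and str, the table lists several possible column types.
    if isinstance(value, bytes):
        return ("BLOB", "RAW")
    if isinstance(value, str):
        return ("CHAR", "CLOB", "VARCHAR2")
    # Table 1-3 has no row for other Python types, so none is guessed here.
    raise TypeError(f"Table 1-3 lists no mapping for {type(value).__name__}")


print(database_write_types(3.5))
print(database_write_types(3.5, oranumber=False))
print(database_write_types("abc"))
```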

About the Python Components and Libraries in OML4Py

OML4Py requires an installation of Python, a number of Python libraries, and the OML4Py components.

  • In Oracle Autonomous Database, OML4Py is already installed. The OML4Py installation includes Python, additional required Python libraries, and the OML4Py server components. A Python interpreter is included with Oracle Machine Learning Notebooks in Autonomous Database.

  • You can install OML4Py in an on-premises Oracle Database. In this case, you must install Python, the additional required Python libraries, the OML4Py server components, and an OML4Py client. See Install OML4Py for On-Premises Databases.

Python Version in Current Release of OML4Py

The current release of OML4Py is based on Python 3.9.5.

This version is in the current release of Oracle Autonomous Database. You must install it manually when installing OML4Py on an on-premises Oracle Database.

Required Python Libraries

The following Python libraries must be included.

  • cx_Oracle 8.1.0
  • cycler 0.10.0
  • joblib 1.1.0
  • kiwisolver 1.1.0
  • matplotlib 3.3.3
  • numpy 1.21.5
  • pandas 1.3.4
  • Pillow 8.2.0
  • pyparsing 2.4.0
  • python-dateutil 2.8.1
  • pytz 2019.3
  • scikit-learn 1.0.1
  • scipy 1.7.3
  • six 1.13.0
  • threadpoolctl 2.1.0

All the above libraries are included with Python in the current release of Oracle Autonomous Database.

For an installation of OML4Py in an on-premises Oracle Database, you must install Python and the libraries listed here. See Install OML4Py for On-Premises Databases.
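For an on-premises installation, one way to compare a Python environment against the list above is a short check that uses only the standard library. The following is a sketch, not an Oracle-provided tool. The function name check_required is hypothetical, and the check assumes that each library is registered under the distribution name shown in the list.

```python
import sys
from importlib import metadata

# Library versions as listed in this section.
REQUIRED = {
    "cx_Oracle": "8.1.0",
    "cycler": "0.10.0",
    "joblib": "1.1.0",
    "kiwisolver": "1.1.0",
    "matplotlib": "3.3.3",
    "numpy": "1.21.5",
    "pandas": "1.3.4",
    "Pillow": "8.2.0",
    "pyparsing": "2.4.0",
    "python-dateutil": "2.8.1",
    "pytz": "2019.3",
    "scikit-learn": "1.0.1",
    "scipy": "1.7.3",
    "six": "1.13.0",
    "threadpoolctl": "2.1.0",
}


def check_required(required=REQUIRED):
    """Map each library name to (wanted version, found version, match flag)."""
    report = {}
    for name, wanted in required.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # the library is not installed in this environment
        report[name] = (wanted, found, found == wanted)
    return report


report = check_required()
for name, (wanted, found, ok) in report.items():
    status = "ok" if ok else f"expected {wanted}, found {found or 'nothing'}"
    print(f"{name:16} {status}")

running = ".".join(str(part) for part in sys.version_info[:3])
print(f"Python {running} (this section states 3.9.5)")
```

The check only reports differences. Whether a different version is acceptable is a question for the installation instructions referenced above.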