1 About Oracle Machine Learning for Python

The following topics describe Oracle Machine Learning for Python (OML4Py) and its advantages for the Python user.

1.1 What Is Oracle Machine Learning for Python?

Oracle Machine Learning for Python (OML4Py) is a Python API that enables you to run Python commands for data transformations and for statistical, machine learning, and graphical analysis on data stored in or accessible through an Oracle database. OML4Py supports running user-defined Python functions in Python engines that the database spawns and controls, with optional built-in data parallelism and task parallelism. This embedded execution functionality enables invoking user-defined functions from SQL and, on Oracle Autonomous Database (ADB), from REST. OML4Py supports Automated Machine Learning (AutoML) for algorithm and feature selection, and for model tuning and selection. You can augment the included Python functionality with third-party packages from the Python ecosystem.

OML4Py is a Python module that enables Python users to manipulate data in database tables and views using Python syntax. OML4Py functions and methods transparently translate a select set of Python functions into SQL for in-database execution.

OML4Py is available in the Python interpreter in Oracle Machine Learning Notebooks in your Oracle Autonomous Database. For more information, see Get Started with Notebooks for Data Analysis and Data Visualization in Using Oracle Machine Learning Notebooks.

Designed for problems involving both large and small volumes of data, OML4Py integrates Python with the database. With OML4Py, you can do the following:

  • Run overloaded Python functions and use native Python syntax to manipulate in-database data, without having to learn SQL.

  • Use Automated Machine Learning (AutoML) to enhance user productivity and machine learning results through automated algorithm and feature selection, as well as model tuning and selection.

  • Use Embedded Python Execution to run user-defined Python functions in Python engines spawned and managed by the database environment. The user-defined functions and data are automatically loaded to the engines as required, including when data-parallel and task-parallel execution is enabled. Develop, refine, and deploy user-defined Python functions and machine learning models that leverage the parallelism and scalability of the database to automate data preparation and machine learning.

  • Use a natural Python interface to build in-database machine learning models.

1.2 Advantages of Oracle Machine Learning for Python

Using OML4Py to prepare and analyze data in or accessible to an Oracle database has many advantages for a Python user.

With OML4Py, you can do the following:

  • Operate on database data without using SQL

    OML4Py transparently translates many standard Python functions into SQL. With OML4Py, you can create Python proxy objects that access, analyze, and manipulate data that resides in the database. OML4Py can automatically optimize the SQL by taking advantage of column indexes, query optimization, table partitioning, and database parallelism.

    OML4Py overloaded functions are available for many commonly used Python functions, including those on pandas data frames, for in-database execution.

    See Also: Manipulate database tables and views using familiar Python functions and syntax

  • Automate common machine learning tasks

    By using Oracle’s advanced Automated Machine Learning (AutoML) technology, both data scientists and beginner machine learning users can automate common machine learning modeling tasks such as algorithm selection and feature selection, and model tuning and selection, all of which leverage the parallel processing and scalability of the database.

    See Also: About Automated Machine Learning

  • Minimize data movement

    By keeping data in the database whenever possible, you eliminate the time involved in transferring the data to your client Python engine and the need to store the data locally. You also eliminate the need to manage the locally stored data, which includes tasks such as distributing the data files to the appropriate locations, synchronizing the data with changes that are made in the production database, and so on.

    See Also: About Moving Data Between the Database and a Python Session

  • Keep data secure

    By keeping the data in the database, you have the security, scalability, reliability, and backup features of the database for managing the data.

  • Use the power of the database

    By operating directly on data in the database, you can use the memory and processing power of the database and avoid the memory constraints of your client Python engine.

  • Use current data

    As data is refreshed in the database, you have immediate access to current data.

  • Save Python objects to a datastore in the database

    You can save Python objects to an OML4Py datastore for future use and for use by others.

    See Also: About OML4Py Datastores

  • Build and store native Python models in the database

    Using Embedded Python Execution, you can build native Python models and store and manage them in an OML4Py datastore.

    You can also build in-database models with an oml class, for example, the Decision Tree class oml.dt. These in-database models have proxy objects that reference the actual models. In keeping with normal Python behavior, when the Python engine terminates, all in-memory objects, including models, are lost. To prevent an in-database model created using OML4Py from being deleted when the database connection is terminated, you must store its proxy object in a datastore.

    See Also: About Machine Learning Classes and Algorithms

  • Score data

    For most of the OML4Py machine learning classes, you can use the predict and predict_proba methods of the model object to score new data.

    For these OML4Py in-database models, you can also use the SQL PREDICTION function on the model proxy objects, which scores directly in the database. You can use in-database models directly from SQL if you prepare the data properly. For open source models, you can use Embedded Python Execution and enable data-parallel execution for performance and scalability.

  • Run user-defined Python functions in embedded Python engines

    Using OML4Py Embedded Python Execution, you can store user-defined Python functions in the OML4Py script repository, and run those functions in Python engines spawned by the database environment. When a user-defined Python function runs, the database starts, controls, and manages one or more Python engines that can run in parallel. With the Embedded Python Execution functionality, you can do the following:

    • Use a select set of Python packages in user-defined functions that run in embedded Python engines

    • Use other Python packages and third-party packages in user-defined Python functions that run in embedded Python engines

    • Operationalize user-defined Python functions for use in production applications by invoking them from SQL and, on ADB, from REST, which eliminates porting Python code and models to another language and avoids reinventing code to integrate Python results into existing applications

    • Seamlessly leverage your Oracle database as a high-performance computing environment for user-defined Python functions, providing data parallelism and resource management

    • Perform parallel simulations, for example, Monte Carlo analysis, using the oml.index_apply function

    • Generate JSON, PNG image, and XML representations of both structured and image data, which can be used by Python clients and SQL-based applications. PNG images and structured data can be used by Python clients and by applications that use REST APIs.

    See Also: About Embedded Python Execution
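The parallel simulation item above can be illustrated with a small user-defined function. This is a minimal sketch, not Oracle's example: the function name estimate_pi and its samples parameter are hypothetical, and the loop below runs the function locally as a stand-in for the engines that the database would start. It assumes that oml.index_apply passes the run index as the first argument of the function.

```python
import random


def estimate_pi(index, samples=10_000):
    """One independent Monte Carlo estimate of pi (hypothetical example).

    `index` is the run number. It seeds the generator so that each run is
    reproducible and draws a different random stream from the other runs.
    """
    rng = random.Random(index)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / samples


# Local stand-in for parallel execution: call the function once per index.
# In a connected OML4Py session, the analogous call would be along the lines
# of oml.index_apply(times=8, func=estimate_pi); that call is not run here.
estimates = [estimate_pi(i) for i in range(1, 9)]
mean_estimate = sum(estimates) / len(estimates)
print(f"mean of {len(estimates)} runs: {mean_estimate:.3f}")
```

Because each run depends only on its index, the runs are independent of one another, which is the property that lets them be distributed across several Python engines.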

1.3 Manipulate database tables and views using familiar Python functions and syntax

With the transparency layer classes, you can manipulate database tables and views using familiar Python functions and syntax. For example, using DataFrame proxy objects that map to database data, you can invoke overloaded pandas functions that transparently generate SQL that runs in the database, using the database as a high-performance compute engine.

The OML4Py transparency layer does the following:

  • Enables creating tables and views from pandas.DataFrame and getting proxy objects to tables and views.

  • Overloads specific Python functions that transparently translate functionality to SQL

  • Leverages proxy objects for database data

  • Uses familiar Python syntax to manipulate database data
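To make the idea of overloaded Python syntax that translates to SQL concrete, here is a toy sketch. It is not how OML4Py is implemented and does not use the oml module: the classes TableProxy, ColumnProxy, and Predicate, and the table name EMP with its columns, are all hypothetical. It only shows the general technique of a proxy object that holds no data and builds a query from Python operators, restricted here to numeric comparisons.

```python
class Predicate:
    """A filter condition kept as a SQL fragment."""

    def __init__(self, sql):
        self.sql = sql


class ColumnProxy:
    """Stands for one column; comparisons produce predicates, not booleans."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, value):
        return Predicate(f"{self.name} > {value!r}")

    def __lt__(self, value):
        return Predicate(f"{self.name} < {value!r}")


class TableProxy:
    """Stands for a table; indexing narrows the query instead of moving data."""

    def __init__(self, table, columns, where=None):
        self.table = table
        self.columns = list(columns)
        self.where = where

    def __getitem__(self, key):
        if isinstance(key, str):        # emp["SAL"] -> column proxy
            return ColumnProxy(key)
        if isinstance(key, Predicate):  # emp[condition] -> row filter
            where = key.sql if self.where is None else f"{self.where} AND {key.sql}"
            return TableProxy(self.table, self.columns, where)
        if isinstance(key, list):       # emp[["A", "B"]] -> column selection
            return TableProxy(self.table, key, self.where)
        raise TypeError(f"unsupported key type: {type(key).__name__}")

    def to_sql(self):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.where is not None:
            sql += f" WHERE {self.where}"
        return sql


emp = TableProxy("EMP", ["ENAME", "SAL", "DEPTNO"])
high_paid = emp[emp["SAL"] > 3000][["ENAME", "SAL"]]
print(high_paid.to_sql())  # SELECT ENAME, SAL FROM EMP WHERE SAL > 3000
```

The pandas-style expression never touches any rows; it only accumulates the query text, which a real transparency layer would then send to the database for execution.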

The following table lists the transparency layer functions for getting and creating proxy objects and tables/views.

Table 1-1 Transparency Layer Functions for getting and creating proxy objects and tables/views

Function         Description
oml.create       Creates a table in the database schema from a Python data set.
oml_object.pull  Creates a local Python object that contains a copy of data fetched from the database object referenced by the oml object.
oml.push         Pushes data from a Python session into an object in a database schema.
oml.sync         Creates a DataFrame proxy object in Python that represents a database table or view.
oml.dir          Returns the names of oml objects in the Python session workspace.
oml.drop         Drops a persistent database table or view.
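The functions in Table 1-1 are typically used together. The following is a minimal sketch under stated assumptions: the helper name round_trip is hypothetical, the oml argument is assumed to be the imported OML4Py module with a database connection already established, and the keyword table= is assumed for oml.create and oml.drop. Passing the module in as an argument keeps the sketch free of connection details.

```python
def round_trip(oml, data, table_name):
    """Create a table from local data, pull a copy back, then drop the table.

    oml        -- the imported, already connected OML4Py module (assumption)
    data       -- a local Python data set, for example a pandas.DataFrame
    table_name -- name of the table to create (hypothetical, caller-chosen)
    """
    # oml.create builds the table and returns a proxy object that refers to it.
    proxy = oml.create(data, table=table_name)
    try:
        # pull copies the data referenced by the proxy into local memory.
        local_copy = proxy.pull()
    finally:
        # oml.drop removes the persistent table even if the pull fails.
        oml.drop(table=table_name)
    return local_copy
```

In a notebook session this might be called as round_trip(oml, df, "DEMO_TABLE"), where df and the table name are placeholders. For large tables, pulling the data defeats the purpose of the proxy object, so a pull is best reserved for results that are small enough for client memory.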

Transparency layer proxy classes map SQL data types or objects to corresponding Python types. The classes provide Python functions and operators that are the same as those on the mapped Python types. The following table lists the transparency layer data type classes.

Table 1-2 Transparency Layer Data Type Classes

Class          Description
oml.Boolean    A boolean series data class that represents a single column of 0, 1, and NULL values in database data.
oml.Bytes      A binary series data class that represents a single column of RAW or BLOB database data types.
oml.Float      A numeric series data class that represents a single column of NUMBER, BINARY_DOUBLE, or BINARY_FLOAT database data types.
oml.String     A character series data class that represents a single column of VARCHAR2, CHAR, or CLOB database data types.
oml.DataFrame  A tabular DataFrame class that represents multiple columns of oml.Boolean, oml.Bytes, oml.Float, and oml.String data.
oml.Integer    A data class that represents a single column of NUMBER(*,0) data in the database.
oml.Datetime   A series date class that represents a single column of TIMESTAMP or TIMESTAMP WITH TIME ZONE in Oracle Database.
oml.Timezone   A time class that is used with oml.Datetime to support TIMESTAMP WITH TIME ZONE.
oml.Timedelta  A time class that represents a single column series of differences between two dates or times, or INTERVAL DAY TO SECOND in Oracle Database.

The following table lists the mappings of Python data types for both the reading and writing of data between Python and the database.

Table 1-3 Python and SQL Data Type Equivalencies

Database Read                          Python Data Types   Database Write
N/A                                    bool                If oranumber == True, then NUMBER (the default), else BINARY_DOUBLE.
BLOB, RAW                              bytes               BLOB, RAW
BINARY_DOUBLE, BINARY_FLOAT, NUMBER    float               If oranumber == True, then NUMBER (the default), else BINARY_DOUBLE.
CHAR, CLOB, VARCHAR2                   str                 CHAR, CLOB, VARCHAR2
NUMBER(*,0)                            int                 NUMBER(*,0)
TIMESTAMP or TIMESTAMP WITH TIME ZONE  datetime.datetime   TIMESTAMP or TIMESTAMP WITH TIME ZONE
TIMESTAMP WITH TIME ZONE               datetime.timezone   TIMESTAMP WITH TIME ZONE
INTERVAL DAY TO SECOND                 datetime.timedelta  INTERVAL DAY TO SECOND
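The write direction of Table 1-3 can be restated as a small lookup, which makes the effect of the oranumber setting easier to see. This is only an illustration of the table, not part of the OML4Py API: the names WRITE_TYPES and database_write_type are hypothetical, and only the oranumber flag comes from the table itself.

```python
import datetime

# Python type -> database type(s) on write, for the rows of Table 1-3
# that do not depend on oranumber.
WRITE_TYPES = {
    bytes: "BLOB or RAW",
    str: "CHAR, CLOB, or VARCHAR2",
    int: "NUMBER(*,0)",
    datetime.datetime: "TIMESTAMP or TIMESTAMP WITH TIME ZONE",
    datetime.timezone: "TIMESTAMP WITH TIME ZONE",
    datetime.timedelta: "INTERVAL DAY TO SECOND",
}


def database_write_type(py_type, oranumber=True):
    """Return the database type listed in Table 1-3 for a Python type."""
    # bool and float are the two rows whose target depends on oranumber.
    if py_type in (bool, float):
        return "NUMBER" if oranumber else "BINARY_DOUBLE"
    return WRITE_TYPES[py_type]  # raises KeyError for types not in the table


for t in (bool, float, int, str):
    print(t.__name__, "->", database_write_type(t))
print("float with oranumber=False ->", database_write_type(float, oranumber=False))
```

Note that bool and int take different paths even though bool is a subclass of int in Python, because the lookup compares the types themselves rather than using isinstance.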

1.4 About the Python Components and Libraries in OML4Py

OML4Py requires an installation of Python, the specified Python libraries, and the OML4Py components.

  • In Oracle Autonomous Database, OML4Py is already installed. The OML4Py installation includes Python, additional required Python libraries, and the OML4Py server components. A Python interpreter is included with Oracle Machine Learning Notebooks in Autonomous Database.

  • You can install third-party Python libraries in a conda environment through a conda interpreter for use within OML Notebooks sessions and OML4Py embedded execution invocations.

Python Version in Current Release of OML4Py

The current release of OML4Py is based on Python 3.12.0.

This version is in the current release of Oracle Autonomous Database.

Required Python Libraries

The following Python libraries must be included.

  • oracledb 1.4.0
  • cycler 0.10.0
  • joblib 1.1.0
  • kiwisolver 1.1.0
  • matplotlib 3.7.2
  • numpy 1.26.0
  • pandas 2.1.1
  • Pillow 8.2.0
  • pyparsing 2.4.0
  • python-dateutil 2.8.1
  • pytz 2022.1
  • scikit-learn 1.2.1
  • scipy 1.11.3
  • six 1.13.0
  • threadpoolctl 3.1.0

All the above libraries are included with Python in the current release of Oracle Autonomous Database.
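To see how a given Python environment compares with the list above, a short check with the standard library is enough. This is a sketch, not an Oracle-provided tool: the names LISTED_VERSIONS and compare_versions are hypothetical, and the version numbers are copied from the list above. It only reports differences and does not install or change anything.

```python
from importlib.metadata import PackageNotFoundError, version

# Library versions as listed in this section.
LISTED_VERSIONS = {
    "oracledb": "1.4.0",
    "cycler": "0.10.0",
    "joblib": "1.1.0",
    "kiwisolver": "1.1.0",
    "matplotlib": "3.7.2",
    "numpy": "1.26.0",
    "pandas": "2.1.1",
    "Pillow": "8.2.0",
    "pyparsing": "2.4.0",
    "python-dateutil": "2.8.1",
    "pytz": "2022.1",
    "scikit-learn": "1.2.1",
    "scipy": "1.11.3",
    "six": "1.13.0",
    "threadpoolctl": "3.1.0",
}


def compare_versions(listed=LISTED_VERSIONS):
    """Map each package name to (listed version, installed version or None)."""
    report = {}
    for name, wanted in listed.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (wanted, found)
    return report


for name, (wanted, found) in compare_versions().items():
    if found is None:
        status = "missing"
    elif found == wanted:
        status = "ok"
    else:
        status = f"differs (installed {found})"
    print(f"{name:16} {wanted:8} {status}")
```

In Oracle Autonomous Database the libraries are already present, so a check like this is mainly useful for a separate client environment.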