1 About Oracle Machine Learning for Python
The following topics describe Oracle Machine Learning for Python (OML4Py) and its advantages for the Python user.
- What Is Oracle Machine Learning for Python?
- Advantages of Oracle Machine Learning for Python
- Manipulate database tables and views using familiar Python functions and syntax
- About the Python Components and Libraries in OML4Py
1.1 What Is Oracle Machine Learning for Python?
Oracle Machine Learning for Python (OML4Py) provides a Python API for running Python commands for data transformations and for statistical, machine learning, and graphical analysis on data stored in or accessible through an Oracle database. OML4Py supports running user-defined Python functions in Python engines that the database spawns and controls, with optional built-in data parallelism and task parallelism. This embedded execution functionality enables invoking user-defined functions from SQL and, on Oracle Autonomous Database (ADB), from REST. OML4Py supports Automated Machine Learning (AutoML) for algorithm and feature selection and for model tuning and selection. You can augment the included Python functionality with third-party packages from the Python ecosystem.
OML4Py is a Python module that enables Python users to manipulate data in database tables and views using Python syntax. OML4Py functions and methods transparently translate a select set of Python functions into SQL for in-database execution.
OML4Py is available in the Python interpreter in Oracle Machine Learning Notebooks in your Oracle Autonomous Database. For more information, see Get Started with Notebooks for Data Analysis and Data Visualization in Using Oracle Machine Learning Notebooks.
Designed for problems involving both large and small volumes of data, OML4Py integrates Python with the database. With OML4Py, you can do the following:
- Run overloaded Python functions and use native Python syntax to manipulate in-database data, without having to learn SQL.
- Use Automated Machine Learning (AutoML) to enhance user productivity and machine learning results through automated algorithm and feature selection, as well as model tuning and selection.
- Use Embedded Python Execution to run user-defined Python functions in Python engines spawned and managed by the database environment. The user-defined functions and data are automatically loaded to the engines as required, including when data-parallel and task-parallel execution is enabled. Develop, refine, and deploy user-defined Python functions and machine learning models that leverage the parallelism and scalability of the database to automate data preparation and machine learning.
- Use a natural Python interface to build in-database machine learning models.
1.2 Advantages of Oracle Machine Learning for Python
Using OML4Py to prepare and analyze data in or accessible to an Oracle database has many advantages for a Python user.
With OML4Py, you can do the following:
- Operate on database data without using SQL
  OML4Py transparently translates many standard Python functions into SQL. With OML4Py, you can create Python proxy objects that access, analyze, and manipulate data that resides in the database. OML4Py can automatically optimize the SQL by taking advantage of column indexes, query optimization, table partitioning, and database parallelism.
  OML4Py provides overloaded versions of many commonly used Python functions, including those on pandas data frames, for in-database execution.
  See Also: Manipulate database tables and views using familiar Python functions and syntax
- Automate common machine learning tasks
  By using Oracle’s advanced Automated Machine Learning (AutoML) technology, both data scientists and beginner machine learning users can automate common machine learning modeling tasks, such as algorithm selection, feature selection, and model tuning and selection, all of which leverage the parallel processing and scalability of the database.
  See Also: About Automated Machine Learning
- Minimize data movement
  By keeping data in the database whenever possible, you eliminate the time involved in transferring the data to your client Python engine and the need to store the data locally. You also eliminate the need to manage the locally stored data, which includes tasks such as distributing the data files to the appropriate locations, synchronizing the data with changes that are made in the production database, and so on.
  See Also: About Moving Data Between the Database and a Python Session
- Keep data secure
  By keeping the data in the database, you have the security, scalability, reliability, and backup features of the database for managing the data.
- Use the power of the database
  By operating directly on data in the database, you can use the memory and processing power of the database and avoid the memory constraints of your client Python engine.
- Use current data
  As data is refreshed in the database, you have immediate access to current data.
- Save Python objects to a datastore in the database
  You can save Python objects to an OML4Py datastore for future use and for use by others.
  See Also: About OML4Py Datastores
- Build and store native Python models in the database
  Using Embedded Python Execution, you can build native Python models and store and manage them in an OML4Py datastore.
  You can also build in-database models with, for example, an oml class such as the Decision Tree class oml.dt. These in-database models have proxy objects that reference the actual models. In keeping with normal Python behavior, when the Python engine terminates, all in-memory objects, including models, are lost. To prevent an in-database model created using OML4Py from being deleted when the database connection is terminated, you must store its proxy object in a datastore.
- Score data
  For most of the OML4Py machine learning classes, you can use the predict and predict_proba methods of the model object to score new data.
  For these OML4Py in-database models, you can also use the SQL PREDICTION function on the model proxy objects, which scores directly in the database. You can use in-database models directly from SQL if you prepare the data properly. For open source models, you can use Embedded Python Execution and enable data-parallel execution for performance and scalability.
- Run user-defined Python functions in embedded Python engines
  Using OML4Py Embedded Python Execution, you can store user-defined Python functions in the OML4Py script repository, and run those functions in Python engines spawned by the database environment. When a user-defined Python function runs, the database starts, controls, and manages one or more Python engines that can run in parallel. With the Embedded Python Execution functionality, you can do the following:
  - Use a select set of Python packages in user-defined functions that run in embedded Python engines
  - Use other Python packages and third-party packages in user-defined Python functions that run in embedded Python engines
  - Operationalize user-defined Python functions for use in production applications and eliminate porting Python code and models into SQL and, on Oracle Autonomous Database (ADB), REST; avoid reinventing code to integrate Python results into existing applications
  - Seamlessly leverage your Oracle database as a high-performance computing environment for user-defined Python functions, providing data parallelism and resource management
  - Perform parallel simulations, for example, Monte Carlo analysis, using the oml.index_apply function
  - Generate JSON, PNG image, and XML representations of both structured and image data, which can be used by Python clients and SQL-based applications. PNG images and structured data can be used for Python clients and applications that use REST APIs.
  See Also: About Embedded Python Execution
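The Monte Carlo use case mentioned in the list above can be sketched in plain Python. This is only an illustration of the shape such a user-defined function might take: each run receives an index and returns one estimate. The helper `local_index_apply` is a hypothetical local stand-in that runs the calls one after another in the client session; it is not `oml.index_apply`, and the assumption that the index starts at 1 is made only for this sketch.

```python
import random


def estimate_pi(index, samples=20_000):
    """One Monte Carlo run; the index doubles as the seed so runs are reproducible."""
    rng = random.Random(index)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # the point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / samples


def local_index_apply(times, func):
    # Hypothetical stand-in: calls func(1), ..., func(times) sequentially.
    # With Embedded Python Execution, the database would run the calls
    # in Python engines that it spawns, possibly in parallel.
    return [func(i) for i in range(1, times + 1)]


estimates = local_index_apply(8, estimate_pi)
mean_estimate = sum(estimates) / len(estimates)
print(f"mean of {len(estimates)} runs: {mean_estimate:.4f}")
```

Because each run depends only on its index, the runs are independent of one another, which is the property that makes this kind of simulation a natural fit for parallel execution.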
1.3 Manipulate database tables and views using familiar Python functions and syntax
With the transparency layer classes, you can manipulate database tables and views using familiar Python functions and syntax. For example, using DataFrame proxy objects that map to database data, you can invoke overloaded pandas functions that transparently generate SQL that runs in the database, using the database as a high-performance compute engine.
The OML4Py transparency layer does the following:
- Enables creating tables and views from pandas.DataFrame and getting proxy objects to tables and views
- Overloads specific Python functions that transparently translate functionality to SQL
- Leverages proxy objects for database data
- Uses familiar Python syntax to manipulate database data
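The idea of a proxy object that generates SQL instead of holding rows can be illustrated with a toy class. The sketch below is not OML4Py code and does not reflect how OML4Py is implemented: `TableProxy` and its methods are hypothetical names, and SQLite from the Python standard library stands in for the database. It shows only the general pattern: operations on the proxy build a query, aggregates run in the database, and rows reach Python only when explicitly pulled.

```python
import sqlite3


class TableProxy:
    """Toy proxy: records operations and builds SQL; it holds no rows itself.

    SQL is assembled by string interpolation, which is acceptable for a
    demonstration but unsafe with untrusted input.
    """

    def __init__(self, conn, table, columns=("*",), where=None):
        self.conn = conn
        self.table = table
        self.columns = tuple(columns)
        self.where = where

    def __getitem__(self, columns):
        # proxy[["a", "b"]] narrows the projection, like column selection in pandas.
        return TableProxy(self.conn, self.table, columns, self.where)

    def filter(self, condition):
        # Hypothetical helper: attach a SQL predicate instead of filtering in Python.
        return TableProxy(self.conn, self.table, self.columns, condition)

    def _where_clause(self):
        return f" WHERE {self.where}" if self.where else ""

    def to_sql(self):
        return f"SELECT {', '.join(self.columns)} FROM {self.table}{self._where_clause()}"

    def mean(self, column):
        # The aggregate runs in the database; only a single number comes back.
        sql = f"SELECT AVG({column}) FROM {self.table}{self._where_clause()}"
        return self.conn.execute(sql).fetchone()[0]

    def pull(self):
        # Only at this point are rows copied into the Python session.
        return self.conn.execute(self.to_sql()).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (species TEXT, petal_length REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("setosa", 1.4), ("setosa", 1.6), ("virginica", 5.5), ("virginica", 6.5)],
)

proxy = TableProxy(conn, "measurements")
virginica = proxy.filter("species = 'virginica'")
print(virginica[["petal_length"]].to_sql())
print(virginica.mean("petal_length"))
```

Creating and filtering the proxy sends nothing to the database; only `mean` and `pull` execute a query.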
The following table lists the transparency layer functions for getting and creating proxy objects and tables/views.
Table 1-1 Transparency Layer Functions for Getting and Creating Proxy Objects and Tables/Views
Function | Description |
---|---|
oml.create | Creates a table in the database schema from a Python data set. |
oml_object.pull | Creates a local Python object that contains a copy of data fetched from the database object referenced by the oml_object. |
oml.push | Pushes data from a Python session into an object in a database schema. |
oml.sync | Creates a |
oml.dir | Return the names of |
oml.drop | Drops a persistent database table or view. |
Transparency layer proxy classes map SQL data types or objects to corresponding Python types. The classes provide Python functions and operators that are the same as those on the mapped Python types. The following table lists the transparency layer data type classes.
Table 1-2 Transparency Layer Data Type Classes
Class | Description |
---|---|
oml.Boolean | A boolean series data class that represents a single column of 0, 1, and NULL values in database data. |
oml.Bytes | A binary series data class that represents a single column of |
oml.Float | A numeric series data class that represents a single column of |
oml.String | A character series data class that represents a single column of |
oml.DataFrame | A tabular |
oml.Integer | A data class that represents a single column of NUMBER(*,0) data in the database. |
oml.Datetime | A series date class that represents a single column of |
oml.Timezone | A time class that is used with oml.Datetime to support TIMESTAMP WITH TIME ZONE. |
oml.Timedelta | A time class that represents a single column series of differences between two dates or times, or INTERVAL DAY TO SECOND in Oracle Database. |
The following table lists the mappings of Python data types for both the reading and writing of data between Python and the database.
Table 1-3 Python and SQL Data Type Equivalencies
| Database Read | Python Data Types | Database Write |
|---|---|---|
| N/A | bool | If |
| | bytes | |
| | float | If |
| | str | |
| NUMBER(*,0) | int | NUMBER(*,0) |
| TIMESTAMP or TIMESTAMP WITH TIME ZONE | datetime.datetime | TIMESTAMP or TIMESTAMP WITH TIME ZONE |
| TIMESTAMP WITH TIME ZONE | datetime.timezone | TIMESTAMP WITH TIME ZONE |
| INTERVAL DAY TO SECOND | datetime.timedelta | INTERVAL DAY TO SECOND |
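The write-side mappings in Table 1-3 can be expressed as a small lookup. The sketch below covers only the rows whose cells are complete in the table (int, datetime.datetime, datetime.timezone, and datetime.timedelta). The function name `sql_write_type` is hypothetical, and the rule that a timezone-aware datetime maps to TIMESTAMP WITH TIME ZONE is an assumption made for illustration; the table itself lists both TIMESTAMP variants without stating how one is chosen.

```python
import datetime as dt

# Write-side mappings copied from the complete rows of Table 1-3.
# bool, bytes, float, and str are deliberately left out.
WRITE_TYPES = {
    int: "NUMBER(*,0)",
    dt.timezone: "TIMESTAMP WITH TIME ZONE",
    dt.timedelta: "INTERVAL DAY TO SECOND",
}


def sql_write_type(value):
    """Return the SQL type listed in Table 1-3 for a Python value, or None."""
    if type(value) is dt.datetime:
        # Assumption for this sketch: timezone-aware values use the
        # WITH TIME ZONE variant, naive values use plain TIMESTAMP.
        if value.tzinfo is not None:
            return "TIMESTAMP WITH TIME ZONE"
        return "TIMESTAMP"
    # Exact type lookup, so bool (a subclass of int) is not treated as int.
    return WRITE_TYPES.get(type(value))


print(sql_write_type(42))
print(sql_write_type(dt.timedelta(hours=3)))
```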
1.4 About the Python Components and Libraries in OML4Py
OML4Py requires an installation of Python, the specified Python libraries, and the OML4Py components.
- In Oracle Autonomous Database, OML4Py is already installed. The OML4Py installation includes Python, additional required Python libraries, and the OML4Py server components. A Python interpreter is included with Oracle Machine Learning Notebooks in Autonomous Database.
- You can install third-party Python libraries in a conda environment through a conda interpreter for use within OML Notebooks sessions and OML4Py embedded execution invocations.
Python Version in Current Release of OML4Py
The current release of OML4Py is based on Python 3.12.0.
This version is in the current release of Oracle Autonomous Database.
Required Python Libraries
The following Python libraries must be included.
oracledb 1.4.0
cycler 0.10.0
joblib 1.1.0
kiwisolver 1.1.0
matplotlib 3.7.2
numpy 1.26.0
pandas 2.1.1
Pillow 8.2.0
pyparsing 2.4.0
python-dateutil 2.8.1
pytz 2022.1
scikit-learn 1.2.1
scipy 1.11.3
six 1.13.0
threadpoolctl 3.1.0
All the above libraries are included with Python in the current release of Oracle Autonomous Database.
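To see how a given Python environment compares with the versions listed above, a short check with the standard library's `importlib.metadata` may help. This is a minimal sketch, not an Oracle-provided tool: the `REQUIRED` dictionary copies only a few pins from the list as an example, and `check_versions` is a hypothetical helper name.

```python
from importlib import metadata

# A few pins copied from the list above; extend the dictionary as needed.
REQUIRED = {
    "oracledb": "1.4.0",
    "numpy": "1.26.0",
    "pandas": "2.1.1",
    "scipy": "1.11.3",
    "scikit-learn": "1.2.1",
    "matplotlib": "3.7.2",
}


def check_versions(required):
    """Compare installed distribution versions with the listed ones."""
    report = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if installed == wanted:
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, listed {wanted}"
    return report


for name, status in check_versions(REQUIRED).items():
    print(f"{name}: {status}")
```

The check compares version strings exactly, so a newer installed version is reported as a difference rather than as a failure.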