Advantages of Oracle Machine Learning for Python
Using OML4Py to prepare and analyze data in or accessible to an Oracle database has many advantages for a Python user.
With OML4Py, you can do the following:
-
Operate on database data without using SQL
OML4Py transparently translates many standard Python functions into SQL. With OML4Py, you can create Python proxy objects that access, analyze, and manipulate data that resides in the database. OML4Py can automatically optimize the SQL by taking advantage of column indexes, query optimization, table partitioning, and database parallelism.
OML4Py provides overloaded versions of many commonly used Python functions, including those on pandas data frames, so that they run in the database.
See Also: Transparently Convert Python to SQL
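The translation idea can be pictured with a toy proxy class. The following is a minimal sketch, not OML4Py's implementation: it uses Python's built-in sqlite3 module as a stand-in for an Oracle database, and the TableProxy class with its filter_gt and mean methods are hypothetical names made up for illustration. It shows how method calls on a Python object can be turned into SQL so that filtering and aggregation run in the database and only the result returns to the client.

```python
import sqlite3


class TableProxy:
    """Toy stand-in for a database proxy object (hypothetical, not the OML4Py API)."""

    def __init__(self, conn, table, where=None):
        self.conn = conn
        self.table = table
        self.where = where or []  # list of (sql_fragment, parameter) pairs

    def filter_gt(self, column, value):
        # Return a new proxy that remembers the condition; no rows are fetched here.
        condition = (f'"{column}" > ?', value)
        return TableProxy(self.conn, self.table, self.where + [condition])

    def _where_sql(self):
        if not self.where:
            return "", []
        clause = " WHERE " + " AND ".join(frag for frag, _ in self.where)
        return clause, [param for _, param in self.where]

    def mean(self, column):
        # The aggregate runs in the database; a single number comes back.
        clause, params = self._where_sql()
        sql = f'SELECT AVG("{column}") FROM "{self.table}"{clause}'
        return self.conn.execute(sql, params).fetchone()[0]

    def __len__(self):
        clause, params = self._where_sql()
        sql = f'SELECT COUNT(*) FROM "{self.table}"{clause}'
        return self.conn.execute(sql, params).fetchone()[0]


# In-memory SQLite database used only as a stand-in for a real database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iris (species TEXT, petal_length REAL)")
conn.executemany(
    "INSERT INTO iris VALUES (?, ?)",
    [("setosa", 1.4), ("setosa", 1.5), ("virginica", 5.5), ("virginica", 6.0)],
)

iris = TableProxy(conn, "iris")
long_petals = iris.filter_gt("petal_length", 2.0)  # builds a condition, runs no SQL
n_long = len(long_petals)                          # SELECT COUNT(*) ... WHERE ...
mean_long = long_petals.mean("petal_length")       # SELECT AVG(...) ... WHERE ...
print(n_long, mean_long)
```

In this sketch, nothing is fetched when filter_gt is called. SQL is issued only when a result such as a count or a mean is requested, which is the general pattern a proxy object relies on.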
-
Automate common machine learning tasks
With Oracle’s Automated Machine Learning (AutoML) technology, both data scientists and beginner machine learning users can automate common modeling tasks such as algorithm selection, feature selection, model tuning, and model selection, all of which leverage the parallel processing and scalability of the database.
See Also: About Automated Machine Learning
-
Minimize data movement
By keeping data in the database whenever possible, you eliminate the time involved in transferring the data to your client Python engine and the need to store the data locally. You also eliminate the need to manage the locally stored data, which includes tasks such as distributing the data files to the appropriate locations, synchronizing the data with changes that are made in the production database, and so on.
See Also: About Moving Data Between the Database and a Python Session
-
Keep data secure
By keeping the data in the database, you have the security, scalability, reliability, and backup features of the database for managing the data.
-
Use the power of the database
By operating directly on data in the database, you can use the memory and processing power of the database and avoid the memory constraints of your client Python engine.
-
Use current data
As data is refreshed in the database, you have immediate access to current data.
-
Save Python objects to a datastore in the database
You can save Python objects to an OML4Py datastore for future use and for use by others.
See Also: About OML4Py Datastores
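A datastore can be thought of as a named collection of serialized Python objects kept in the database. The following is a rough sketch of that idea only, assuming a SQLite table and the pickle module as stand-ins; the table layout and the ds_save and ds_load helpers are hypothetical and do not reflect how OML4Py stores objects. In OML4Py itself, the corresponding functions are oml.ds.save and oml.ds.load.

```python
import pickle
import sqlite3

# In-memory SQLite database as a stand-in; a real datastore persists across sessions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datastore (ds_name TEXT, obj_name TEXT, payload BLOB, "
    "PRIMARY KEY (ds_name, obj_name))"
)


def ds_save(conn, objs, name):
    """Serialize each object in `objs` and store it under the datastore `name`."""
    rows = [(name, obj_name, pickle.dumps(obj)) for obj_name, obj in objs.items()]
    conn.executemany("INSERT OR REPLACE INTO datastore VALUES (?, ?, ?)", rows)
    conn.commit()


def ds_load(conn, name):
    """Return a dict of every object stored under the datastore `name`."""
    cur = conn.execute(
        "SELECT obj_name, payload FROM datastore WHERE ds_name = ?", (name,)
    )
    return {obj_name: pickle.loads(payload) for obj_name, payload in cur}


settings = {"max_depth": 4, "criterion": "gini"}
labels = ["setosa", "virginica"]
ds_save(conn, {"settings": settings, "labels": labels}, name="ds_demo")

restored = ds_load(conn, "ds_demo")
print(sorted(restored))
```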
-
Build and store native Python models in the database
Using Embedded Python Execution, you can build native Python models and store and manage them in an OML4Py datastore.
You can also build in-database models with, for example, an oml class such as the Decision Tree class oml.dt. These in-database models have proxy objects that reference the actual models. In keeping with normal Python behavior, when the Python engine terminates, all in-memory objects, including models, are lost. To prevent an in-database model created using OML4Py from being deleted when the database connection is terminated, you must store its proxy object in a datastore.
-
Score data
For most of the OML4Py machine learning classes, you can use the predict and predict_proba methods of the model object to score new data. For these OML4Py in-database models, you can also use the SQL PREDICTION function on the model proxy objects, which scores directly in the database. You can use in-database models directly from SQL if you prepare the data properly. For open source models, you can use Embedded Python Execution and enable data-parallel execution for performance and scalability.
-
Run user-defined Python functions in embedded Python engines
Using OML4Py Embedded Python Execution, you can store user-defined Python functions in the OML4Py script repository, and run those functions in Python engines spawned by the database environment. When a user-defined Python function runs, the database starts, controls, and manages one or more Python engines that can run in parallel. With the Embedded Python Execution functionality, you can do the following:
-
Use a select set of Python packages in user-defined functions that run in embedded Python engines
-
Use other Python packages and third-party packages in user-defined Python functions that run in embedded Python engines
-
Operationalize user-defined Python functions for use in production applications and eliminate porting Python code and models into SQL and, on Autonomous Database (ADB), REST; avoid reinventing code to integrate Python results into existing applications
-
Seamlessly leverage your Oracle database as a high-performance computing environment for user-defined Python functions, providing data parallelism and resource management
-
Perform parallel simulations, for example, Monte Carlo analysis, using the oml.index_apply function
-
Generate JSON, PNG images, and XML representations of both structured and image data, which can be used by Python clients and SQL-based applications. PNG images and structured data can also be used by Python clients and applications that use REST APIs.
See Also: About Embedded Python Execution
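To make the Monte Carlo use case concrete, the following is a minimal sketch of a user-defined function written in the shape that an index-based apply expects: its first argument is the index of the run. The estimate_pi function, its n_points parameter, and the local_index_apply helper are hypothetical names for illustration. The helper is a sequential, local stand-in that assumes indexes run from 1 through times; with OML4Py, the function would instead be passed to oml.index_apply, and the database would start and manage the Python engines.

```python
import random


def estimate_pi(index, n_points=20_000):
    """One Monte Carlo run: estimate pi from random points in the unit square."""
    # Seed from the run index so that runs differ from each other but are reproducible.
    rng = random.Random(index)
    inside = sum(
        1
        for _ in range(n_points)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_points


def local_index_apply(times, func, **kwargs):
    """Local, sequential stand-in: call func(1), ..., func(times) and collect results."""
    return [func(i, **kwargs) for i in range(1, times + 1)]


# Each run is independent of the others, which is what makes the work easy to parallelize.
estimates = local_index_apply(times=8, func=estimate_pi)
pi_estimate = sum(estimates) / len(estimates)
print(round(pi_estimate, 2))
```

Because the function depends only on its index and its own arguments, the same definition can be tested locally first and then run in embedded Python engines without changes to its body.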