Advantages of Oracle Machine Learning for Python

Using OML4Py to prepare and analyze data in or accessible to an Oracle database has many advantages for a Python user.

With OML4Py, you can do the following:

  • Operate on database data without using SQL

    OML4Py transparently translates many standard Python functions into SQL. With OML4Py, you can create Python proxy objects that access, analyze, and manipulate data that resides in the database. Because the generated SQL runs in the database, it can take advantage of column indexes, query optimization, table partitioning, and database parallelism.

    OML4Py overloads many commonly used Python functions, including pandas-style DataFrame methods, so that they run in the database.

    See Also: Transparently Convert Python to SQL
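    The following toy sketch illustrates the proxy-object idea. It is not the OML4Py API. Python's built-in sqlite3 module stands in for Oracle Database, and the classes TableProxy, ColumnProxy, and Condition are hypothetical. Indexing and comparison operators on the proxy only build a parameterized SQL query. Rows are read only when an aggregate such as mean or an explicit pull is requested. In OML4Py itself, you would typically obtain a proxy for an existing table with a call such as oml.sync.

```python
import sqlite3


class Condition:
    """A SQL predicate plus its bind parameters (hypothetical helper)."""

    def __init__(self, text, params):
        self.text = text
        self.params = params

    def __and__(self, other):
        # cond_a & cond_b -> "(a) AND (b)" with both parameter lists.
        return Condition(f"({self.text}) AND ({other.text})",
                         self.params + other.params)


class ColumnProxy:
    """Overloads Python comparison operators to emit SQL predicates."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Condition(f"{self.name} = ?", [value])

    def __gt__(self, value):
        return Condition(f"{self.name} > ?", [value])


class TableProxy:
    """Hypothetical table proxy: operations build SQL, rows stay in the DB."""

    def __init__(self, conn, table, columns=("*",), where=None):
        self.conn = conn
        self.table = table
        self.columns = tuple(columns)
        self.where = where

    def __getitem__(self, key):
        if isinstance(key, Condition):  # proxy[mask] -> add a WHERE clause
            where = key if self.where is None else self.where & key
            return TableProxy(self.conn, self.table, self.columns, where)
        if isinstance(key, str):  # proxy["col"] -> column proxy
            return ColumnProxy(key)
        return TableProxy(self.conn, self.table, key, self.where)  # projection

    def sql(self):
        query = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.where is None:
            return query, []
        return f"{query} WHERE {self.where.text}", self.where.params

    def mean(self, column):
        # The aggregate runs in the database; only one value is returned.
        inner, params = self.sql()
        outer = f"SELECT AVG({column}) FROM ({inner})"
        return self.conn.execute(outer, params).fetchone()[0]

    def pull(self):
        # Explicitly copy the selected rows into the Python session.
        inner, params = self.sql()
        return self.conn.execute(inner, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10.0), ("west", 20.0), ("east", 30.0)])

sales = TableProxy(conn, "sales")
east = sales[sales["region"] == "east"][["amount"]]
print(east.sql()[0])        # SELECT amount FROM sales WHERE region = ?
print(east.mean("amount"))  # 20.0
```

    Each operation returns a new proxy instead of modifying the existing one. As a result, earlier proxies such as sales remain valid after east is derived from them.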

  • Automate common machine learning tasks

    By using Oracle’s advanced Automated Machine Learning (AutoML) technology, both data scientists and beginning machine learning users can automate common modeling tasks such as algorithm selection, feature selection, and model tuning and selection, all of which leverage the parallel processing and scalability of the database.

    See Also: About Automated Machine Learning
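    Algorithm selection and model tuning both rest on the same basic idea: fit several candidate models, score each one on held-out data, and keep the best. The standard-library sketch below shows only that idea. OML4Py's AutoML classes in the oml.automl module use their own, more sophisticated strategies and run in the database. The candidate models, the single-feature data set, and the function names are all hypothetical.

```python
from collections import Counter


def majority_model(train):
    # Baseline: always predict the most common training label.
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def knn_model(train, k):
    # k-nearest neighbours on a single numeric feature.
    def predict(x):
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]
    return predict


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


def select_model(train, valid, candidates):
    """Fit every candidate on `train`, score it on `valid`, keep the best."""
    scores = {name: accuracy(build(train), valid)
              for name, build in candidates.items()}
    best = max(scores, key=scores.get)  # first name with the top score
    return best, scores


train = [(0, "low"), (1, "low"), (2, "low"), (3, "low"),
         (6, "high"), (7, "high"), (8, "high")]
valid = [(0.5, "low"), (2.5, "low"), (4, "low"),
         (5.5, "high"), (6.5, "high"), (9, "high")]

# Algorithm selection (majority vs k-NN) combined with tuning of k.
candidates = {"majority": majority_model}
for k in (1, 3, 7):
    candidates[f"knn_k{k}"] = lambda data, k=k: knn_model(data, k)

best, scores = select_model(train, valid, candidates)
print(best, scores[best])  # knn_k1 1.0
```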

  • Minimize data movement

    By keeping data in the database whenever possible, you avoid both the time spent transferring the data to your client Python engine and the need to store it locally. You also no longer need to manage locally stored data. That work includes distributing data files to the appropriate locations, keeping them synchronized with changes made in the production database, and so on.

    See Also: About Moving Data Between the Database and a Python Session
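    The difference in data movement is easy to measure. The sketch below uses sqlite3 as a stand-in for Oracle Database and compares two approaches on a hypothetical readings table. The first copies every row to the client and aggregates there. The second sends the aggregation to the database and receives only the result. In OML4Py, operations on a proxy object follow the second pattern, and rows reach the client only when you pull them explicitly, for example with the proxy's pull method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor INTEGER, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 ((i % 4, float(i)) for i in range(10_000)))

# Client-side: copy every row into the Python session, then aggregate there.
rows = conn.execute("SELECT sensor, value FROM readings").fetchall()
client_totals = {}
for sensor, value in rows:
    client_totals[sensor] = client_totals.get(sensor, 0.0) + value

# In-database: send the aggregation to the data; one row per sensor returns.
db_rows = conn.execute(
    "SELECT sensor, SUM(value) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
db_totals = dict(db_rows)

print(len(rows), "rows moved vs", len(db_rows))  # 10000 rows moved vs 4
print(client_totals == db_totals)                # True
```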

  • Keep data secure

    Because the data stays in the database, you can manage it with the database's security, scalability, reliability, and backup features.

  • Use the power of the database

    By operating directly on data in the database, you can use the memory and processing power of the database and avoid the memory constraints of your client Python engine.

  • Use current data

    As data is refreshed in the database, you have immediate access to current data.

  • Save Python objects to a datastore in the database

    You can save Python objects to an OML4Py datastore for future use and for use by others.

    See Also: About OML4Py Datastores
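    In OML4Py, you work with datastores through functions such as oml.ds.save and oml.ds.load. The OML4Py API reference documents their exact parameters. The toy below only illustrates the concept: named collections of pickled Python objects, kept in a database table so that a later session can reload them. The class ToyDatastore, its overwrite flag, and its table schema are hypothetical, and sqlite3 stands in for Oracle Database.

```python
import pickle
import sqlite3


class ToyDatastore:
    """Hypothetical stand-in for a named store of Python objects in a DB."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS datastore ("
            "ds_name TEXT, obj_name TEXT, payload BLOB, "
            "PRIMARY KEY (ds_name, obj_name))"
        )

    def save(self, objs, name, overwrite=False):
        exists = self.conn.execute(
            "SELECT 1 FROM datastore WHERE ds_name = ?", (name,)
        ).fetchone()
        if exists and not overwrite:
            raise ValueError(f"datastore {name!r} already exists")
        with self.conn:  # replace all objects in a single transaction
            self.conn.execute("DELETE FROM datastore WHERE ds_name = ?", (name,))
            self.conn.executemany(
                "INSERT INTO datastore VALUES (?, ?, ?)",
                [(name, key, pickle.dumps(obj)) for key, obj in objs.items()],
            )

    def load(self, name):
        rows = self.conn.execute(
            "SELECT obj_name, payload FROM datastore WHERE ds_name = ?", (name,)
        ).fetchall()
        return {key: pickle.loads(payload) for key, payload in rows}


conn = sqlite3.connect(":memory:")
store = ToyDatastore(conn)
store.save({"coef": [0.5, -1.2], "meta": {"target": "churn"}}, name="my_ds")
print(store.load("my_ds")["meta"])  # {'target': 'churn'}
```

    The sketch uses pickle for brevity. Unpickling can execute arbitrary code, so load only data from sources you trust.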

  • Build and store native Python models in the database

    Using Embedded Python Execution, you can build native Python models and store and manage them in an OML4Py datastore.

    You can also build in-database models by using an oml class, such as the Decision Tree class oml.dt. These in-database models have proxy objects that reference the actual models. In keeping with normal Python behavior, all in-memory objects, including models, are lost when the Python engine terminates. To prevent an in-database model created with OML4Py from being deleted when the database connection is terminated, you must store its proxy object in a datastore.

    See Also: About Machine Learning Classes and Algorithms

  • Score data

    For most of the OML4Py machine learning classes, you can use the predict and predict_proba methods of the model object to score new data.

    For these OML4Py in-database models, you can also score directly in the database by applying the SQL PREDICTION function to the models that the proxy objects reference. You can use in-database models directly from SQL, provided the input data is prepared the way the model expects. For open source models, you can use Embedded Python Execution and enable data-parallel execution for performance and scalability.
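    For open source models, Embedded Python Execution functions such as oml.row_apply split the input data into chunks and run a user-defined function on each chunk in Python engines that the database starts. The sketch below mimics only that partition, score, and combine pattern in a single local process. The linear scorer, score_chunk, and parallel_score are hypothetical, and a thread pool stands in for the database's Python engines. Because of the global interpreter lock, threads do not speed up pure-Python CPU work, so the sketch demonstrates the structure of data-parallel scoring rather than its performance benefit.

```python
from concurrent.futures import ThreadPoolExecutor


def score_chunk(rows, weights, bias):
    """Hypothetical linear scorer applied to one chunk of rows."""
    return [sum(w * x for w, x in zip(weights, row)) + bias for row in rows]


def parallel_score(rows, weights, bias, chunk_size=2, workers=4):
    # Split the rows into independent chunks and score them concurrently;
    # Executor.map yields results in chunk order, so row order is preserved.
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(score_chunk, chunks,
                                [weights] * len(chunks), [bias] * len(chunks)))
    return [score for chunk in results for score in chunk]


rows = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0], [2.0, 2.0], [1.0, 1.0]]
print(parallel_score(rows, weights=[0.5, 1.0], bias=0.25))
# [2.75, 1.25, 1.75, 3.25, 1.75]
```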

  • Run user-defined Python functions in embedded Python engines

    Using OML4Py Embedded Python Execution, you can store user-defined Python functions in the OML4Py script repository, and run those functions in Python engines spawned by the database environment. When a user-defined Python function runs, the database starts, controls, and manages one or more Python engines that can run in parallel. With the Embedded Python Execution functionality, you can do the following:

    • Use a select set of Python packages in user-defined functions that run in embedded Python engines

    • Use other Python packages and third-party packages in user-defined Python functions that run in embedded Python engines

    • Operationalize user-defined Python functions in production applications by invoking them from SQL and, on Oracle Autonomous Database (ADB), from REST APIs. You do not need to port Python code and models to another language or write new code to integrate Python results into existing applications

    • Seamlessly leverage your Oracle database as a high-performance computing environment for user-defined Python functions, providing data parallelism and resource management

    • Perform parallel simulations, for example, Monte Carlo analysis, using the oml.index_apply function

    • Generate JSON and XML representations of both structured and image data, which can be used by Python clients and SQL-based applications. PNG images and structured data can be used by Python clients and by applications that use REST APIs.

    See Also: About Embedded Python Execution
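    A function passed to oml.index_apply receives an index as its first argument. Because the database can run those calls in parallel, the function suits independent Monte Carlo trials. The sketch below assumes that the index runs from 1 to times and uses it to seed each trial, so every trial is reproducible and draws its own random stream. local_index_apply is a hypothetical sequential stand-in that lets the code run without a database connection. With OML4Py, you would instead pass estimate_pi to a call such as oml.index_apply(times=8, func=estimate_pi).

```python
import math


def estimate_pi(index, samples=10_000):
    """One Monte Carlo trial; `index` seeds its own random stream."""
    # Imported inside the function so it stays self-contained if it is
    # shipped to a separate Python engine.
    import random

    rng = random.Random(index)  # reproducible, independent per trial
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / samples


def local_index_apply(times, func):
    # Hypothetical local stand-in: oml.index_apply would instead run each
    # call in a Python engine started by the database, possibly in parallel.
    return [func(index) for index in range(1, times + 1)]


estimates = local_index_apply(times=8, func=estimate_pi)
combined = sum(estimates) / len(estimates)
print(f"estimate {combined:.3f}, error {abs(combined - math.pi):.3f}")
```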