1.2 In-Database Machine Learning Algorithms
OML4R provides an R API to the in-database algorithms that uses the standard R formula specification. These algorithms support classification, regression, clustering, attribute importance, anomaly detection, association rules, time series analysis, and feature extraction. Key features of the in-database algorithms include automatic, algorithm-specific data preparation, partitioned model ensembles, and integrated text mining.
Models generated using the R API can be accessed through the SQL API, exported as flat files for separate management, imported into other Oracle Database and Autonomous Database instances, or deployed to OML Services.
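As a rough illustration of the formula-based API, the following sketch fits a regression model locally with base R and shows, in comments only, how the same formula might be passed to the in-database `ore.odmGLM` function. The `mtcars` data set, the `MTCARS` proxy name, and the connected OML4R session are assumptions made for this example; the commented lines are not run here.

```r
# One formula object can describe the model for both a local and an
# in-database fit.
f <- mpg ~ wt + hp

# Local fit with base R, on the built-in mtcars data frame.
local_fit <- lm(f, data = mtcars)
local_coef <- coef(local_fit)

# In a connected OML4R session (assumption; not run here), the same formula
# could be used with an in-database algorithm on a proxy object:
#   MTCARS <- ore.push(mtcars)               # create a database-resident copy
#   db_fit <- ore.odmGLM(f, data = MTCARS)   # in-database generalized linear model
#   pred   <- predict(db_fit, MTCARS)        # scoring also runs in the database
```

Only the model-building call changes between the two variants; the formula and the general fit-then-predict workflow stay the same.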
Advantages of Oracle Machine Learning for R
Using OML4R to prepare and analyze data in an Oracle Database instance has many advantages for an R user.
With OML4R, you can do the following:
- Operate on Database-Resident Data Without Using SQL. OML4R transparently translates many standard R functions into SQL. With OML4R, you can create R proxy objects that access, analyze, and manipulate data that resides in the database. OML4R can automatically optimize the SQL by taking advantage of column indexes, query optimization, table partitioning, and database parallelism.
OML4R provides overloaded versions of many commonly used R functions, including those that operate on R data frames, so that the work runs in the database.
- Minimize Data Movement. By keeping the data in the database whenever possible, you eliminate the time involved in transferring the data to your client R engine and the need to store the data locally. You also eliminate the need to manage the locally stored data, which includes tasks such as distributing the data files to the appropriate locations, synchronizing the data with changes that are made in the production database, and so on.
- Keep Data Secure. By keeping the data in the database, you have the security, scalability, reliability, and backup features of Oracle Database for managing the data.
- Use the Power of the Database. By operating directly on data in the database, you can use the memory and processing power of the database environment and avoid the memory constraints of your client R engine.
- Use Current Data. As data is refreshed in the database, you have immediate access to current data.
- Prepare Data in the Database. Using the transparency layer functions, prepare large database-resident data sets for predictive analysis through operations such as ordering, aggregating, filtering, recoding, and the use of comprehensive sampling techniques without having to write SQL code.
- Save R Objects in the Database. You can save native R objects and OML4R proxy objects in an OML4R datastore, which is managed by the Oracle Database instance, for use across R sessions and to share with other users.
- Build Models in the Database. Use the parallelized, distributed in-database machine learning algorithms from OML4SQL to build more models on more data and to score large volumes of data faster. These algorithms are available through a well-integrated R API, which now includes Neural Network, Random Forest, Exponential Smoothing, and XGBoost. Automatic data preparation, partitioned models, and integrated text mining further increase productivity.
You can use functions in packages that you download from CRAN (The Comprehensive R Archive Network) to build models that use techniques such as ensemble modeling.
- Run user-defined R functions in embedded R engines. Using OML4R Embedded R execution functionality, you can store user-defined R functions in the OML4R script repository, and run those functions in R engines spawned by the database environment. When a user-defined R function runs, the database spawns and manages one or more R engines that can run in parallel. With the Embedded R execution functionality, you can do the following:
  - Use a select set of R packages in user-defined functions that run in embedded R engines.
  - Use other R packages in user-defined R functions that run in Embedded R engines.
  - Operationalize user-defined R functions for use in production applications and eliminate porting R code and models into other languages; avoid reinventing code to integrate R results into existing applications.
  - Seamlessly leverage your Oracle database instance as a high-performance computing environment for user-defined R functions, providing data parallelism and resource management.
  - Perform simulations, for example, Monte Carlo analysis, using system-supported task parallelism.
  - Call user-defined R functions from R, SQL, and REST APIs.
  - Return structured data.frame results and PNG images from user-defined R functions as tables as well as XML containing both structured and image content.
- Integrate with the Oracle Technology Stack. You can take advantage of all aspects of the Oracle technology stack to integrate your data analysis within a larger framework for business intelligence or scientific inquiry. For example, you can integrate the results of your OML4R analysis into Oracle Analytics Cloud and Oracle APEX using SQL to call R functionality.
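The idea of operating on database-resident data without SQL can be sketched with ordinary R code. The helper below, `summarise_heavy` (a hypothetical name), uses only standard data frame operations: row filtering, column selection, and `aggregate`. It is run here on the local `mtcars` data frame. The assumption, not verified in this sketch, is that the same function could be called on an OML4R proxy object, in which case the overloaded functions would translate the work into SQL.

```r
# Mean mpg per cylinder count, restricted to cars heavier than min_wt.
# Uses only generic R operations, so it does not care where the data lives.
summarise_heavy <- function(dat, min_wt) {
  heavy <- dat[dat$wt > min_wt, c("cyl", "mpg")]
  aggregate(heavy$mpg, by = list(cyl = heavy$cyl), FUN = mean)
}

# Local run on an in-memory data frame.
local_result <- summarise_heavy(mtcars, 3)

# In a connected OML4R session (assumption; not run here):
#   MTCARS    <- ore.push(mtcars)            # proxy for a database table
#   db_result <- summarise_heavy(MTCARS, 3)  # filtering and aggregation in the database
#   ore.pull(db_result)                      # bring only the small summary to the client
```

Pulling only the aggregated result is what keeps data movement small: the full table never leaves the database.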
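The Monte Carlo use case mentioned for Embedded R execution can be sketched as follows. The user-defined function `mc_pi` (a hypothetical name) estimates pi from one batch of random points, and the task index seeds the random number generator so that each task is reproducible. The code runs the tasks locally and sequentially; the commented lines show how the function might be stored in the script repository and run in database-spawned R engines, assuming a connected OML4R session.

```r
# User-defined function: one Monte Carlo batch that estimates pi.
# idx is the task index; n is the number of random points per batch.
mc_pi <- function(idx, n = 10000) {
  set.seed(idx)
  x <- runif(n)
  y <- runif(n)
  4 * mean(x^2 + y^2 <= 1)  # share of points inside the quarter circle, times 4
}

# Local dry run: five independent tasks, executed one after another.
estimates <- vapply(1:5, mc_pi, numeric(1))
pi_hat <- mean(estimates)

# In a connected OML4R session (assumption; not run here), the same function
# could be registered once and then run as parallel tasks:
#   ore.scriptCreate("mc_pi", mc_pi)   # "mc_pi" is a hypothetical script name
#   res <- ore.indexApply(5, FUN.NAME = "mc_pi", parallel = TRUE)
```

Because each task depends only on its index, the tasks are independent of one another, which is what makes this kind of simulation a natural fit for task parallelism.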