1.1 New Features in 23ai

Oracle Machine Learning for Python: new features in Oracle Database 23ai.

Algorithm Enhancements

Note:
New Algorithm Settings: You can find model settings and algorithm-specific settings in the Oracle Database PL/SQL Packages and Types Reference guide.

GLM link functions

GLMS_LINK_FUNCTION: this setting lets you specify the link function used to build a generalized linear model. The additional link functions are Logit, Probit, Cloglog, and Cauchit. See Generalized Linear Model.
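To make the four link functions concrete, the following sketch implements their standard textbook definitions in plain Python. It illustrates the mathematics only, is not Oracle's implementation, and does not call the database; the function names and the `LINKS` dictionary are hypothetical.

```python
import math
from statistics import NormalDist


def logit(mu):
    """Log-odds: log(mu / (1 - mu))."""
    return math.log(mu / (1.0 - mu))


def probit(mu):
    """Inverse of the standard normal CDF."""
    return NormalDist().inv_cdf(mu)


def cloglog(mu):
    """Complementary log-log: log(-log(1 - mu))."""
    return math.log(-math.log(1.0 - mu))


def cauchit(mu):
    """Inverse of the standard Cauchy CDF: tan(pi * (mu - 0.5))."""
    return math.tan(math.pi * (mu - 0.5))


# Hypothetical lookup table, keyed by the link names listed above.
LINKS = {"logit": logit, "probit": probit, "cloglog": cloglog, "cauchit": cauchit}

# Each link maps a probability in (0, 1) onto the whole real line.
for name, link in LINKS.items():
    print(f"{name:8s} g(0.25) = {link(0.25):+.4f}")
```

Logit, probit, and cauchit are symmetric around a probability of 0.5, while cloglog is asymmetric, which is one reason to choose it for rare events.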
XGBoost

The following new settings add XGBoost support for constraints and survival analysis.

Note:
The XGBoost settings are case sensitive.
- Interaction and Monotonic Constraints
  - xgboost_interaction_constraints
  - xgboost_decrease_constraints
  - xgboost_increase_constraints
- Support for Survival Analysis
  - objective: survival:aft
  - xgboost_aft_loss_distribution
  - xgboost_aft_loss_distribution_scale
  - xgboost_aft_right_bound_column_name
Oracle Machine Learning supports XGBoost features such as monotonic and interaction constraints, as well as the AFT model for survival analysis. See XGBoost.
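Because the XGBoost setting names are case sensitive, a misspelled or upper-cased name is easy to miss. The sketch below collects the names listed above and flags any name that is not an exact match. The helper `check_setting_names`, the example values, and the column name `UPPER_BOUND` are illustrative assumptions, not documented Oracle values.

```python
# Setting names copied from the list above. The values used further down
# are illustrative assumptions only.
KNOWN_XGB_SETTINGS = {
    "xgboost_interaction_constraints",
    "xgboost_decrease_constraints",
    "xgboost_increase_constraints",
    "objective",
    "xgboost_aft_loss_distribution",
    "xgboost_aft_loss_distribution_scale",
    "xgboost_aft_right_bound_column_name",
}


def check_setting_names(settings):
    """Return the names that are not exact, case-sensitive matches."""
    return [name for name in settings if name not in KNOWN_XGB_SETTINGS]


# A hypothetical survival-analysis configuration.
survival_settings = {
    "objective": "survival:aft",
    "xgboost_aft_loss_distribution": "normal",              # assumed value
    "xgboost_aft_loss_distribution_scale": 1.0,             # assumed value
    "xgboost_aft_right_bound_column_name": "UPPER_BOUND",   # hypothetical column
}

print(check_setting_names(survival_settings))
# []
print(check_setting_names({"XGBOOST_INCREASE_CONSTRAINTS": "AGE"}))
# ['XGBOOST_INCREASE_CONSTRAINTS']
```

Consult the XGBoost topic for the accepted values and value formats before passing such a dictionary to a model build.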
Explicit Semantic Analysis (ESA)

The following settings are added to support generating embeddings through Explicit Semantic Analysis:
- ESAS_EMBEDDINGS: when enabled, generates embeddings during scoring for feature extraction models.
- ESAS_EMBEDDING_SIZE: specifies the size of the vectors representing embeddings.
Oracle Machine Learning supports embeddings for the Explicit Semantic Analysis (ESA) algorithm. ESA embeddings enable you to use ESA models to generate embeddings for any text or other ESA input. This functionality is equivalent to doc2vec (document to vector representation). See Explicit Semantic Analysis.
Expectation Maximization

EMCS_OUTLIER_RATE: identifies the frequency of outliers in the training data. See Expectation Maximization.
Exponential Smoothing Model
New settings for Exponential Smoothing to support Time Series regression models and initial value optimization for model build:
- Multiple time series
  
  EXSM_SERIES_LIST: this setting enables you to forecast up to twenty predictor series in addition to the target series.
- Automated model type search
  
  EXSM_INITVL_OPTIMIZE: determines whether initial values are optimized during model build.
Exponential Smoothing now supports building multiple time series models, and the multi-series build makes time series regression possible. The algorithm also searches for an acceptable time series model automatically: when you do not specify the EXSM_MODEL setting, it selects the best model type, which can lead to more accurate forecasting. For details, see Exponential Smoothing Method.
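As a rough illustration of how the two settings might be assembled on the client side, the sketch below builds a settings dictionary and enforces the twenty-series limit mentioned above. The helper `exsm_settings`, the comma-separated format for EXSM_SERIES_LIST, and the `_ENABLE`/`_DISABLE` value names are assumptions to verify against the PL/SQL Packages and Types Reference.

```python
MAX_PREDICTOR_SERIES = 20  # limit stated in the text above


def exsm_settings(predictor_columns, optimize_initial_values=True):
    """Build a hypothetical Exponential Smoothing settings dictionary."""
    if len(predictor_columns) > MAX_PREDICTOR_SERIES:
        raise ValueError(
            f"at most {MAX_PREDICTOR_SERIES} predictor series are supported, "
            f"got {len(predictor_columns)}"
        )
    settings = {
        # Assumed value names for the initial value optimization switch.
        "EXSM_INITVL_OPTIMIZE": (
            "EXSM_INITVL_OPTIMIZE_ENABLE"
            if optimize_initial_values
            else "EXSM_INITVL_OPTIMIZE_DISABLE"
        ),
    }
    if predictor_columns:
        # Assumed format: comma-separated column names.
        settings["EXSM_SERIES_LIST"] = ",".join(predictor_columns)
    return settings


# PRICE and VOLUME are hypothetical predictor columns.
print(exsm_settings(["PRICE", "VOLUME"]))
```

EXSM_MODEL is deliberately left out of the dictionary, which corresponds to the case where the algorithm selects the model type itself.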
K-Means

KMNS_WINSORIZE: this setting restricts the data to a window of six standard deviations around the mean. See k-Means.
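The effect of winsorizing can be sketched in a few lines of plain Python. The example reads the six-standard-deviation window as the mean plus or minus three standard deviations and uses the population standard deviation; both are assumptions, and the in-database computation may differ in detail.

```python
import statistics


def winsorize(values, n_std=3.0):
    """Clip each value to [mean - n_std * std, mean + n_std * std]."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    low, high = mean - n_std * std, mean + n_std * std
    return [min(max(v, low), high) for v in values]


# Twenty ordinary points and one extreme outlier.
data = [10.0] * 20 + [1000.0]
clipped = winsorize(data)

print(max(data), "->", round(max(clipped), 1))
```

Only the outlier moves; values already inside the window are returned unchanged, so the cluster centroids are less influenced by extreme points.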

General Enhancements

New shared settings
- ODMS_BOXCOX: this setting enables the Box-Cox variance-stabilization transformation.
- ODMS_EXPLOSION_MIN_SUPP: this setting introduces a more efficient, data-driven encoding for high-cardinality categorical columns. You can define the minimum support required for categorical values in the explosion mapping.
See Shared Settings.
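For readers unfamiliar with the Box-Cox transformation named above, the following sketch shows the standard one-parameter form for positive values. It illustrates the formula only; how the database selects the parameter is not described here, and the function name `box_cox` is hypothetical.

```python
import math


def box_cox(x, lam):
    """Standard Box-Cox transform of a positive value x.

    (x**lam - 1) / lam  when lam != 0
    log(x)              when lam == 0
    """
    if x <= 0:
        raise ValueError("Box-Cox is defined for positive values only")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# A square-root-like transform (lam = 0.5) compresses large values.
for x in (1.0, 4.0, 100.0):
    print(x, "->", box_cox(x, 0.5))
```

With a parameter of 1 the transform is a simple shift, and as the parameter approaches 0 it approaches the natural logarithm, which is why the two cases are defined together.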
Convert Pretrained Models to ONNX Format
OML4Py enables the use of text transformers from Hugging Face by converting them into ONNX format models. OML4Py also adds the necessary tokenization and post-processing. The resulting ONNX pipeline is then imported into the database and can be used to generate embeddings for AI Vector Search. See Convert Pretrained Models to ONNX Format.
Model Includes Data Lineage

In-database ML models now record, in the model's metadata, the query string that was run to specify the build data. The BUILD_SOURCE column in the ALL_MINING_MODELS, USER_MINING_MODELS, and DBA_MINING_MODELS views shows the data query used to produce the model. See ALL_MINING_MODELS.
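A minimal sketch of reading the lineage from Python follows. It assumes a DB-API style cursor that accepts named bind variables (as the python-oracledb driver does) and a model created with an unquoted, and therefore upper-case, name; the helper `fetch_build_source` is hypothetical.

```python
# Query against the USER_MINING_MODELS view named in the text above.
BUILD_SOURCE_QUERY = """
    SELECT model_name, build_source
    FROM user_mining_models
    WHERE model_name = :model_name
"""


def fetch_build_source(cursor, model_name):
    """Return the recorded build query for a model, or None if not found.

    `cursor` is assumed to be an open DB-API cursor supporting named binds.
    Depending on driver settings, the value may arrive as a LOB object
    rather than a plain string.
    """
    cursor.execute(BUILD_SOURCE_QUERY, {"model_name": model_name.upper()})
    row = cursor.fetchone()
    return None if row is None else row[1]
```

Using a bind variable rather than string formatting keeps the model name out of the SQL text.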
Improved Performance of Partitioned Models
Performance is improved for partitioned models with a high number of partitions and for dropping individual models within a partitioned model. To learn more about partitioned models, see DDL in Partitioned model.
4K Columns in Table
Database tables can now accommodate up to 4,096 columns. This functionality is referred to as Wide Tables. To enable or disable Wide Tables for your Oracle database, use the MAX_COLUMNS parameter. See MAX_COLUMNS.