Selecting Algorithms for Advanced Predictions Planning
Select the algorithms to use for the advanced prediction in Planning.
To define the algorithm to use for the advanced prediction:
- In the Select Algorithms section, select the algorithm to use:
- Oracle AutoMLx—A proprietary suite of ten algorithms (both univariate and multivariate) that runs every algorithm in the suite and automatically selects the best model for the chosen error measure. Oracle AutoMLx:
- Runs various statistical models and machine learning algorithms on your data.
- Tunes and validates the models.
- Finds the best model for your data.
- Fits your data to the best model.
- Light GBM—Light Gradient Boosting, an ensemble- and tree-based, speed-efficient algorithm suited to larger data sets. Best suited for data sets where time carries less weight than other features.
- XGBoost—Extreme Gradient Boosting, an ensemble- and tree-based algorithm, best suited for data sets where time carries less weight than other features.
- SARIMAX—Seasonal ARIMA with exogenous variables.
- Prophet—Time series algorithm best suited for data with strong seasonal effects and several seasons of historical data.
- Linear Regression—Simple linear relationships.
- This method is simple and easy to understand. For example, "I want to forecast monthly sales based primarily on marketing spend and pricing. The relationship is straightforward; higher marketing drives higher sales, higher prices reduce sales."
- This method is best when there are one to five drivers with clear linear relationships.
- Lasso Regression—Linear with many drivers.
- This method is useful when you have many drivers. It automatically identifies and selects the most impactful drivers, and removes weak drivers. For example, "I have eight potential cost drivers for our operating expenses (labor, materials, utilities, maintenance, and so on), but I'm not sure which ones actually drive the variance. I want the system to automatically identify the key drivers."
- This method is best when you have many drivers and aren't sure which drivers matter.
- Ridge Regression—Linear with correlated drivers.
- This method retains all drivers while weighting their relative importance, and handles correlation well. For example, "I want to forecast fee revenue using assets under management, market index performance, and trading volume. These metrics are correlated with each other but all are important drivers of our revenue."
- This method is best when drivers are correlated but all matter.
Note:
In Cloud EPM OCI realms OC2, OC3, and OC4, only the SARIMAX and Prophet algorithms are supported.
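The different ways Lasso and Ridge treat weak drivers can be illustrated with a minimal sketch. It assumes the textbook special case of standardized, uncorrelated drivers, where both penalties have simple closed-form effects on the ordinary least squares coefficients. The driver names, coefficient values, and penalty strength are hypothetical, and the sketch does not describe how Advanced Predictions implements these algorithms.

```python
def lasso_shrink(ols_coef, penalty):
    """Soft-threshold: a coefficient no larger than the penalty becomes exactly 0."""
    magnitude = max(abs(ols_coef) - penalty, 0.0)
    if magnitude == 0.0:
        return 0.0
    return magnitude if ols_coef > 0 else -magnitude


def ridge_shrink(ols_coef, penalty):
    """Proportional shrinkage: every coefficient gets smaller but stays nonzero."""
    return ols_coef / (1.0 + penalty)


# Hypothetical least-squares coefficients for four standardized cost drivers.
ols = {"labor": 4.0, "materials": 2.5, "utilities": 0.3, "maintenance": -0.2}
penalty = 0.5  # hypothetical penalty strength

lasso = {name: lasso_shrink(coef, penalty) for name, coef in ols.items()}
ridge = {name: ridge_shrink(coef, penalty) for name, coef in ols.items()}

# Lasso drops the weak drivers entirely; Ridge keeps all four with reduced weight.
kept_by_lasso = [name for name, coef in lasso.items() if coef != 0.0]
kept_by_ridge = [name for name, coef in ridge.items() if coef != 0.0]
print("Lasso keeps:", kept_by_lasso)
print("Ridge keeps:", kept_by_ridge)
```

In this simplified setting, Lasso removes `utilities` and `maintenance` because their coefficients fall below the penalty, while Ridge retains every driver with a smaller weight.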
- Select the Forecast Error Metric to use for the selected algorithm. The metric defines how the algorithm selects the best model: the ML engine learns the patterns in the data, evaluates each training iteration against the error metric you select, and keeps the iteration where the error metric is lowest.
- sMAPE—Symmetric Mean Absolute Percentage Error
- MAPE—Mean Absolute Percentage Error
- RMSE—Root Mean Squared Error
Using your choice of error measure, Advanced Predictions:
- Chooses the model with the least error as the best model.
- For the best model:
- Generates a fitted series corresponding to the input series.
- Generates a forecast for the horizon.
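The selection step above can be sketched in a few lines of Python. This is an illustration only: the metric formulas are common textbook definitions (sMAPE in particular has several variants, and the one used by Advanced Predictions is not specified here), and the function names, model names, and data values are hypothetical.

```python
import math


def smape(actual, predicted):
    """Symmetric MAPE in percent, using one common definition (range 0 to 200)."""
    terms = [
        0.0 if a == p else 2.0 * abs(p - a) / (abs(a) + abs(p))
        for a, p in zip(actual, predicted)
    ]
    return 100.0 * sum(terms) / len(terms)


def mape(actual, predicted):
    """MAPE in percent; undefined when any actual value is 0."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


METRICS = {"sMAPE": smape, "MAPE": mape, "RMSE": rmse}


def pick_best_model(actual, candidates, metric_name):
    """Return (name, error) for the candidate forecast with the lowest error."""
    metric = METRICS[metric_name]
    errors = {name: metric(actual, preds) for name, preds in candidates.items()}
    best = min(errors, key=errors.get)
    return best, errors[best]


# Hypothetical actuals and two hypothetical candidate forecasts.
actual = [100.0, 110.0, 120.0, 130.0]
candidates = {
    "model_a": [98.0, 112.0, 119.0, 133.0],
    "model_b": [90.0, 100.0, 135.0, 120.0],
}
best_name, best_error = pick_best_model(actual, candidates, "RMSE")
print(best_name, round(best_error, 3))
```

Because the metrics weight errors differently (RMSE penalizes large misses more heavily, while MAPE and sMAPE are scale-free), the same candidates can rank differently depending on the metric you choose.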
About Regression Algorithms
For regression models, the reported error measure is calculated on a holdout period using a 90:10 split. A portion of the historical data is set aside, excluded from model training, and used instead to validate model performance:
- Historical data is split into training data (used to build the model) and holdout data (used for validation).
- The model is trained on the training dataset.
- Predictions are generated for the holdout period.
- Predictions are compared with actual values to measure accuracy. The accuracy and fitted values are calculated on the holdout period.
For example, if you have five years of monthly historical data (60 data points):
- These 60 data points are split 90:10. 90% of the data points (54) are used for training the model and 10% of the data points (6) are used as test data points.
- The test model is created by using 54 data points. The prediction is for 6 data points.
- The predictions and test data are compared and the following error measures are calculated: sMAPE, MAPE, and RMSE.
- Accuracy is calculated as 100 - sMAPE.
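The 90:10 holdout procedure from this example can be sketched as follows. The data is synthetic, the single-driver least-squares fit stands in for whichever regression algorithm is selected, and the sMAPE formula is one common variant; all function and variable names are hypothetical and do not reflect the product's internal implementation.

```python
def split_holdout(rows, train_fraction=0.9):
    """Split chronologically ordered rows into training and holdout parts."""
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_line(xs, ys):
    """Ordinary least squares for one driver: y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return mean_y - slope * mean_x, slope


def smape(actual, predicted):
    """Symmetric MAPE in percent, using one common definition."""
    terms = [
        0.0 if a == p else 2.0 * abs(p - a) / (abs(a) + abs(p))
        for a, p in zip(actual, predicted)
    ]
    return 100.0 * sum(terms) / len(terms)


# Synthetic history: 60 months of (marketing spend, sales) pairs.
history = []
for month in range(60):
    spend = 50.0 + 5.0 * (month % 12)
    noise = (month * 7) % 5 - 2  # deterministic stand-in for random noise
    history.append((spend, 20.0 + 3.0 * spend + noise))

# Train on the first 54 points, validate on the last 6.
train, holdout = split_holdout(history)
intercept, slope = fit_line([x for x, _ in train], [y for _, y in train])

predicted = [intercept + slope * x for x, _ in holdout]
actual = [y for _, y in holdout]

error = smape(actual, predicted)
accuracy = 100.0 - error
print(len(train), len(holdout))  # 54 6
print(f"sMAPE = {error:.2f}, accuracy = {accuracy:.2f}")
```

The split is chronological rather than random, so the holdout period is always the most recent 10% of the history, which mirrors how the model is used to predict periods that follow the training data.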
Regression algorithms are useful in some scenarios where time series forecasting is not applicable. For example, use regression algorithms in these cases:
- Specific drivers matter more than time patterns.
- You have clear causal relationships (marketing drives sales, price affects demand).
- You want to understand "what happens if we change X."
- You need to explain driver impacts to stakeholders.
- Time patterns are weak or irregular.