Preparing the Data
Define how to assess and manage the data before performing the prediction, for example, how to handle outliers or missing values. These options for preparing the data can improve the quality of the data used for the prediction.
Define how to manage the data for each driver. Note that a Target value of Yes indicates the target measure that is being predicted.
- For future input driver data, select Predict missing input driver
values if you want to predict values where input driver values are
missing.
Missing values are predicted using statistical forecasting (univariate predictions), and are used for ML model training.
- For each driver, define how to handle missing values: click the Edit icon
in the Actions column, and then, from the Missing Values list, select an option for handling missing values for the driver.
Data can contain missing values for several reasons, such as measurement failures, formatting problems, human errors, or lack of information. You define how to fill these missing values, which adds standardized values to missing entries in the dataset.
- None—Take no action and send the data as is.
- Zero—Replace missing values for any column with zero.
- Replace with Mean—Replace with the mean across the historical series.
- Replace with Median—Replace with the median point of the historical series.
- Replace with Mode—Replace with the most common value in the historical data.
- Replace with Next Observed Value—Replace missing values with the value that was observed in the next period.
- Replace with Last Observed Value—Replace missing values with the value that was observed in the previous period.
- For each driver, from the Outliers list, select an option for handling outlier values, which are any values that fall outside the range mean +/- 3*Standard Deviation for the driver:
- None—Take no action and send the data as is.
- Replace with Zero—Replace with zero.
- Replace with Mean—Replace with the mean.
- Replace with Z_score—Replace with the z_score. For any numerical column, any value falling outside mean +/- 3*Standard Deviation (std dev) is treated as an outlier. A value less than mean - 3*std dev is replaced with mean - 3*std dev. A value greater than mean + 3*std dev is replaced with mean + 3*std dev.
- Click Next.
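The missing-value and outlier options described in the steps above can be illustrated with a short Python sketch. This is not the product's implementation. The function names (fill_missing, handle_outliers), the method strings, and the use of the population standard deviation are assumptions made only for illustration. The sketch also assumes that the mean, median, and mode are computed from observed (non-missing) values only.

```python
from statistics import mean, median, mode, pstdev


def fill_missing(series, method):
    """Fill None entries in a list of numbers (hypothetical helper).

    method: "none", "zero", "mean", "median", "mode", "next", or "last".
    """
    if method == "none":
        return list(series)
    if method == "zero":
        return [0.0 if v is None else v for v in series]
    if method in ("mean", "median", "mode"):
        observed = [v for v in series if v is not None]
        if not observed:
            return list(series)  # nothing to compute a statistic from
        stat = {"mean": mean, "median": median, "mode": mode}[method](observed)
        return [stat if v is None else v for v in series]
    if method == "last":
        # Carry the previous observed value forward; a gap at the very
        # start has no previous value and stays None.
        filled, prev = [], None
        for v in series:
            if v is not None:
                prev = v
            filled.append(prev)
        return filled
    if method == "next":
        # Same as "last" applied to the reversed series; a gap at the
        # very end has no next value and stays None.
        return fill_missing(series[::-1], "last")[::-1]
    raise ValueError(f"unknown method: {method}")


def handle_outliers(series, method):
    """Treat values outside mean +/- 3*std dev (hypothetical helper).

    method: "none", "zero", "mean", or "z_score".
    Assumption: population standard deviation; the product may differ.
    """
    if method == "none":
        return list(series)
    if method not in ("zero", "mean", "z_score"):
        raise ValueError(f"unknown method: {method}")
    mu = mean(series)
    sigma = pstdev(series)
    low, high = mu - 3 * sigma, mu + 3 * sigma

    def fix(v):
        if low <= v <= high:
            return v  # within range, not an outlier
        if method == "zero":
            return 0.0
        if method == "mean":
            return mu
        # "z_score": clip to the nearest bound of the allowed range
        return low if v < low else high

    return [fix(v) for v in series]


history = [1.0, None, 3.0, 3.0, None]
print(fill_missing(history, "last"))

spiky = [10.0] * 19 + [100.0]
print(handle_outliers(spiky, "z_score")[-1])
```

In this sketch, the Z_score option clips an outlier to the nearest bound of the allowed range rather than removing it, which matches the description of values being replaced with mean - 3*std dev or mean + 3*std dev.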