7 Get Started with Data Monitoring
Data Monitoring evaluates how your data evolves over time. It helps you with insights on trends and multivariate dependencies in the data. It also gives you an early warning about data drift.
Data drift occurs when data diverges from the original baseline data over time. Data drift can happen for a variety of reasons, such as a changing business environment, evolving user behavior and interest, data modifications from third-party sources, data quality issues, or issues with upstream data processing pipelines.
The key to accurately interpret your models and to ensure that the models are able to solve business problems is to understand how data evolves over time. Data monitoring is complementary to successful model monitoring, as understanding the changes in data is critical in understanding the changes in the efficacy of the models. The ability to quickly and reliably detect changes in the statistical properties of your data ensures that your machine learning models are able to meet business objectives.
Figure 7-1 Data Monitors page
![Data Monitors page Data Monitors page](img/data_monitor_page.png)
- Create: Create a data monitor.
Note:
The supported data types for data monitoring are NUMERIC and CATEGORICAL. - Edit: Select a data monitor and click Edit to edit a data monitor.
- Duplicate: Select a data monitor and click Duplicate to create a copy of the monitor.
- Delete: Select a data monitor and click Delete to delete a data monitor.
- History: Select a data monitor and click History to view the runtime details. Click Back to Monitors to go back to the Data Monitoring page.
- Start: Start a data monitor.
- Stop: Stop a data monitor that is running.
- More: Click More for additional
options to:
Figure 7-2 More option under Data Monitors
- Enable: Select a data monitor and
click Enable to enable a monitor that is
disabled. By default, a data monitor is enabled. The status is displayed
as
SCHEDULED
. - Disable: Select a data monitor and
click Disable to disable a data monitor. The
status is displayed as
DISABLED
. - Show Managed Monitors: Click this
option to view the data monitors that are created and managed by OML
Services REST API and Model Monitors in Oracle Machine Learning UI. The
data monitors that are managed by these two components have a system
generated name, and are indicated by specific icons against its name.
- Click on the link icon against a managed data
monitor name to view the details of the associated model
monitor. The associated model monitor details are displayed on a
separate pane that slides in. The slide-in pane displays the
model monitor name with links to view the model monitor results
and settings. Clicking on the link icon also displays the data
drift details on the lower pane of the Data Monitors page. Click
on the X on the top left corner to close the pane.
Figure 7-3 Data Monitors page displaying the associated model monitor results and settings
In this example, the slide-in pane displays the details of the model monitor Power Consumption. On the slide-in pane:
- Click Model Monitor Results to view the results computed by the model monitor — settings, models, model drift, metric, and prediction statistics. Click Monitors to return to the Data Monitors page. See View Model Monitor Results.
- Click Model Monitor Settings to view and edit the settings, details, and models monitored by the model monitor on the Edit Model Monitor page. Click Cancel to return to the Data Monitors page. Click Save to save any changes.
- Click on the check box against the data monitor
name to view the data drift values on the lower pane.
Figure 7-4 Select a managed data monitor
- Click on the data monitor name to view the details
of the data monitor — settings, data drift values and monitored
features.
Figure 7-5 Data monitor click
- Click on the link icon against a managed data
monitor name to view the details of the associated model
monitor. The associated model monitor details are displayed on a
separate pane that slides in. The slide-in pane displays the
model monitor name with links to view the model monitor results
and settings. Clicking on the link icon also displays the data
drift details on the lower pane of the Data Monitors page. Click
on the X on the top left corner to close the pane.
- Enable: Select a data monitor and
click Enable to enable a monitor that is
disabled. By default, a data monitor is enabled. The status is displayed
as
The Data Monitors page displays the information about the selected monitor: Monitor name, Baseline Data, New Data, Last Start Date, Last Status, Next Run Data, Status, and Schedule. The page also displays the data drift, if the data monitor has run successfully. To view data drift:
Figure 7-6 Data Drift preview on Data Monitors page
![Data drift preview on Data Monitors page Data drift preview on Data Monitors page](img/data_monitoring_page.png)
Select a data monitor that has run successfully, as shown in the screenshot. On the lower pane, the data drift of the selected monitor is displayed. The X axis depicts the analysis period, and the Y axis depicts the data drift values. The horizontal dotted line is the threshold value, and the line depicts the drift value for each point in time for the analysis period. Hover your mouse over the line to view the drift values. For more information on this example, see View Data Monitor Results.
- Create a Data Monitor
Data Monitoring allows you to detect data drift over time and the potentially negative impact on the performance of your machine learning models. On the Data Monitor page, you can create, run, and track data monitors and the results. - View Data Monitor Results
The Data Monitor Results page displays the information on the selected data monitor that have run successfully, along with data drift details for each monitored feature. - View History
The History page displays the runtime details of data monitors.
Related Topics
7.1 Create a Data Monitor
Data Monitoring allows you to detect data drift over time and the potentially negative impact on the performance of your machine learning models. On the Data Monitor page, you can create, run, and track data monitors and the results.
Parent topic: Get Started with Data Monitoring
7.2 View Data Monitor Results
The Data Monitor Results page displays the information on the selected data monitor that have run successfully, along with data drift details for each monitored feature.
- Settings — The Settings section displays the data monitor settings. Click on the arrow against Settings to expand this section. You have the option to edit the data monitor settings by clicking Edit on the top right corner of the page. In this screenshot, the settings for the data monitor Power Consumption is seen.
Figure 7-10 Settings section on the Data Monitor Results page
- Drift — The Drift section displays the details of data drift for each monitored feature. In this example, the data monitor Power consumption data monitor is selected. The X axis depicts the analysis period, and the Y axis depicts the data drift values. The horizontal dotted line is the threshold value, and the line depicts the drift value for each point in time for the analysis period. Hover your mouse over the line to view the drift values.
Figure 7-11 Data Drift section on the Data Monitor Results page
-
Features — The Features section displays the monitored features along with the computed statistics.
Figure 7-12 Features section on the Data Monitor Results page
The value in the Importance column indicates how impactful the feature has been on data drift over a specified time period.
For numerical data, the following statistics are computed:- Mean
- Standard Deviation
- Range (Minimum, Maximum)
- Number of nulls
For categorical data, the following statistics are computed:- Number of unique values
- Number of nulls
For each monitored feature, hover your mouse to view the following additional details, as shown in the screenshot here.
- First: This is the first value of the computed statistics for the analysis period.
- Last: This is the last value of the computed statistics for the analysis period.
- Max: This is the highest value of the computed statistics for the analysis period.
- Min: This is the lowest value of the computed statistics for the analysis period.
- Click on any monitored feature in the Features section to view the Metric, Statistics, Distribution, and Distribution with Crosstab Column, as shown in the screenshot here. In the screenshot here, the Population Stability Index is shown for the feature GLOBAL_REACTIVE_POWER.
Figure 7-13 Population Stability Index
The computations include:- Metric: The following metrics are computed:
- Population Stability Index (PSI): This is a measure of how much a population has shifted over time or between two different samples of a population in a single number. The two distributions are binned into buckets, and PSI compares the percents of items in each of the buckets. PSI is computed as
The interpretation of PSI value is:PSI = sum((Actual_% - Expected_%) x ln (Actual_% / Expected_%))
PSI < 0.1
implies no significant population change0.1 <= PSI < 0.2
implies moderate population changePSI >= 0.2
implies significant population change
- Jenson Shannon Distance (JSD): This is a measure of the similarity between two probability distributions. JSD is the square root of the Jensen-Shannon Divergence which is related to the Kullbach-Leibler Divergence (KLD). JSD is computed as:
SD(P || Q)= sqrt(0.5 x KLD(P || M) + 0.5 x KLD(Q || M))
Where, P and Q are the 2 distributions,
M = 0.5 x (P + Q), KLD(P || M) = sum(Pi x ln(Pi / Mi)), and KLD(Q || M) = sum(Qi x ln(Qi / Mi))
The value of JSD ranges between 0 and 1.
- Crosstab Population Stability Index: This is the PSI for two variables.
- Crosstab Jenson Shannon Distance: This is the JSD for two variables.
- Population Stability Index (PSI): This is a measure of how much a population has shifted over time or between two different samples of a population in a single number. The two distributions are binned into buckets, and PSI compares the percents of items in each of the buckets. PSI is computed as
- Statistics: You can view statistics for up to 3 selected periods. Data drift is quantified using these statistical computations.
Figure 7-14 Statistics
For numerical data, the following statistics are computed:- Mean
- Standard Deviation
- Range (Minimum, Maximum)
- Number of nulls
For categorical data, the following statistics are computed:- Number of unique values
- Number of nulls
- Distribution: The feature distribution chart with legend displays bins of feature for selected periods and the baseline (optional).
Figure 7-15 Distribution Chart and Distribution with Crosstab column
- Distribution with Crosstab Column: The heat map indicates the density of distribution for the selected crosstab and the feature column. Red denotes highest density.
Note:
In data drift monitoring,nulls
are are tracked separately asnumber_of_missing_values
.
- Metric: The following metrics are computed:
Parent topic: Get Started with Data Monitoring
7.3 View History
The History page displays the runtime details of data monitors.
Select a data monitor and click History to view the runtime details. The history page displays the following information about the data monitor runtime:
Figure 7-16 Data Monitor History page
![Data Monitor History page Data Monitor History page](img/view_history.png)
- Actual Start Date: This is the date when the data monitor actually started.
- Requested Start Date: This is the date entered in the
Start Date
field while creating the data monitor. - Status: The statuses are
SUCCEEDED
andFAILED
. - Detail: If a data monitor fails, the details are listed here.
- Duration: This is the time taken to run the data monitor.
Click Back to Monitors to go back to the Data Monitoring page.
Parent topic: Get Started with Data Monitoring