Data Drift Summary

A Data Drift Summary report summarizes how a dataset changes over time. It typically describes changes in the data distribution of the selected features between the selected baseline and target date ranges.
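As a minimal sketch of the baseline/target comparison described above, the following Python groups each selected feature's values by date range so that the two windows can be compared. The row layout (dicts with a `date` key), the function name `split_windows`, and the inclusive treatment of the From and To dates are assumptions for illustration only, not the product's implementation.

```python
from datetime import date
from typing import Dict, List, Tuple

# Inclusive (From, To) date range; inclusivity is an assumption.
DateRange = Tuple[date, date]


def split_windows(
    rows: List[dict],
    features: List[str],
    baseline: DateRange,
    target: DateRange,
    date_key: str = "date",
) -> Dict[str, Tuple[list, list]]:
    """Return {feature: (baseline_values, target_values)} for each selected feature."""

    def in_range(day: date, rng: DateRange) -> bool:
        start, end = rng
        return start <= day <= end

    result = {}
    for feature in features:
        base = [row[feature] for row in rows if in_range(row[date_key], baseline)]
        tgt = [row[feature] for row in rows if in_range(row[date_key], target)]
        result[feature] = (base, tgt)
    return result


# Usage with hypothetical rows: one numerical and one categorical feature.
rows = [
    {"date": date(2024, 1, 5), "amount": 10.0, "channel": "web"},
    {"date": date(2024, 1, 20), "amount": 12.0, "channel": "app"},
    {"date": date(2024, 2, 3), "amount": 30.0, "channel": "web"},
    {"date": date(2024, 3, 1), "amount": 99.0, "channel": "app"},  # outside both ranges
]
windows = split_windows(
    rows,
    ["amount", "channel"],
    baseline=(date(2024, 1, 1), date(2024, 1, 31)),
    target=(date(2024, 2, 1), date(2024, 2, 29)),
)
print(windows["amount"])  # ([10.0, 12.0], [30.0])
```

Each feature's pair of samples is what a drift method then compares; rows outside both ranges are ignored.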

The report helps identify potential data quality issues so that users can take corrective action.

To calculate the data drift, perform the following steps:

On the Data Drift Summary screen, click Calculate Drift.
The Data Drift Analysis window is displayed.

Figure 8-26 Data Drift Summary screen
Select the date range (From and To dates) for the baseline.
Select the date range (From and To dates) for the target.
Select the drift method from the drop-down list.
The available drift methods are:
- Kolmogorov–Smirnov (K-S) test
  - Only for numerical features
  - Output: p_value, drift detected when p_value < threshold.
- Kullback-Leibler divergence
  - For numerical and categorical features
  - Output: divergence, drift detected when divergence >= threshold.
- Wasserstein distance (normed)
  - Only for numerical features
  - Output: distance, drift detected when distance >= threshold.
- Population Stability Index (PSI)
  - For numerical and categorical features
  - Output: psi_value, drift detected when psi_value >= threshold.
- Jensen-Shannon distance
  - For numerical and categorical features
  - Output: distance, drift detected when distance >= threshold.
Click Calculate.
The drift analysis report is displayed.
Click Save.
The saved report is displayed on the Data Drift Summary screen.
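The following sketch shows one way the five drift methods and their threshold rules could be computed for a single feature, assuming Python with NumPy and SciPy. Several choices are illustrative assumptions rather than the product's documented settings: the binning scheme (10 equal-width bins over the combined baseline and target values, or one bin per category), the `EPS` floor for empty bins, normalizing the Wasserstein distance by the baseline standard deviation, the KL direction (baseline relative to target), and every default threshold. All function names are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.spatial import distance

EPS = 1e-4  # assumed floor for empty bins so log-ratios stay finite


def binned_shares(baseline, target, bins=10):
    """Turn two samples into aligned probability vectors (assumed binning)."""
    baseline, target = np.asarray(baseline), np.asarray(target)
    if np.issubdtype(baseline.dtype, np.number):
        # Shared equal-width edges over both samples, so no value is dropped.
        edges = np.histogram_bin_edges(np.concatenate([baseline, target]), bins=bins)
        b = np.histogram(baseline, edges)[0]
        t = np.histogram(target, edges)[0]
    else:
        # Categorical: one bin per category seen in either sample.
        cats = np.union1d(baseline, target)
        b = np.array([(baseline == c).sum() for c in cats])
        t = np.array([(target == c).sum() for c in cats])
    b = np.maximum(b / b.sum(), EPS)
    t = np.maximum(t / t.sum(), EPS)
    return b / b.sum(), t / t.sum()


def ks_test(baseline, target, threshold=0.05):
    # Numerical only: drift when the two-sample K-S p_value < threshold.
    p_value = float(stats.ks_2samp(baseline, target).pvalue)
    return p_value, bool(p_value < threshold)


def wasserstein_normed(baseline, target, threshold=0.1):
    # Numerical only: distance divided by the baseline std (assumed norm).
    norm = max(float(np.std(baseline)), 1e-3)
    d = float(stats.wasserstein_distance(baseline, target)) / norm
    return d, bool(d >= threshold)


def kl_divergence(baseline, target, threshold=0.1):
    b, t = binned_shares(baseline, target)
    d = float(stats.entropy(b, t))  # KL(baseline || target)
    return d, bool(d >= threshold)


def psi(baseline, target, threshold=0.1):
    b, t = binned_shares(baseline, target)
    value = float(np.sum((t - b) * np.log(t / b)))
    return value, bool(value >= threshold)


def js_distance(baseline, target, threshold=0.1):
    b, t = binned_shares(baseline, target)
    d = float(distance.jensenshannon(b, t))  # natural-log JS distance
    return d, bool(d >= threshold)


# Usage: a numerical feature whose mean shifts by one standard deviation.
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 2000)
target = rng.normal(1.0, 1.0, 2000)
for method in (ks_test, wasserstein_normed, kl_divergence, psi, js_distance):
    value, drifted = method(baseline, target)
    print(f"{method.__name__}: {value:.4f} drift={drifted}")
```

Note that the K-S test reports drift when its value falls below the threshold, while the other four methods report drift when their value reaches or exceeds it, matching the rules listed for each method.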