4.1.3.5 Computing Recommended Threshold
This topic describes how to choose an outlier detection method and a threshold computing technique for calculating the recommended thresholds.
Select Parameters for Tuning
In this option, select the parameters for which thresholds are calculated. By default, the parameters are taken from the expression set in the scenario execution notebook. The selected parameters are stored in the class variable asc.tunable_parameters_list.
Limitation
Figure 4-15 Error Message - Select Parameters for Tuning
%python-pgx
# Parameters selected for tuning (by default, taken from the scenario expression)
tunable_parameters_list = asc.tunable_parameters_list
Check Data Distribution
Review the data distribution for the selected threshold set ID and parameters, and use it to decide which outlier detection method to apply before the thresholds are calculated.
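The notebook displays the distribution for you. As an independent illustration (not part of the asc API), the following minimal sketch computes the kind of summary that can inform the choice: count, mean, median, standard deviation, skewness, and upper percentiles of one parameter. A large gap between the mean and the median, or a high skewness, shows how strongly extreme values pull the mean. The helper summarize_distribution and the sample values are hypothetical.

```python
import numpy as np

def summarize_distribution(values, percentiles=(25, 50, 75, 85, 90, 95, 99)):
    """Summary statistics for the values of one tunable parameter (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    mean, std = x.mean(), x.std()
    # Skewness > 0 indicates a long right tail; it is 0 for symmetric data
    skew = float(np.mean(((x - mean) / std) ** 3)) if std > 0 else 0.0
    summary = {"count": int(x.size), "mean": float(mean),
               "median": float(np.median(x)), "std": float(std), "skewness": skew}
    for p in percentiles:
        summary[f"p{p}"] = float(np.percentile(x, p))
    return summary

# Hypothetical values of one parameter for a threshold set
amounts = [120, 95, 130, 110, 105, 98, 5000, 102, 115, 125]
for name, value in summarize_distribution(amounts).items():
    print(f"{name:>9}: {value:,.2f}")
```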
Select Outlier Detection Method
Out-of-the-box univariate outlier detection methods are provided. The methods are divided into two groups based on the type of population. For each method, you can select the tail from which outliers are removed: RIGHT, LEFT, or BOTH.
- zscore: Points that lie more than nstdev standard deviations away from the mean are considered outliers.
- Parameter nstdev. Default is 3.
- IQR: Interquartile Range (IQR = Q75 - Q25). Points above Q75 + IQR*iqr_cut_off are considered outliers.
- Parameter iqr_cut_off. Default is 3.
- percent_outliers: A fixed percentage of the data points is considered to be outliers.
- Parameter outliers_proportion. Default is 5%.
- robust_zscore: Similar to the zscore method, but computed from different statistics.
- Parameter nstdev. Default is 3.
- Because the mean and standard deviation are heavily influenced by outliers, this method uses the median and the absolute deviation from the median instead.
- Also called the Median Absolute Deviation (MAD) method.
- adjusted_boxplot: Similar to the IQR method, with some changes.
- It customizes the range of valid data differently for each tail.
- An exponential model is used for fitting the data.
Figure 4-18 Select Outlier Detection Method
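To make the differences between the methods concrete, the following minimal NumPy sketch flags outliers with zscore, IQR, percent_outliers, and robust_zscore on a hypothetical sample. It is not the asc implementation: the function names are hypothetical, and the lower IQR fence (Q25 - IQR*iqr_cut_off), the even split of outliers_proportion when tail is BOTH, and the 1.4826 MAD scaling factor are assumptions. adjusted_boxplot is not sketched because its exponential adjustment of the fences is not specified here.

```python
import numpy as np

# Hypothetical helpers illustrating the outlier methods; not part of the asc API.

def _flag(x, lower, upper, tail):
    """Return a boolean mask of values outside [lower, upper] on the selected tail."""
    tail = tail.upper()
    if tail == "RIGHT":
        return x > upper
    if tail == "LEFT":
        return x < lower
    if tail == "BOTH":
        return (x < lower) | (x > upper)
    raise ValueError("tail must be RIGHT, LEFT or BOTH")

def zscore_outliers(values, nstdev=3, tail="RIGHT"):
    x = np.asarray(values, dtype=float)
    mean, std = x.mean(), x.std()
    return _flag(x, mean - nstdev * std, mean + nstdev * std, tail)

def iqr_outliers(values, iqr_cut_off=3, tail="RIGHT"):
    x = np.asarray(values, dtype=float)
    q25, q75 = np.percentile(x, [25, 75])
    iqr = q75 - q25
    # Upper fence Q75 + IQR*iqr_cut_off; the mirrored lower fence is an assumption
    return _flag(x, q25 - iqr_cut_off * iqr, q75 + iqr_cut_off * iqr, tail)

def percent_outliers(values, outliers_proportion=0.05, tail="RIGHT"):
    x = np.asarray(values, dtype=float)
    # Assumption: with BOTH, the proportion is split evenly between the two tails
    share = outliers_proportion / (2 if tail.upper() == "BOTH" else 1)
    lower, upper = np.percentile(x, [100 * share, 100 * (1 - share)])
    return _flag(x, lower, upper, tail)

def robust_zscore_outliers(values, nstdev=3, tail="RIGHT"):
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    # 1.4826 * MAD estimates the standard deviation of normally distributed data
    scale = 1.4826 * mad
    return _flag(x, median - nstdev * scale, median + nstdev * scale, tail)

data = [10, 11, 12, 13, 14, 15, 16, 17, 18, 1000]
print(int(zscore_outliers(data).sum()))         # 0
print(int(robust_zscore_outliers(data).sum()))  # 1
print(np.asarray(data)[~iqr_outliers(data)].tolist())  # [10, 11, 12, 13, 14, 15, 16, 17, 18]
```

With only ten points, the single extreme value inflates the standard deviation enough that zscore does not flag it at nstdev=3, while robust_zscore does, because the median and MAD are barely affected by it.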
Select Threshold Computing Techniques
- Percentile: Thresholds are calculated from percentiles of the data.
- Parameter perc_list. Default is [85,90,95].
- You can pass your own list of percentiles.
- If None is passed, the thresholds are set at the 85th, 90th, and 95th percentiles for HR, MR, and RR respectively.
Figure 4-19 Select Threshold Computing Techniques
- Jump: Thresholds are calculated from the highest peaks found within the defined range.
- Parameter range. The percentile range in which peaks are searched. It works similarly to Python's range(start, stop, step).
- If range is None:
- The default range (85,100,0.1) is used.
- The highest peak is found for HR between 85% and 90%, for MR between 90% and 95%, and for RR between 95% and 99.9%.
- If no peak is found in any of these ranges, the thresholds are set at the minimum percentile of each range: 85%, 90%, and 95% for HR, MR, and RR respectively.
- If a peak is missing for any range, the threshold for the missing peak is set to the last peak found. For example, if no peak is found for RR, the threshold for RR is set to the same threshold as MR.
- If range is not None:
- range is passed as a tuple.
- The three highest peaks are found within the range defined by the user. For example, (80,100) or (85,100,0.2).
- If no peak is found in the specified range, all thresholds are set at the minimum percentile of the range. For example, 85% for (85,100,0.2).
- If a peak is missing for any range, the threshold for the missing peak is set to the last peak found. For example, if no peak is found for RR, the threshold for RR is set to the same threshold as MR.
- range is passed as a dict.
- The highest peak is found within the range defined by the user for each risk level.
- For example, {'HR':(80,90,0.2),'MR':(90,95),'RR':(95,99,0.1)} or {'HR':(85,90),'MR':(90,95,0.1),'RR':(95,100,0.2)}.
- If no peak is found in a specified range, the threshold is set at the minimum percentile of that range as passed by the user.
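The Jump description does not define a peak precisely. The following minimal sketch assumes that a peak is a local maximum of the jump between the values at consecutive percentiles of the search grid, and that the threshold is the value at the percentile where the highest such jump occurs. It covers the Percentile technique and the Jump technique with the default or dict range, following the fallback rules listed above; when HR has no peak and no earlier peak exists, it falls back to the range minimum, which is an assumption. The tuple form is not covered. All function names are hypothetical and not part of the asc API.

```python
import numpy as np

# Hypothetical sketch of the Percentile and Jump techniques; not the asc API.
RISK_LEVELS = ("HR", "MR", "RR")

def percentile_thresholds(values, perc_list=None):
    """Percentile technique: (percentile, threshold value) per risk level."""
    perc_list = [85, 90, 95] if perc_list is None else perc_list
    x = np.asarray(values, dtype=float)
    return {lvl: (float(p), float(np.percentile(x, p)))
            for lvl, p in zip(RISK_LEVELS, perc_list)}

def _grid(start, stop, step=1):
    """Percentile grid like range(start, stop, step), but float steps are allowed."""
    n = int(round((stop - start) / step))
    return start + step * np.arange(n)

def _highest_peak(x, start, stop, step=1):
    """Percentile with the highest local-maximum jump in the range, or None."""
    grid = _grid(start, stop, step)
    # Prepend the point one step below the range so every grid point has a jump
    values = np.percentile(x, np.concatenate(([max(start - step, 0)], grid)))
    jumps = np.diff(values)  # jumps[i] = value(grid[i]) - value(previous point)
    if jumps.size < 3:
        return None
    tol = 1e-9 * max(1.0, float(np.abs(jumps).max()))  # ignore floating-point noise
    inner = jumps[1:-1]
    is_peak = (inner - jumps[:-2] > tol) & (inner - jumps[2:] > tol)
    if not is_peak.any():
        return None
    idx = np.flatnonzero(is_peak)
    return float(grid[idx[np.argmax(inner[idx])] + 1])  # +1: inner is offset by one

def jump_thresholds(values, search_range=None):
    """Jump technique for the default range or a per-risk-level dict range."""
    reuse_last = search_range is None
    if search_range is None:
        # Default (85,100,0.1) split into the HR, MR and RR ranges listed above
        search_range = {"HR": (85, 90, 0.1), "MR": (90, 95, 0.1), "RR": (95, 100, 0.1)}
    if not isinstance(search_range, dict):
        raise NotImplementedError("tuple ranges are not covered by this sketch")
    x = np.asarray(values, dtype=float)
    result, last = {}, None
    for lvl in RISK_LEVELS:
        p = _highest_peak(x, *search_range[lvl])
        if p is not None:
            last = p
        elif reuse_last and last is not None:
            p = last  # default range: reuse the last peak found
        else:
            # dict range, or no earlier peak (assumption for HR): range minimum
            p = float(search_range[lvl][0])
        result[lvl] = (p, float(np.percentile(x, p)))
    return result

lin = np.arange(1, 1001, dtype=float)
print(jump_thresholds(lin))
```

On linearly spaced data every jump has the same size, so no peak exists and the thresholds fall back to the 85th, 90th, and 95th percentiles.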
Compute Recommended Thresholds
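%python-pgx
# Compute the recommended thresholds. The variables passed below are assumed to be
# defined in earlier notebook paragraphs from the selections described above.
asc.compute_initial_thresholds(features=features_list,
                               outlier_method=outlier_method,  # zscore, IQR, percent_outliers, robust_zscore or adjusted_boxplot
                               technique=technique,  # threshold computing technique (Percentile or Jump)
                               outliers_proportion=outliers_proportion,  # percent_outliers
                               nstdev=nstdev,  # zscore
                               robust_zscore_cut_off=robust_zscore_nstdev,  # robust_zscore
                               iqr_cut_off=iqr_cutoff,  # IQR
                               anomaly_proportion=anomaly_proportion,
                               perc_list=perc_list,  # Percentile
                               search_range=search_range,  # Jump
                               tail=tail)  # RIGHT, LEFT or BOTH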