4.1.3.5 Computing Recommended Threshold

This topic describes the options for choosing an appropriate outlier detection technique and threshold computing technique.

Select Parameters for Tuning

In this step, select the parameters for which thresholds will be calculated. The default parameters are taken from the expression that you set in the scenario execution notebook. The parameters are stored in the class variable asc.tunable_parameters_list.

Limitation

The Select Parameters for Tuning paragraph might fail because the variable tunable_parameters_list is used without the class object (asc) in front of it.

Figure 4-15 Error Message - Select Parameters for Tuning



To resolve this issue, add a new paragraph above the failed paragraph with the following code, and then run the new paragraph.
%python-pgx
tunable_parameters_list = asc.tunable_parameters_list

Figure 4-16 Select Parameters for Tuning



Check Data Distribution

You can view the data distribution for the selected threshold set ID and parameters, and then decide which outlier technique to use when calculating the thresholds.

Use the Sample Fraction parameter to take a subset of the data. The Sample Fraction value must be between 0 and 1.
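
To illustrate what a sample fraction does, the following minimal sketch draws a random subset from a list of values using only the Python standard library. The helper name sample_fraction and its seed argument are hypothetical and are not part of the notebook API; the sampling performed by the notebook may differ.

```python
import random


def sample_fraction(values, fraction, seed=None):
    """Return a random subset holding roughly `fraction` of `values`.

    Hypothetical helper for illustration only; not part of the notebook API.
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)              # seeded so the subset is repeatable
    k = round(len(values) * fraction)      # number of rows to keep
    return rng.sample(list(values), k)     # sampling without replacement


amounts = list(range(1, 1001))             # stand-in for a parameter column
subset = sample_fraction(amounts, 0.1, seed=42)
print(len(subset), "of", len(amounts), "values kept")
```

A smaller fraction makes the distribution plot faster to render, at the cost of a coarser view of the tails.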

Figure 4-17 Check Data Distribution



Select Outlier Detection Method

Out-of-the-box univariate outlier detection methods are provided. The methods are divided into two groups based on the type of population. For each method, you can select the tail (RIGHT, LEFT, or BOTH) from which outliers are removed.

For Normal Population
  • zscore: Outliers are data points that lie more than a given number of standard deviations away from the mean.
    • Parameter nstdev. Default is 3.
  • IQR: Interquartile range (Q75 - Q25). Data points that lie beyond Q75 + IQR*iqr_cut_off are considered outliers.
    • Parameter iqr_cut_off. Default is 3.
  • percent_outliers: A fixed percentage of the data points is considered to be outliers.
    • Parameter outliers_proportion. Default is 5%.
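
The zscore and IQR rules above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the product's implementation: the function names are hypothetical, the population standard deviation is used, the quartiles use the "inclusive" method, and the lower IQR bound (Q25 - IQR*iqr_cut_off) is assumed by symmetry with the upper bound given in the text.

```python
import statistics


def zscore_bounds(values, nstdev=3):
    """Bounds that lie nstdev standard deviations on either side of the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)        # assumption: population std dev
    return mean - nstdev * std, mean + nstdev * std


def iqr_bounds(values, iqr_cut_off=3):
    """Bounds that lie iqr_cut_off * IQR outside the quartiles."""
    q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    # The lower bound is an assumption; the text only states the upper one.
    return q25 - iqr_cut_off * iqr, q75 + iqr_cut_off * iqr


def remove_outliers(values, bounds, tail="RIGHT"):
    """Drop values outside the bounds on the chosen tail."""
    low, high = bounds
    if tail == "RIGHT":
        return [v for v in values if v <= high]
    if tail == "LEFT":
        return [v for v in values if v >= low]
    if tail == "BOTH":
        return [v for v in values if low <= v <= high]
    raise ValueError("tail must be RIGHT, LEFT, or BOTH")


values = [10, 11, 12, 13] * 5 + [500]      # 20 typical points and one extreme
print(remove_outliers(values, zscore_bounds(values), tail="RIGHT"))
print(remove_outliers(values, iqr_bounds(values), tail="BOTH"))
```

In this sample both rules drop only the value 500. On heavily skewed data the mean and standard deviation are themselves pulled by the extremes, which is why a separate group of methods is provided for skewed populations.
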
For Skewed Population
  • robust_zscore: Similar to the z-score method, with some changes in parameters.
    • Parameter nstdev. Default is 3.
    • Because the mean and standard deviation are heavily influenced by outliers, this method uses the median and the absolute deviation from the median instead.
    • Also called the Median Absolute Deviation (MAD) method.
  • adjusted_boxplot: Similar to the IQR method, with some changes in parameters.
    • It customizes the range of valid data differently for each tail.
    • An exponential model is used for fitting the data.
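
As a rough illustration of the robust_zscore idea, the sketch below replaces the mean with the median and the standard deviation with the median absolute deviation. The function name is hypothetical, and the 1.4826 scaling constant (commonly used to make the MAD comparable to a standard deviation for normally distributed data) is an assumption; the document does not state whether the product applies it. The adjusted_boxplot method is not shown.

```python
import statistics

MAD_SCALE = 1.4826  # assumption: common consistency constant for the MAD


def robust_zscore_bounds(values, nstdev=3):
    """Bounds that lie nstdev scaled MADs on either side of the median."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    spread = MAD_SCALE * mad
    return median - nstdev * spread, median + nstdev * spread


sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 60]   # right-skewed sample
low, high = robust_zscore_bounds(sample)
kept = [v for v in sample if v <= high]               # RIGHT tail only
print(round(high, 4), kept)
```

Because the median and the MAD barely move when a single extreme value is added, the bounds stay close to the bulk of the data even when the sample is skewed.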

Figure 4-18 Select Outlier Detection Method



Select Threshold Computing Techniques

The following two out-of-the-box threshold computing techniques are provided:
  • Percentile: Thresholds are calculated based on the percentiles.
  • Jump: Thresholds are calculated based on the highest peaks found within the defined range.
    • Parameter range. The percentile range in which peaks are searched. It works like the range function in Python.
    • If range is None:
      • The default range (85,100,0.1) is used.
      • The highest peak is found for HR between 85-90%, MR between 90-95%, and RR between 95-99.9%.
      • If no peak is found in the specified range, then the thresholds are set at the minimum percentiles of 85%, 90%, and 95% for HR, MR, and RR respectively.
      • If a peak is missing for any specified range, then the threshold for the missing peak is set to the last peak found. For example, if no peak is found for RR, then the RR threshold is set to the same threshold as MR.
    • If range is not None:
      • range is passed as a tuple.
      • The three highest peaks are found within the range defined by the user. For example, (80,100) or (85,100,0.2).

        If no peak is found in the specified range, then all thresholds are set at the minimum percentile of the range, for example, 85%.

        If a peak is missing for any specified range, then the threshold for the missing peak is set to the last peak found. For example, if no peak is found for RR, then the RR threshold is set to the same threshold as MR.

      • range is passed as a dict
        • The highest peak is found within the range defined by the user for each risk level.
        • For example, {'HR':(80,90,0.2),'MR':(90,95),'RR':(95,99,0.1)} or {'HR':(85,90),'MR':(90,95,0.1),'RR':(95,100,0.2)}.
        • If no peak is found in a specified range, then the threshold is set at the minimum percentile of the range that the user passed for that risk level.
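
The range handling described above can be sketched as follows. This is an illustration only: the helper names (percentile_grid, resolve_search_range, fill_missing_thresholds) are hypothetical, a missing step is assumed to default to 1 as in Python's range, and the peak detection itself is not shown because the document does not define it. Using the minimum percentile when the first risk level has no earlier peak to copy is also an assumption.

```python
DEFAULT_RANGE = (85, 100, 0.1)
DEFAULT_BANDS = {"HR": (85, 90), "MR": (90, 95), "RR": (95, 100)}
RISK_LEVELS = ["HR", "MR", "RR"]


def percentile_grid(start, stop, step=1.0):
    """Percentiles from start up to, but excluding, stop (like range)."""
    count = int(round((stop - start) / step))
    return [round(start + i * step, 6) for i in range(count)]


def resolve_search_range(search_range=None):
    """Map a None, tuple, or dict range to percentile grids."""
    if search_range is None:
        step = DEFAULT_RANGE[2]
        return {level: percentile_grid(lo, hi, step)
                for level, (lo, hi) in DEFAULT_BANDS.items()}
    if isinstance(search_range, dict):
        # One grid per risk level, each with its own optional step.
        return {level: percentile_grid(*bounds)
                for level, bounds in search_range.items()}
    # A tuple defines one shared grid in which three peaks are searched.
    return {"ALL": percentile_grid(*search_range)}


def fill_missing_thresholds(found, minimums):
    """Apply the fallback rules; None marks a level where no peak was found."""
    if all(found[level] is None for level in RISK_LEVELS):
        return dict(minimums)              # no peaks at all: use the minimums
    filled, last = {}, None
    for level in RISK_LEVELS:
        value = found[level]
        if value is None:
            # Reuse the last peak found; assumption: minimum if none yet.
            value = last if last is not None else minimums[level]
        filled[level] = value
        last = value
    return filled


grids = resolve_search_range(None)
print(grids["RR"][0], grids["RR"][-1])
print(fill_missing_thresholds({"HR": 87.5, "MR": 93.2, "RR": None},
                              {"HR": 85, "MR": 90, "RR": 95}))
```

With the default range, the RR grid runs from 95 up to 99.9 because the stop value 100 is excluded, which matches the 95-99.9% band described above.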

Compute Recommended Thresholds

The API takes all the parameters (tunable parameters, outlier technique, and threshold computing technique) as input and calculates the thresholds for each tunable parameter across the selected threshold set IDs.
%python-pgx

# Compute the recommended thresholds for each tunable parameter
# across the selected threshold set IDs.
asc.compute_initial_thresholds(features=features_list,
                               outlier_method=outlier_method,
                               technique=technique,
                               outliers_proportion=outliers_proportion,
                               nstdev=nstdev,
                               robust_zscore_cut_off=robust_zscore_nstdev,
                               iqr_cut_off=iqr_cutoff,
                               anomaly_proportion=anomaly_proportion,
                               perc_list=perc_list,
                               search_range=search_range,
                               tail=tail)