Computing Recommended Threshold

Select Parameters for Tuning

In this step, select the parameters for which thresholds will be calculated. The default parameters are taken from the expression that you set in the scenario execution notebook. The parameters are stored in the class variable asc.tunable_parameters_list.

Limitation

The Select Parameters for Tuning paragraph might fail because the variable is referenced without its class object (asc).

Figure 4-15 Error Message - Select Parameters for Tuning

To resolve this issue, add a new paragraph above the failed paragraph with the following content, and then run the new paragraph.

%python-pgx
tunable_parameters_list = asc.tunable_parameters_list

Figure 4-16 Select Parameters for Tuning

Check Data Distribution

You can view the data distribution for the selected threshold set ID and parameters, and then decide which outlier technique to use when calculating the thresholds.

Use the Parameter Sample Fraction to work with a subset of the data. Its value must be between 0 and 1.
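
The effect of a sample fraction can be illustrated with a minimal sketch in plain Python. The function name sample_fraction, the seed argument, and the sample values are hypothetical and are not part of the notebook API. How the notebook itself samples the data is not described here, so treat this only as an illustration of the idea.

```python
import random

def sample_fraction(values, fraction, seed=0):
    """Return a random subset holding about `fraction` of `values`."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    k = round(len(values) * fraction)          # number of points to keep
    return random.Random(seed).sample(list(values), k)

amounts = list(range(1, 1001))                 # made-up parameter values
subset = sample_fraction(amounts, 0.1)         # keep roughly 10% of the data
print(len(subset))
```

A smaller fraction makes the distribution plot faster to produce, at the cost of a coarser picture of the tails.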

Figure 4-17 Check Data Distribution

Select Outlier Detection Method

Out-of-the-box univariate outlier detection methods are provided. They are divided into two groups based on the type of population. For each method, you can select the tail from which outliers are removed: RIGHT, LEFT, or BOTH.

For Normal Population

zscore: Data points that lie more than nstdev standard deviations away from the mean are considered outliers.
- Parameter nstdev. Default is 3.
IQR: Interquartile range (Q75 - Q25). Data points that lie beyond Q75 + IQR*iqr_cut_off are considered outliers.
- Parameter iqr_cut_off. Default is 3.
percent_outliers: A fixed percentage of the data points is considered to be outliers.
- Parameter outliers_proportion. Default is 5%.
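
The zscore and IQR rules and the tail selection can be sketched as follows, using only the Python standard library. The function names and the sample data are hypothetical. The lower IQR bound (Q25 - IQR*iqr_cut_off) is an assumed mirror image of the upper bound given above, and the use of the population standard deviation is also an assumption, so the product's actual implementation may differ.

```python
import statistics

def zscore_bounds(values, nstdev=3):
    """Bounds at nstdev standard deviations on either side of the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)          # assumption: population stdev
    return mean - nstdev * stdev, mean + nstdev * stdev

def iqr_bounds(values, iqr_cut_off=3):
    """Bounds at IQR * iqr_cut_off beyond the quartiles."""
    q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    # Upper bound follows the text; the lower bound is an assumed mirror image.
    return q25 - iqr * iqr_cut_off, q75 + iqr * iqr_cut_off

def drop_outliers(values, bounds, tail="BOTH"):
    """Remove points outside the bounds on the selected tail(s)."""
    if tail not in ("LEFT", "RIGHT", "BOTH"):
        raise ValueError("tail must be LEFT, RIGHT, or BOTH")
    low, high = bounds
    check_low = tail in ("LEFT", "BOTH")
    check_high = tail in ("RIGHT", "BOTH")
    return [v for v in values
            if not (check_low and v < low) and not (check_high and v > high)]

values = [10, 11, 12, 13] * 5 + [500]          # made-up data, one extreme value
by_zscore = drop_outliers(values, zscore_bounds(values), tail="RIGHT")
by_iqr = drop_outliers(values, iqr_bounds(values), tail="RIGHT")
left_only = drop_outliers(values, iqr_bounds(values), tail="LEFT")
```

In this example, both methods remove the value 500 when the RIGHT tail is selected, while selecting only the LEFT tail leaves it in place.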

For Skewed Population

robust_zscore: Similar to the zscore method, with some changes in parameters.
- Parameter nstdev. Default is 3.
- Because the mean and standard deviation are heavily influenced by outliers, this method uses the median and the absolute deviation from the median instead.
- Also called the Median Absolute Deviation (MAD) method.
adjusted_boxplot: Similar to the IQR method, with some changes in parameters.
- It customizes the range of valid data differently for each tail.
- An exponential model is used for fitting the data.
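
The robust_zscore idea can be sketched in a few lines of standard-library Python. The function names and sample data are hypothetical. The scaling constant 0.6745 is a common convention for MAD-based scores and is an assumption here, because the text does not state whether the product applies it. The adjusted_boxplot method is not sketched because its exact model is not specified above.

```python
import statistics

def robust_zscores(values):
    """Score each point by its distance from the median, in MAD units."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    # 0.6745 is an assumed, commonly used scaling constant.
    return [0.6745 * (v - median) / mad for v in values]

def robust_outliers(values, nstdev=3):
    """Return the points whose absolute robust z-score exceeds nstdev."""
    scores = robust_zscores(values)
    return [v for v, z in zip(values, scores) if abs(z) > nstdev]

values = [10, 11, 12, 13] * 5 + [500]          # made-up skewed data
print(robust_outliers(values))
```

Because the median and MAD barely move when the extreme value is added, the extreme value stands out far more clearly than it does with the mean and standard deviation.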

Figure 4-18 Select Outlier Detection Method

Select Threshold Computing Techniques

The following two out-of-the-box threshold computing techniques are provided:

Percentile: Thresholds are calculated based on percentiles.
- Parameter perc_list. Default is [85,90,95].
- You can pass your own list of percentiles.
- If None is passed, the thresholds are set at the 85th, 90th, and 95th percentiles for HR, MR, and RR respectively.
  
  Figure 4-19 Select Threshold Computing Techniques
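
The Percentile technique can be illustrated with a minimal sketch. The function names, the linear interpolation between data points, and the requirement of exactly three percentiles in HR, MR, RR order are assumptions made for this example and are not confirmed details of the API.

```python
def percentile(sorted_values, p):
    """Value at percentile p (0-100) using linear interpolation."""
    rank = (len(sorted_values) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

def percentile_thresholds(values, perc_list=None):
    """Map the HR, MR, and RR levels to thresholds at the given percentiles."""
    if perc_list is None:
        perc_list = [85, 90, 95]               # default stated in the text
    if len(perc_list) != 3:
        raise ValueError("expected one percentile each for HR, MR, and RR")
    ordered = sorted(values)
    return {level: percentile(ordered, p)
            for level, p in zip(("HR", "MR", "RR"), perc_list)}

values = list(range(101))                      # made-up data: 0, 1, ..., 100
print(percentile_thresholds(values))
```
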
Jump: Thresholds are calculated based on the highest peaks found within the defined range.
- Parameter range. The percentile range in which peaks are searched. It works similarly to the range function in Python: (start, stop, step).
- If range is None:
  - The default range (85,100,0.1) is used.
  - The highest peak is searched for HR between 85% and 90%, for MR between 90% and 95%, and for RR between 95% and 99.9%.
  - If no peak is found in any of the specified ranges, the thresholds are set at the minimum percentiles: 85%, 90%, and 95% for HR, MR, and RR respectively.
  - If a peak is missing for one of the specified ranges, the threshold for the missing peak is set to that of the last peak found. For example, if no peak is found for RR, the threshold for RR is set to the same value as the threshold for MR.
- If range is not None:
  - range is passed as a tuple:
    - The three highest peaks are searched within the range defined by the user. For example, (80,100) or (85,100,0.2).
    - If no peak is found in the specified range, all thresholds are set at the minimum percentile. For example, 85%.
    - If a peak is missing, the threshold for the missing peak is set to that of the last peak found. For example, if no peak is found for RR, the threshold for RR is set to the same value as the threshold for MR.
  - range is passed as a dict:
    - The highest peak is searched within the range defined by the user for each risk level.
    - For example, {'HR':(80,90,0.2),'MR':(90,95),'RR':(95,99,0.1)} or {'HR':(85,90),'MR':(90,95,0.1),'RR':(95,100,0.2)}.
    - If no peak is found in a specified range, the threshold is set at the minimum percentile of the range that the user passed for that risk level.
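
The fallback rules for the default range (range is None) can be sketched as follows. The function name resolve_jump_percentiles and its input format (a dict that maps each risk level to the percentile of its highest peak, or None when no peak was found) are hypothetical. How peaks are detected is not covered here. The handling of a missing peak that has no earlier peak to copy from (for example, HR missing while MR is found) is an assumption, because the text does not describe that case.

```python
DEFAULT_MINIMUMS = {"HR": 85, "MR": 90, "RR": 95}   # minimum percentiles from the text

def resolve_jump_percentiles(peaks, minimums=DEFAULT_MINIMUMS):
    """Apply the fallback rules to the peaks found for HR, MR, and RR."""
    levels = ("HR", "MR", "RR")
    # No peak found anywhere: use the minimum percentile of every level.
    if all(peaks.get(level) is None for level in levels):
        return dict(minimums)
    resolved = {}
    last_found = None
    for level in levels:
        peak = peaks.get(level)
        if peak is not None:
            last_found = peak
        elif last_found is not None:
            peak = last_found                  # reuse the last peak found
        else:
            peak = minimums[level]             # assumption: no earlier peak exists
        resolved[level] = peak
    return resolved

print(resolve_jump_percentiles({"HR": 87.3, "MR": 92.1, "RR": None}))
```

In this example, RR has no peak of its own, so it is placed at the same percentile as MR.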

Compute Recommended Thresholds

The API takes all the parameters (tunable parameters, outlier technique, and threshold computing technique) as input and calculates the thresholds for each of the tunable parameters across the selected threshold set IDs.

%python-pgx

# Pass the tunable parameters, outlier technique, and threshold computing
# technique selected in the earlier steps to the API.
asc.compute_initial_thresholds(features=features_list,
                               outlier_method=outlier_method,
                               technique=technique,
                               outliers_proportion=outliers_proportion,
                               nstdev=nstdev,
                               robust_zscore_cut_off=robust_zscore_nstdev,
                               iqr_cut_off=iqr_cutoff,
                               anomaly_proportion=anomaly_proportion,
                               perc_list=perc_list,
                               search_range=search_range,
                               tail=tail)