ofs_aif.ofs_asc package¶
Submodules¶
ofs_aif.ofs_asc.asc module¶
- class asc(connect_with_default_workspace=True)¶
Bases: ofs_aif.aif.aif, ofs_aif.aif_utility.aif_utility
This class ofs_asc contains the Python methods for the automatic scenario calibration use case.
- add_model_groups(model_group_name=None)¶
Create segmentation (model group) for AMLES
- Parameters
model_group_name – Unique name for the model group. Only alphanumeric characters, underscores, hyphens and spaces are allowed
- Returns
success message on creating the model group in the AIF system.
- Examples:
>>> input_pdf = pd.DataFrame({'MODEL_GROUP_NAME' : ["Sig Cash 1"],
>>>                           'ENTITY_NAME' : ["ASC"],
>>>                           'ATTRIBUTE_NAME' : ["ASC"],
>>>                           'ATTRIBUTE_VALUE' : ["ASC"],
>>>                           'LABEL_FILTER' : ["ASC"],
>>>                           'FEATURE_TYPE_FILTER' : ["ASC"]
>>>                           })
>>>
>>> supervised.add_model_groups(self, input_pdf )
- adjusted_boxplot_outliers(df=None, feature=None, tail='RIGHT')¶
It is similar to the IQR method with some changes in parameters. It works well for skewed populations, customizing the range of valid data differently for each tail. An exponential model is used for fitting the data.
- Parameters
df (dataframe) – input dataframe
feature (str) – name of the feature
tail (str) – tail to use for identifying outliers. Valid inputs are LEFT, RIGHT or BOTH. Default is RIGHT
- Returns
return dataframe without outliers
- Examples:
>>> asc.adjusted_boxplot_outliers(df=input_df, feature='TRANS_BASE_AMT')
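The technique can be illustrated with a small self-contained sketch (based on the Hubert–Vandervieren adjusted boxplot; the naive medcouple and the exponential whisker factors below are assumptions about the approach, not the library's internal implementation):

```python
import numpy as np

def medcouple(x):
    """Naive O(n^2) medcouple, a robust skewness measure (assumes no heavy ties at the median)."""
    x = np.sort(np.asarray(x, dtype=float))
    med = np.median(x)
    lower, upper = x[x <= med], x[x >= med]
    h = [((xj - med) - (med - xi)) / (xj - xi)
         for xi in lower for xj in upper if xj != xi]
    return float(np.median(h))

def adjusted_boxplot_outliers(values, tail="RIGHT"):
    """Drop values outside skewness-adjusted whiskers (exponential factors applied to the IQR)."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr, mc = q3 - q1, medcouple(x)
    if mc >= 0:
        lo, hi = q1 - 1.5 * np.exp(-4 * mc) * iqr, q3 + 1.5 * np.exp(3 * mc) * iqr
    else:
        lo, hi = q1 - 1.5 * np.exp(-3 * mc) * iqr, q3 + 1.5 * np.exp(4 * mc) * iqr
    keep = np.ones_like(x, dtype=bool)
    if tail in ("RIGHT", "BOTH"):
        keep &= x <= hi
    if tail in ("LEFT", "BOTH"):
        keep &= x >= lo

    return x[keep]

data = list(range(1, 51)) + [10_000]   # right-skewed by one extreme value
clean = adjusted_boxplot_outliers(data, tail="RIGHT")
```

With `tail="RIGHT"` only the upper whisker is enforced, so the extreme value is dropped while the bulk of the population is kept.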
- asc_cleanup4rerun()¶
This API is used internally for cleaning up failed or partially loaded run ids in the asc_runid_lookup table; it also cleans the corresponding data from the asc_event_master table.
- asc_load_atl_bindings()¶
Load ATL alerts in asc_investigation_entities and also related binding data for events in asc_atl_event_bindings
- asc_runid_lookup()¶
This API is for internal use; it maps the run_id to the current definition id and version.
- calculate_sample_size(sample_size_method=<function asc.hyper_geometric>, **kwargs)¶
It calculates sample size for each of the strata using hypergeometric distribution as a default method.
- Parameters
sample_size_method –
Method to get the sample size. The default method is hyper_geometric. Valid options are:
- sample_size_method=hyper_geometric
It takes a sample from hypergeometric distribution
- sample_size_method=function
The user defined method which is to be passed. Ex: sample_size_method = proportionate_sampling
The first parameter of the user-defined method should always be the stratified summary dataframe.
The user-defined method should always return stratified summary with new feature SAMPLE_SIZE.
kwargs –
dict of tunable parameters.
- hyper_params
It is only applicable when sample_size_method = asc.hyper_geometric
Keys of dict should be a strata number and values should be tunable parameters.
- Example: hyper_params={1: {'Pt': 0.005, 'Pe': 0}, 2: {'Pt': 0.005, 'Pe': 0}, 3: {'Pt': 0.005, 'Pe': 0}}
Pe: Expected interesting event rate.
Pt: Tolerable suspicious event rate
Pw: Power of the test. Default is 95%
Any number of parameters can be passed for a user-defined sample_size_method
- Returns
dataframe
Examples:
>>> asc.calculate_sample_size(sample_size_method=asc.hyper_geometric)
>>>
>>> asc.calculate_sample_size(sample_size_method=asc.hyper_geometric, hyper_params={1 : {'Pt' : 0.005, 'Pe' : 0}, 2 : {'Pt' : 0.005, 'Pe' : 0}, 3 : {'Pt' : 0.005, 'Pe' : 0}})
>>>
>>> asc.calculate_sample_size(hyper_params={1 : {'Pt' : 0.005, 'Pe' : 0}, 2 : {'Pt' : 0.005, 'Pe' : 0}, 3 : {'Pt' : 0.005, 'Pe' : 0}})
>>>
>>> #User defined method for computing sample size
>>>
>>> def proportionate_sampling(stratified_summary, sample_proportions=[0.45, 0.30, 0.10]):
>>>     stratified_summary['SAMPLE_SIZE'] = 0
>>>     for idx, row in stratified_summary.iterrows():
>>>         if row['STRATA'] == 1:
>>>             stratified_summary.loc[idx, 'SAMPLE_SIZE'] = int(row['POPULATION'] * sample_proportions[0])
>>>         elif row['STRATA'] == 2:
>>>             stratified_summary.loc[idx, 'SAMPLE_SIZE'] = int(row['POPULATION'] * sample_proportions[1])
>>>         elif row['STRATA'] == 3:
>>>             stratified_summary.loc[idx, 'SAMPLE_SIZE'] = int(row['POPULATION'] * sample_proportions[2])
>>>     return stratified_summary
>>>
>>> asc.calculate_sample_size(sample_size_method=proportionate_sampling, sample_proportions=[0.30, 0.20, 0.10])
- compute_initial_thresholds(features=None, outlier_method='robust_zscore', technique='percentile', outliers_proportion=0.05, robust_zscore_cut_off=3, iqr_cut_off=3, nstdev=3, anomaly_proportion=0.1, perc_list=[0.85, 0.9, 0.95], search_range=None, risk_level_priority={1: 'HR', 2: 'MR', 3: 'RR'}, tail='RIGHT')¶
This API is a wrapper around the outlier methods and threshold-computing techniques. It takes multiple parameters from the user and computes the thresholds for the features passed by the user.
- Parameters
features (list) – list of features
outlier_method (str) – name of the outlier method. Default is robust_zscore
technique (str) – technique to find thresholds. Valid techniques are percentile or jump. Default is percentile
outliers_proportion (float) – proportion of data points to be considered as outliers. Default is 0.05
robust_zscore_cut_off (int) – cut-off for the robust zscore method. Default is 3
iqr_cut_off (int) – cut-off for the iqr method. Default is 3
nstdev (int) – cut-off for the zscore method. Default is 3
anomaly_proportion (float) – proportion of anomalies marked as outliers. Default is 0.10
perc_list (list) – percentiles for risk level thresholds. Default is [0.85, 0.90, 0.95]
search_range (tuple) – works like Python's arange method
risk_level_priority (dict) – risk level priorities. Ex: {1: 'HR', 2: 'MR', 3: 'RR'}
tail (str) – tail to use for identifying outliers. Valid inputs are LEFT, RIGHT or BOTH. Default is RIGHT
- Returns
return thresholds
- Examples:
>>> asc.compute_initial_thresholds(features=['TRANS_BASE_AMT','TRANS_CT'], outlier_method='robust_zscore', technique='percentile',
>>>                                robust_zscore_cut_off=3, perc_list=[0.85,0.90,0.95])
- compute_jump_thresholds(df=None, feature=None, risk_level_priority=None, range_limit=None)¶
This API computes the thresholds for a given feature. The thresholds for jurisdictions are computed from the highest peaks within the range defined by the user. If range_limit is None, the thresholds for risk levels will be the highest peaks in 0.85-0.90, 0.90-0.95 and 0.95-0.99 for HR, MR and RR respectively.
- Parameters
df (dataframe) – input dataframe
feature (str) – name of the feature
range_limit (tuple) – works like Python's arange method. Default is (85, 100, 0.1)
risk_level_priority (dict) – risk level priorities. Ex: {1: 'HR', 2: 'MR', 3: 'RR'}
- Returns
return thresholds
- Examples:
>>> asc.compute_jump_thresholds(df=input_df, feature='TRANS_BASE_AMT', range_limit=(85,100,0.1))
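The "highest peak" idea can be sketched as follows (an illustrative reimplementation, not the library's internal code; the band boundaries and function name are assumptions): evaluate the feature on a fine percentile grid and, within each risk band, take the value just after the largest percentile-to-percentile jump.

```python
import numpy as np

def jump_thresholds(values, step=0.1):
    """For each risk level, pick the percentile value after the largest jump inside its band."""
    bands = {"HR": (85, 90), "MR": (90, 95), "RR": (95, 99)}
    x = np.asarray(values, dtype=float)
    thresholds = {}
    for level, (lo, hi) in bands.items():
        grid = np.arange(lo, hi + step, step)
        pv = np.percentile(x, grid)       # feature value at each percentile of the grid
        jumps = np.diff(pv)               # increase between consecutive grid points
        thresholds[level] = float(pv[np.argmax(jumps) + 1])
    return thresholds

# Hypothetical amount distribution: a dense bulk plus a heavy right tail.
amounts = np.concatenate([np.linspace(100, 1_000, 950),
                          np.linspace(5_000, 50_000, 50)])
th = jump_thresholds(amounts)
```

Because the bands cover increasing percentile ranges, the resulting thresholds are ordered HR ≤ MR ≤ RR.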
- compute_jumps(df=None, feature=None, perc_range_values=None)¶
It is an internal API to calculate the jumps for amount and count features.
- Parameters
df (dataframe) – input dataframe
feature (str) – name of the feature
perc_range_values (list) – values of the feature at different percentiles
- Returns
return list of slopes
- Examples:
>>> asc.compute_jumps(feature='TRANS_BASE_AMT', perc_range_values=[342424,5454646,5463737,6545454])
- compute_percentile_thresholds(df=None, feature=None, perc_list=None, risk_level_priority=None)¶
This API computes the thresholds for a given feature. The thresholds for jurisdictions are computed based on the perc_list passed by the user. If None is passed, the thresholds for risk levels will be set at the 0.85, 0.90 and 0.95 percentiles for HR, MR and RR respectively.
- Parameters
df (dataframe) – input dataframe
feature (str) – name of the feature
perc_list (list) – percentiles for risk level thresholds. Default is [0.85, 0.90, 0.95]
risk_level_priority (dict) – risk level priorities. Ex: {1: 'HR', 2: 'MR', 3: 'RR'}
- Returns
return thresholds
- Examples:
>>> asc.compute_percentile_thresholds(df=input_df, feature='TRANS_BASE_AMT', perc_list=[0.85,0.90, 0.95])
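The mapping from perc_list to risk levels can be pictured with a minimal sketch (an assumption about the approach, not the library's code; the helper name is hypothetical):

```python
import pandas as pd

def percentile_thresholds(df, feature, perc_list=(0.85, 0.90, 0.95),
                          risk_level_priority={1: 'HR', 2: 'MR', 3: 'RR'}):
    """Map each percentile in perc_list to the risk level of the same rank."""
    qs = df[feature].quantile(list(perc_list))
    return {risk_level_priority[i + 1]: float(q) for i, q in enumerate(qs)}

# 101 evenly spaced values, so the quantiles land on exact data points.
input_df = pd.DataFrame({'TRANS_BASE_AMT': range(0, 101)})
th = percentile_thresholds(input_df, 'TRANS_BASE_AMT')
```

For this toy population the 0.85/0.90/0.95 quantiles are 85, 90 and 95, assigned to HR, MR and RR in priority order.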
- configure_new_scenario(scenario_id=None)¶
This API will create the new scenario folder as a placeholder where the user can place the scenario focus and threshold set id notebooks. It also puts an entry for the new scenario in the ASC_SCENARIO_MASTER table.
- Parameters
scenario_id – unique id of the scenario. Ex: ‘23413145’
- Returns
API response in json format.
- Examples:
>>> aif.configure_new_scenario(scenario_id='116000052')
- create_definition(save_with_new_version=False, cleanup_results=False, version=None)¶
API creates a unique definition using the Model Group for a given Notebook. Internally calls the AIF create_definition method.
- Parameters
save_with_new_version – Boolean flag with options True/False. It helps create a history of models/outputs for a given definition. Any version can be chosen at a later point in time.
cleanup_results – Boolean flag with options True/False. When set to True, deletes all the outputs due to previous executions.
version – when multiple versions of the definition are created, version is supplied to pick the required version. Default is None, meaning the MAX version of the definition.
- Returns
Return successful message on completion, and proper error message on failure.
- Examples:
>>> ofs_asc.create_definition( save_with_new_version = False,
>>>                            cleanup_results = False,
>>>                            version = None )
Definition creation successful...
True
- download_csv(user_thresholds=None, threshold_set_id=None, title='Download', filename='scenario_thresholds.csv')¶
This API allows the user to download the recommended thresholds to CSV file using HTML button.
- Parameters
user_thresholds (dict) – key-value pairs of features and thresholds
threshold_set_id (str) – threshold_set_id for which thresholds are tuned
title (str) – title of the HTML button
filename (str) – csv filename which will hold the recommended thresholds
- Returns
display download button
- Examples:
>>> tshld={'HR_Min_Trans_Ct': 2,
>>>        'HR_Min_Trans_Amt': 250088.11,
>>>        'MR_Min_Trans_Ct': 2,
>>>        'MR_Min_Trans_Amt': 250880.41,
>>>        'RR_Min_Trans_Ct': 2,
>>>        'RR_Min_Trans_Amt': 255344.37}
>>> asc.download_csv(user_thresholds=tshld, threshold_set_id=11600048)
- execute_scenario(parallel=True)¶
This API executes multiple instances of the scenario notebook in parallel for different fic_mis_date values. The scenario data gets stored in the fcc_am_event_details table.
- Parameters
parallel – if True, execute notebooks in parallel, otherwise sequential
- Returns
Display successful message.
- generate_expression(tunable_parameters=None)¶
It converts the tunable parameters passed by a user into a format accepted by the operator_overloading class.
- Parameters
tunable_parameters (str) – tunable parameters passed by a user. Logical & and | operators should be used to create an expression. Ex: 'Large_Cash_Transaction_Amt & (Large_Cash_Transaction_Cnt | (Large_Cash_Transaction_Amt & Large_Cash_Transaction_Cnt))'
- Returns
expression
- Examples:
>>> ofs_asc.generate_expression(tunable_parameters = 'Large_Cash_Transaction_Amt & (Large_Cash_Transaction_Cnt | (Large_Cash_Transaction_Amt & Large_Cash_Transaction_Cnt))')
'LARGE_CASH_TRANSACTION_AMT & (LARGE_CASH_TRANSACTION_CNT | (LARGE_CASH_TRANSACTION_AMT & LARGE_CASH_TRANSACTION_CNT))'
- generate_threshold_id(base_threshold_set_id=None)¶
It calls the AML procedure which creates a new threshold set id from an existing base threshold set id. The mapping between the cloned threshold set id and the original threshold set id is stored in the asc_runid_lookup table.
- Parameters
base_threshold_set_id – threshold set id which needs to be cloned
- Returns
new threshold set id
- get_bar_plot(group_by='RUN_DATE', figsize=(14, 8))¶
It shows the bar plot comparison for prod alerts vs recommended alerts
- Parameters
group_by (str) – aggregate alerts either by RUN_DATE or SEGMENT_ID
figsize (tuple) – figure size
- Returns
display bar plot
- Examples:
>>> asc.get_bar_plot(group_by='RUN_DATE')
- get_cross_segment_parameter_analysis(segments_include=None, segments_exclude=None, select_features=None, figsize=(14, 6), title=None)¶
Function to compare distribution of alerts and effective alerts for specified parameters across specified segments.
- Parameters
segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude
select_features (list) – list of features to be plotted
figsize (tuple) – size of plot
title (str) – title of chart
- Returns
plot
- get_cross_tab(segments_include=None, segments_exclude=None, feature_name=None, bins=None, round_digits=0)¶
Utility function to get a cross tab of parameter and effective flag
- Parameters
segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude
feature_name (str) – name of the feature being analyzed
bins (numpy array or list) – array specifying the bins values should be bucketed into; if None, deciles are used as intervals
round_digits (int) – number specifying how intervals should be rounded. To be used when deciles are used to bin the parameter values
- Returns
dataframe
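The general shape of such a cross tab can be sketched with pandas on a toy extract (the column names TRANS_BASE_AMT and EFFECTIVE_FLAG are hypothetical, not the library's actual schema):

```python
import pandas as pd

# Toy ATL data: a tunable parameter and an effectiveness flag.
df = pd.DataFrame({
    'TRANS_BASE_AMT': [100, 250, 400, 800, 1500, 3000, 6000, 9000],
    'EFFECTIVE_FLAG': [0, 0, 0, 1, 0, 1, 1, 1],
})

bins = [0, 500, 2000, 10000]   # explicit bin edges instead of deciles
xtab = pd.crosstab(pd.cut(df['TRANS_BASE_AMT'], bins), df['EFFECTIVE_FLAG'])
```

Each row of `xtab` is an amount bucket and each column an effectiveness outcome, which makes it easy to see where effective alerts concentrate.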
- get_data(tag='BTL', tunable_parameters=None, segments_include=None, segments_exclude=None, is_pre_prod_analysis=False)¶
It retrieves the actual data for the tunable parameters passed by the user. Depending on the tag, either BTL or ATL data will be retrieved for analysis.
- Parameters
tag (str) – 'BTL' tag for BTL analysis and 'ATL' tag for ATL analysis. Default is 'BTL'
tunable_parameters (str) – (mandatory) parameters to be tuned, passed using logical & and | operators
segments_include (list) – segments to be included
segments_exclude (list) – segments to be excluded
is_pre_prod_analysis (Boolean) – True if the API is used for Pre-Prod Analysis
- Returns
dataframe
- Examples:
>>> tunable_parameters='Large_Cash_Transaction_Amt & (Large_Cash_Transaction_Cnt | (Large_Cash_Transaction_Amt & Large_Cash_Transaction_Cnt))'
>>> ofs_asc.get_data(tunable_parameters=tunable_parameters, segments_include=['AMEA_HR','AMEA_MR'])
- get_definition_versions()¶
Get available versions for the current definition.
- Param
None
- Returns
list of available versions for current definition.
- Example:
>>> data_pdf = asc.get_definition_versions()
- get_density_plots(segments_include=None, segments_exclude=None, select_features=None, figsize=(14, 6), title=None)¶
Function to get density plots for multiple parameters
- Parameters
segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude
select_features (list) – list of feature names (strings) to be plotted
figsize (tuple) – size of plot
title (str) – title of chart
- Returns
plot
- get_effectiveness_trend(segments_include=None, segments_exclude=None, feature_name=None, bins=None, figsize=(14, 6), title=None, **kwargs)¶
Function to get percentage effective alerts trend plot for a feature
- Parameters
segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude
feature_name (str) – feature to be analyzed
bins (numpy array or list) – array specifying the bins values should be bucketed into; if None, deciles are used as intervals
figsize (tuple) – size of plot
title (str) – title of chart
**kwargs –
Keyword arguments:
- round_digits (int) – number of digits to which bin limits are to be rounded
- ax (int) – axis on which the plot is to be placed. Used only when creating multi grid plots
- Returns
Plot
- get_effectiveness_trends(segments_include=None, segments_exclude=None, select_features=None, bins=None, figsize=(8, 6), title=None, **kwargs)¶
Function to get percentage effective alerts trend plots for multiple parameters
- Parameters
segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude
select_features (list) – list of feature names (strings) to be plotted
bins (numpy array or list) – array specifying the bins values should be bucketed into; if None, deciles are used as intervals
figsize (tuple) – size of plot
title (str) – title of chart
**kwargs –
Keyword arguments:
- round_digits (int) – number of digits to which bin limits are to be rounded
- Returns
Plot
- get_frequency_table_1D(segments_include=None, segments_exclude=None, feature_name=None, bins=None, plot=False, figsize=(8, 6), **kwargs)¶
Function to return a frequency table and optionally convert to a heat map
- Parameters
segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude
feature_name (str) – feature to be analyzed
bins (numpy array or list) – array specifying the bins values should be bucketed into; if None, deciles are used as intervals
plot (bool) – specifying whether to plot a heat map
figsize (tuple) – size of plot
**kwargs –
Keyword arguments:
- cmap (str) – any accepted python color map
- round_digits (int) – number of digits to which bin limits are to be rounded
- ax (int) – axis on which to place the plot. Used only for multigrid plots
- Returns
dataframe and Plot(optional)
- get_frequency_table_2D(segments_include=None, segments_exclude=None, select_features=None, bins=None, plot=False, figsize=(14, 8), **kwargs)¶
Function to return a frequency table and optionally convert to a heat map
- Parameters
segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude
select_features (list) – list of feature names (strings) to be plotted
bins (numpy array or list) – array specifying the bins values should be bucketed into; if None, deciles are used as intervals
plot (bool) – specifying whether to plot a heat map
figsize (tuple) – size of plot
**kwargs –
Keyword arguments:
- cmap (str) – any accepted python color map
- round_digits (int) – number of digits to which bin limits are to be rounded
- Returns
dataframe and Plot(optional)
- get_frequency_tables_1D(segments_include=None, segments_exclude=None, select_features=None, bins=None, figsize=(8, 6), plot=True, title=None, **kwargs)¶
Function to get frequency tables and optionally heatmaps for multiple parameters
- Parameters
segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude
select_features (list) – features to be analyzed
bins (list of lists or arrays) – list of arrays specifying the bins values should be bucketed into; if None, deciles are used as intervals
figsize (tuple) – size of plot
plot (bool) – specifying whether to plot a heat map
title (str) – title for plot
**kwargs –
Keyword arguments:
- cmap (str) – any accepted python color map
- round_digits (int) – number of digits to which bin limits are to be rounded
- Returns
list of dataframes and Plot(optional)
- get_investigation_status(oracle_ecm=True, test_ecm_conn=None)¶
Function to return the investigation status for the samples collected and samples reviewed at ECM. This API connects to ECM to get the disposition status for the samples and updates event dispositions in the ASC_INVESTIGATED_ENTITIES table.
- Parameters
test_ecm_conn (str) – ECM connection string for debug purposes
oracle_ecm (bool) – use Oracle ECM to get the disposition status for samples. Default is True
- Returns
dataframe
- get_overall_summary(segments_include=None, segments_exclude=None)¶
Function to get segment wise summary of ATL Data
- Parameters
segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude
- Returns
Dataframe with summary of alerts
- get_samples(strata_include=None, strata_exclude=None)¶
It takes random samples from each strata, equal to the calculated sample_size.
- Parameters
strata_include (list) – strata to be included
strata_exclude (list) – strata to be excluded
- Returns
Returns successful message
- Examples:
>>> ofs_asc.get_samples(strata_include = ['AMEA_HR_1','AMEA_MR_1'], strata_exclude = None)
- get_skewness_kurtosis(features_include=None, features_exclude=None, plot=True, figsize=(12, 10), sample_frac=None)¶
Function to return a data distribution for features and also shows the skewness and kurtosis.
- Parameters
features_include (list) – list of features to include
features_exclude (list) – list of features to exclude
plot (bool) – specifying whether to plot or not
figsize (tuple) – size of plot
sample_frac (float) – fraction of sample to plot
- Returns
dataframe or plot
- Examples:
>>> asc.get_skewness_kurtosis(features_include=['TOT_DEPST_AMT','TOT_DEPST_CT'],features_exclude=None, plot=True, sample_frac=0.1)
- get_threshold_names(scenario='Sig Cash', threshold_set_id=None)¶
This API loads the threshold values for tunable parameters from KDD threshold table
- Parameters
scenario – scenario name. Ex: ‘Sig Cash’
focus – focus name. Ex: ‘CUSTOMER’
threshold_set_id – (optional) threshold set id Ex: ‘116000048’
- Returns
return thresholds for tunable parameters along with threshold set ids.
- Example:
>>> data_pdf = asc.get_threshold_names(scenario='Sig Cash', threshold_set_id='116000048')
- get_threshold_set_ids(scenario='Sig Cash', threshold_set_id=None)¶
This API loads the threshold values for tunable parameters from KDD threshold table
- Parameters
scenario – scenario name. Ex: ‘Sig Cash’
focus – focus name. Ex: ‘CUSTOMER’
threshold_set_id – (optional) threshold set id Ex: ‘116000048’
- Returns
return thresholds for tunable parameters along with threshold set ids.
- Example:
>>> data_pdf = asc.get_threshold_set_ids(scenario='Sig Cash', threshold_set_id='116000048')
- hyper_geometric(N, Pt=0.005, Pe=0, Pw=0.95)¶
Sample size calculation from the hypergeometric distribution.
- Parameters
N – Strata size
Pt – Tolerable suspicious event rate
Pe – Expected interesting event rate
Pw – Power or reliability
- Returns
sample_size
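One common audit-sampling formulation of this calculation can be sketched as follows (an assumption about the approach; the function name and the exact acceptance rule are illustrative, and the library's computation may differ): find the smallest sample size n such that, if the stratum of size N really contained a proportion Pt of suspicious events, the chance of the sample showing no more than Pe*n of them is at most 1 - Pw.

```python
from math import comb, ceil

def hyper_geometric_sample_size(N, Pt=0.005, Pe=0, Pw=0.95):
    """Smallest n with P(sample contains <= Pe*n flagged events) <= 1 - Pw,
    when the stratum of size N holds K = ceil(Pt*N) flagged events (hypergeometric)."""
    K = ceil(Pt * N)
    for n in range(1, N + 1):
        c = int(Pe * n)  # acceptable number of flagged events in the sample
        p_accept = sum(comb(K, k) * comb(N - K, n - k) for k in range(c + 1)) / comb(N, n)
        if p_accept <= 1 - Pw:
            return n
    return N

n = hyper_geometric_sample_size(N=1000, Pt=0.005, Pe=0, Pw=0.95)
```

For a stratum of 1,000 events with Pt=0.5% and Pe=0, the sketch requires 450 samples to reach 95% power.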
- identify_atl_btl_population(oracle_ecm=True, test_ecm_conn=None)¶
This API calls the SQL procedures p_asc_clean_atl and p_asc_insert_atl_data in sequence. The first procedure removes ATL alerts from the ASC_EVENT_MASTER table for that scenario and run dates. The second procedure inserts all possible ATL alerts from KDD tables for that scenario and run dates.
- Parameters
test_ecm_conn (str) – ECM connection string for debug purposes
oracle_ecm (bool) – use Oracle ECM to get the disposition status for samples. Default is True
- Returns
None
- import_analysis_templates(definition_name=None, is_pre_prod_analysis=False)¶
This API will create the objectives in MMG by taking the scenario folder name as input, and also imports the analysis drafts below into the respective objectives.
- For Prod Scenarios
BTL Analysis
ATL Analysis
ASC Scenario Execution
Impact Analysis
- For Pre-Prod Scenario
ASC Scenario Execution
Pre-Prod Analysis
- Parameters
definition_name – folder name which will be created under Home/ASC/Analysis. Ex: ‘Sig Cash 2’
is_pre_prod_analysis – Boolean Flag (True/False) - True for pre-prod analysis
- Returns
API response in json format.
- Examples:
>>> aif.import_analysis_templates(definition_name='Sig Cash 2')
>>>
>>> aif.import_analysis_templates(definition_name='RMF', is_pre_prod_analysis=True)
- investigate_samples(strata_include=None, strata_exclude=None)¶
It pushes the selected samples to the DB tables ASC_EVENT_SAMPLE and ASC_INVESTIGATED_ENTITIES.
- Parameters
strata_include (list) – strata to be included
strata_exclude (list) – strata to be excluded
- Returns
Returns successful message
- Examples:
>>> ofs_asc.investigate_samples(strata_include = None, strata_exclude = ['AMEA_HR_0','AMEA_MR_0','AMEA_RR_0'])
- iqr_outliers(df=None, feature=None, iqr_cut_off=3, tail='RIGHT')¶
It removes outliers based on the inter-quartile range (IQR).
- Parameters
df (dataframe) – input dataframe
feature (str) – name of the feature
iqr_cut_off (int) – cut-off multiplier for the IQR. Default is 3
tail (str) – tail to use for identifying outliers. Valid inputs are LEFT, RIGHT or BOTH. Default is RIGHT
- Returns
return dataframe without outliers
- Examples:
>>> asc.iqr_outliers(df=input_df, feature='TRANS_BASE_AMT', iqr_cut_off=3)
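A minimal pandas sketch of the IQR fence (an illustration of the technique, not the library's code):

```python
import pandas as pd

def iqr_outliers(df, feature, iqr_cut_off=3, tail='RIGHT'):
    """Drop rows whose feature value lies beyond iqr_cut_off * IQR from the quartiles."""
    q1, q3 = df[feature].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = pd.Series(True, index=df.index)
    if tail in ('RIGHT', 'BOTH'):
        mask &= df[feature] <= q3 + iqr_cut_off * iqr
    if tail in ('LEFT', 'BOTH'):
        mask &= df[feature] >= q1 - iqr_cut_off * iqr
    return df[mask]

input_df = pd.DataFrame({'TRANS_BASE_AMT': list(range(1, 11)) + [1_000]})
clean = iqr_outliers(input_df, 'TRANS_BASE_AMT', iqr_cut_off=3)
```

With `tail='RIGHT'` only values above the upper fence are removed, so the single extreme amount is dropped and the rest of the population is kept.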
- isolation_forest_outliers(df=None, feature=None, anomaly_proportion=0.1)¶
It is based on the random forest technique and identifies anomalies in data. Outliers are marked -1 and non-outliers 1, and the marked outliers are removed.
- Parameters
df (dataframe) – input dataframe
feature (str) – name of the feature
anomaly_proportion (float) – proportion of outliers marked as anomalies
- Returns
return dataframe without outliers
- Examples:
>>> asc.isolation_forest_outliers(df=input_df, feature='TRANS_BASE_AMT')
- load_asc_event_master()¶
It calls a SQL procedure p_asc_load_event_master which loads data into the ASC_EVENT_MASTER table for the list of fic_mis_date values given by the class variable self.run_dates.
- Param
None
- Returns
status message for each fic_mis_date
- load_object(version=None)¶
Loads the object saved using self.save_object().
- Parameters
version – when multiple versions of the definition are created, version is supplied to pick the required version. Default is None, meaning the MAX version of the definition.
- Returns
valid python object on successful execution.
- Example:
>>> data_pdf = ofs_asc.load_object()
- percent_outliers(df=None, feature=None, outliers_proportion=0.05, tail='RIGHT')¶
It removes the outliers above the percentile cut-off for univariate.
- Parameters
df (dataframe) – input dataframe
feature (str) – name of the feature
outliers_proportion (float) – threshold value. Default is 5%
tail (str) – tail to use for identifying outliers. Valid inputs are LEFT, RIGHT or BOTH. Default is RIGHT
- Returns
return dataframe without outliers
- Examples:
>>> asc.percent_outliers(df=input_df, feature='TRANS_BASE_AMT', outliers_proportion=0.05)
- percentile_method(df_stratification=None, perc_list=None)¶
Stratifies the input population by splitting the data at the given percentiles.
- Parameters
df_stratification – input dataframe
perc_list – percentile list on which data will be split
- Returns
stratified dataframe
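The percentile split can be illustrated with a small sketch (a simplification: it stratifies a single hypothetical feature column, whereas the real method works on percentile-converted tunable parameters; the extra `feature` argument is an assumption):

```python
import pandas as pd

def percentile_method(df_stratification, feature, perc_list=(0.8, 0.6, 0.4, 0.2)):
    """Assign STRATA by splitting the feature at the given percentile cut-offs
    (strata 1 = highest band, just below the ATL line)."""
    df = df_stratification.copy()
    pct = df[feature].rank(pct=True)        # percentile of each row in the population
    df['STRATA'] = len(perc_list) + 1       # default: bottom stratum
    for strata_no, cut in enumerate(sorted(perc_list, reverse=True), start=1):
        unassigned = df['STRATA'] == len(perc_list) + 1
        df.loc[(pct > cut) & unassigned, 'STRATA'] = strata_no
    return df

pop = pd.DataFrame({'TRANS_BASE_AMT': range(1, 101)})
out = percentile_method(pop, 'TRANS_BASE_AMT', perc_list=[0.8, 0.6, 0.4, 0.2])
```

With cut-offs [0.8, 0.6, 0.4, 0.2] a population of 100 rows splits into five strata of 20 rows each, strata 1 holding the top 20%.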
- perform_stratification(stratification_method=<function asc.percentile_method>, **kwargs)¶
It stratifies the input population (grouped by JURISDICTION and RISK_LEVEL) using stratified sampling and assigns a strata number to each group within the chosen segment. The default method is percentile-based: each tunable parameter is converted to a percentile feature and then, based on the cut-off percentiles passed in perc_list, each segment is split into a number of strata equal to the number of cut-off values passed in perc_list.
- Parameters
stratification_method – It takes a function as input. The default method is percentile_method: it converts each tunable parameter into a percentile and then passes it to the expression. The expression evaluates the final percentile, which is then used to split the population into different strata. A user-defined method for stratification is also acceptable, e.g. stratification_method=stratification_on_amount. In a user-defined method, the first parameter should always be the input dataframe, and the method should always return the input dataframe with STRATA as one more column.
kwargs –
Keyword arguments:
- perc_list :
It is only applicable when stratification_method=percentile_method.
List of cut-off percentiles based on which strata will be created.
Strata numbers are assigned from left to right of the list (strata 1, strata 2, strata 3 and so on).
Strata 1 will be closest to the ATL line and the last strata will be at the bottom of the BTL data.
- Returns
successful message
- Examples:
>>> #inbuilt method
>>> asc.perform_stratification(stratification_method=asc.percentile_method, perc_list=[0.8,0.6,0.4,0.2])
>>>
>>> #User-defined method
>>> def stratification_on_amount(evented_data_pdf, split_on_amount=[200000, 100000, 50000]):
>>>     evented_data_pdf['STRATA'] = len(split_on_amount)+1
>>>     df_filter=evented_data_pdf.copy()
>>>     strata_no = 1
>>>     for i in split_on_amount:
>>>         df = df_filter[df_filter['STRATA'] == len(split_on_amount) + 1]
>>>         ls_event_ids = df[df['TRANS_BASE_AMT'] >= i]['EVENT_ID'].to_list()
>>>         df_filter.loc[df_filter['EVENT_ID'].isin(ls_event_ids), 'STRATA'] = strata_no
>>>         evented_data_pdf.loc[evented_data_pdf['EVENT_ID'].isin(ls_event_ids), 'STRATA'] = strata_no
>>>         strata_no = strata_no + 1
>>>     return evented_data_pdf
>>>
>>> #Calling the user defined method
>>> asc.perform_stratification(stratification_method=stratification_on_amount, split_on_amount=[200000, 100000, 50000])
- robust_zscore_outliers(df=None, feature=None, nstdev=3, tail='RIGHT')¶
It is similar to zscore with some changes in parameters and works well for skewed populations. Since the mean and standard deviation are heavily influenced by outliers, it uses the median and the absolute deviation from the median instead. Also called the Median Absolute Deviation (MAD) method.
- Parameters
df (dataframe) – input dataframe
feature (str) – name of the feature
nstdev (int) – number of standard deviations. Default is 3
tail (str) – tail to use for identifying outliers. Valid inputs are LEFT, RIGHT or BOTH. Default is RIGHT
- Returns
return dataframe without outliers
- Examples:
>>> asc.robust_zscore_outliers(df=input_df, feature='TRANS_BASE_AMT', nstdev=2.5)
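The MAD approach can be sketched as follows (an illustration of the technique under the usual 0.6745 consistency constant, not the library's internal code):

```python
import pandas as pd

def robust_zscore_outliers(df, feature, nstdev=3, tail='RIGHT'):
    """Drop rows whose modified z-score (median / MAD based) exceeds nstdev."""
    x = df[feature]
    med = x.median()
    mad = (x - med).abs().median()
    z = 0.6745 * (x - med) / mad   # 0.6745 makes MAD comparable to one stdev for normal data
    mask = pd.Series(True, index=df.index)
    if tail in ('RIGHT', 'BOTH'):
        mask &= z <= nstdev
    if tail in ('LEFT', 'BOTH'):
        mask &= z >= -nstdev
    return df[mask]

input_df = pd.DataFrame({'TRANS_BASE_AMT': list(range(1, 11)) + [1_000]})
clean = robust_zscore_outliers(input_df, 'TRANS_BASE_AMT', nstdev=3)
```

Because the median and MAD are barely moved by the extreme value, its modified z-score is huge and it is the only row removed.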
- save_object(description=None)¶
save python objects permanently. Load them whenever needed.
- Parameters
description – description for the object to be saved
- Returns
Paragraph execution will show success, else failure.
- Example:
>>> data_pdf = pd.DataFrame({'COL1':[1,2,3], 'COL2':[4,5,6]})
>>> ofs_asc.save_object( value = data_pdf)
- scenario_post_processing()¶
It calls the method load_asc_event_master.
- Param
None
- Returns
None
- show_btl_thresholds(df=None, segment=None, strata=None)¶
It displays the stored thresholds from BTL analysis.
- Parameters
df (dataframe) – input dataframe
segment (str) – select segment to display
strata (str) – select strata number to display
- Returns
Returns BTL thresholds
- Examples:
>>> asc.show_btl_thresholds(df = df_summary, segment='AMEA_HR', strata='1')
- show_event_volume(group_by='ALL', tag=None)¶
This API displays the alert volumes for different group_by inputs. The ASC_EVENT_MASTER table will be used as the input for aggregating the data.
- Parameters
group_by – input value on which alert volumes will be aggregated. It takes the following inputs:
- RUN_DATE : aggregate alerts on fic_mis_date
- SEGMENT_ID : aggregate alerts on Jurisdiction and Risk_Level
- EVENT_TAG : aggregate alerts by event tag, i.e. ATL or BTL
- JURISDICTION : aggregate alerts by jurisdiction
- JURISDICTION_AND_RUN_DATE : aggregate alerts by jurisdiction and run date
- ALL : aggregate alerts by all inputs together, i.e. by SEGMENT_ID, RUN_DATE, EVENT_TAG
tag – filter by 'ATL' or 'BTL'. If None, it will show both ATL and BTL alerts.
- Returns
dataframe showing alerts volume by different inputs.
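The aggregation behind this API can be pictured with a toy ASC_EVENT_MASTER extract (the column names here are assumptions for illustration):

```python
import pandas as pd

events = pd.DataFrame({
    'SEGMENT_ID': ['AMEA_HR', 'AMEA_HR', 'AMEA_MR', 'AMEA_MR', 'AMEA_MR'],
    'RUN_DATE':   ['20221007', '20221014', '20221007', '20221007', '20221014'],
    'EVENT_TAG':  ['ATL', 'BTL', 'BTL', 'ATL', 'BTL'],
})

def event_volume(df, group_by='ALL', tag=None):
    """Count events per group; 'ALL' groups by segment, run date and tag together."""
    if tag is not None:
        df = df[df['EVENT_TAG'] == tag]
    cols = ['SEGMENT_ID', 'RUN_DATE', 'EVENT_TAG'] if group_by == 'ALL' else [group_by]
    return df.groupby(cols).size().reset_index(name='EVENT_COUNT')

by_tag = event_volume(events, group_by='EVENT_TAG')
```

Grouping the toy extract by EVENT_TAG yields 2 ATL and 3 BTL events, mirroring what the real API reports from ASC_EVENT_MASTER.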
- show_execution_status()¶
It shows the execution status of the scenarios. The status can be one of these:
RUNNING
FAILED
COMPLETED
- Param
None
- Returns
dataframe
- show_initial_thresholds(features=None, segments=None)¶
It shows the computed thresholds in a dataframe and also provides options to the user for filtering thresholds based on selected segments and features.
- Parameters
features (list) – names of the features
segments (list) – names of the segments
- Returns
return dataframe
- Examples:
>>> asc.show_initial_thresholds(features=['TRANS_BASE_AMT','TRANS_CT'], segments=['AMEA'])
- show_investigation_summary()¶
This API returns the summary of percentage effective alerts for each segment and strata.
- Param
None
- Returns
dataframe
- show_prod_alerts_volume(run_dates=None, group_by='RUN_DATE')¶
This API shows the statistics for production alerts
- Parameters
run_dates (list) – run dates for which alerts are to be fetched
group_by (str) – aggregate alerts either by RUN_DATE or SEGMENT_ID
- Returns
Returns alerts statistics
- Examples:
>>> asc.show_prod_alerts_volume(run_dates = ['20221007','20221014','20221022'], group_by='RUN_DATE')
- show_sample_size(strata_include=None, strata_exclude=None)¶
It shows the sample size calculated for each of the segments.
- Parameters
strata_include (list) – strata to be included
strata_exclude (list) – strata to be excluded
- Returns
dataframe showing sample size for each strata
- Examples:
>>> ofs_asc.show_sample_size(strata_include = ['AMEA_HR_1','AMEA_MR_1'], strata_exclude = None)
- show_samples(strata_include=None, strata_exclude=None)¶
It shows the selected samples for each strata.
- Parameters
strata_include (list) – strata to be included
strata_exclude (list) – strata to be excluded
- Returns
Returns dataframe showing selected samples
- Examples:
>>> ofs_asc.show_samples(strata_include = ['AMEA_HR_1','AMEA_MR_1'], strata_exclude = None)
- show_scenario_bindings()¶
Returns the binding names for the currently selected scenario, which the user can choose for setting up the tunable parameters.
- Param
None
- Returns
list of scenario binding’s names
- show_scenario_execution_parameters()¶
This API finds all the possible run dates for the date ranges provided by the user and displays all parameters set by the user.
- Param
None
- Returns
Display parameters set by a user.
- show_scenario_master()¶
Display available scenarios and focal entities for analysis
- Param
None
- Returns
dataframe
- Examples:
>>> ofs_asc.show_scenario_master()
- show_scenarios()¶
This API shows the available scenarios along with the unique scenario id.
- Param
None
- Returns
dataframe
- Example:
>>> asc.show_scenarios()
- show_stratification_summary()¶
It displays strata number assigned to each group along with events population within each strata.
- Param
None
- Returns
dataframe showing strata number assigned to each segment
- Examples:
>>> ofs_asc.show_stratification_summary()
- update_dispositions_from_ecm(scenario_id=None, run_dates=None, test_ecm_conn=None)¶
This API connects with Oracle ECM and updates dispositions in ASC_INVESTIGATED_ENTITIES for a scenario and run dates. A connection with ECM must have been created in CS before calling this API.
- Parameters
scenario_id – valid scenario id
run_dates – list of comma separated run dates
test_ecm_conn (str) – ECM connection string for debug purposes
- Returns
successful message
- zscore_outliers(df=None, feature=None, nstdev=3, tail='RIGHT')¶
It removes the outliers which are away from the mean by some number of standard deviations.
- Parameters
df (dataframe) – input dataframe
feature (str) – name of the feature
nstdev (int) – number of standard deviations. Default is 3
tail (str) – tail to use for identifying outliers. Valid inputs are LEFT, RIGHT or BOTH. Default is RIGHT
- Returns
return dataframe without outliers
- Examples:
>>> asc.zscore_outliers(df=input_df, feature='TRANS_BASE_AMT', nstdev=2.5)
- class operator_overloading(operator_value)¶
Bases: object