ofs_aif.ofs_asc package

Submodules

ofs_aif.ofs_asc.asc module

class asc(connect_with_default_workspace=True)

Bases: aif, aif_utility

This class contains the Python methods for the automatic scenario calibration (ASC) use case.

add_model_groups(model_group_name=None)

Create segmentation (model group) for AMLES

Parameters:

model_group_name – Unique name for the model group. Only alphanumeric characters, underscore, hyphen and space are allowed.

Returns:

Success message on successfully creating the model groups in the AIF system.

Examples:
>>> input_pdf = pd.DataFrame({'MODEL_GROUP_NAME'   : ["Sig Cash 1"],
>>>                         'ENTITY_NAME'         : ["ASC"],
>>>                         'ATTRIBUTE_NAME'      : ["ASC"],
>>>                         'ATTRIBUTE_VALUE'     : ["ASC"],
>>>                         'LABEL_FILTER'        : ["ASC"],
>>>                         'FEATURE_TYPE_FILTER' : ["ASC"]
>>>                         })
>>>
>>> asc.add_model_groups(input_pdf)
adjusted_boxplot_outliers(df=None, feature=None, tail='RIGHT', n_samples=20000)

It is similar to the IQR method with some changes in parameters. It works well for skewed populations, customizing the range of valid data for each tail differently. An exponential model is used for fitting the data.

Parameters:
  • df (dataframe) – input dataframe

  • feature (str) – name of the feature

  • tail (str) – tail to use for identifying outliers. Valid inputs are LEFT, RIGHT or BOTH. Default is RIGHT

Returns:

return dataframe without outliers

Examples:
>>> asc.adjusted_boxplot_outliers(df=input_df, feature='TRANS_BASE_AMT')
asc_cleanup4rerun()

This API is used internally for cleaning up the failed or partially loaded run ids in the asc_runid_lookup table; it also cleans the corresponding data from the asc_event_master table.

asc_runid_lookup()

This API is for internal use; it maps the run_id to the current definition id and version.

calculate_sample_size(sample_size_method=<function asc.hyper_geometric>, **kwargs)

It calculates the sample size for each of the strata, using the hypergeometric distribution as the default method.

Parameters:
  • sample_size_method

    Method to get the sample size. The default method is hyper_geometric. Valid options are:

    • sample_size_method=hyper_geometric
      • It takes a sample from hypergeometric distribution

    • sample_size_method=function
      • The user defined method which is to be passed. Ex: sample_size_method = proportionate_sampling

      • The first parameter of the user defined method should always be the stratified summary dataframe.

      • The user-defined method should always return stratified summary with new feature SAMPLE_SIZE.

  • kwargs

    dict of tunable parameters.

    • hyper_params
      • It is only applicable when sample_size_method = asc.hyper_geometric

      • Keys of dict should be a strata number and values should be tunable parameters.

      • Example: hyper_params={1: {'Pt': 0.005, 'Pe': 0}, 2: {'Pt': 0.005, 'Pe': 0}, 3: {'Pt': 0.005, 'Pe': 0}}
        • Pe: Expected interesting event rate.

        • Pt: Tolerable suspicious event rate.

        • Pw: Power of the test. Default is 95%

    • Any number of parameters can be passed for user defined sample_size_method

Returns:

dataframe

Examples:
>>> asc.calculate_sample_size(sample_size_method=asc.hyper_geometric)
>>>
>>> asc.calculate_sample_size(sample_size_method=asc.hyper_geometric, hyper_params={1 : {'Pt' : 0.005, 'Pe' : 0}, 2 : {'Pt' : 0.005, 'Pe' : 0}, 3 : {'Pt' : 0.005, 'Pe' : 0}})
>>>
>>> asc.calculate_sample_size(hyper_params={1 : {'Pt' : 0.005, 'Pe' : 0}, 2 : {'Pt' : 0.005, 'Pe' : 0}, 3 : {'Pt' : 0.005, 'Pe' : 0}})
>>>
>>> #User defined method for computing sample size
>>>
>>> def proportionate_sampling(stratified_summary, sample_proportions=[0.45, 0.30, 0.10]):
>>>     stratified_summary['SAMPLE_SIZE'] = 0
>>>     for idx, row in stratified_summary.iterrows():
>>>
>>>         if row['STRATA'] == 1:
>>>             stratified_summary.loc[idx, 'SAMPLE_SIZE'] = int(row['POPULATION'] * sample_proportions[0])
>>>         elif row['STRATA'] == 2:
>>>             stratified_summary.loc[idx, 'SAMPLE_SIZE'] = int(row['POPULATION'] * sample_proportions[1])
>>>         elif row['STRATA'] == 3:
>>>             stratified_summary.loc[idx, 'SAMPLE_SIZE'] = int(row['POPULATION'] * sample_proportions[2])
>>>
>>>     return stratified_summary
>>>
>>> asc.calculate_sample_size(sample_size_method=proportionate_sampling, sample_proportions=[0.30, 0.20, 0.10])
compute_initial_thresholds(features=None, outlier_method=None, technique='percentile', outliers_proportion=0.05, robust_zscore_cut_off=3, iqr_cut_off=3, nstdev=3, anomaly_proportion=0.1, perc_list=[0.85, 0.9, 0.95], search_range=None, risk_level_priority={1: 'HR', 2: 'MR', 3: 'RR'}, tail='RIGHT')

This API is a wrapper around the outlier methods and threshold computing techniques. It takes multiple parameters from the user and computes the thresholds for the features passed by the user.

Parameters:
  • features (list) – list of the features

  • outlier_method (str) – name of the outlier method. Default is robust_zscore

  • technique (str) – technique to find thresholds. Default is percentile

  • outliers_proportion (float) – proportion of data points to be considered as outliers. Default is 0.05

  • robust_zscore_cut_off (int) – cut-off for robust zscore method. Default is 3

  • iqr_cut_off (int) – cut-off for iqr method. Default is 3

  • nstdev (int) – cut-off for zscore method. Default is 3

  • anomaly_proportion (float) – proportion of data points marked as anomalies (outliers). Default is 0.10

  • perc_list (list) – percentile for risk level thresholds. Default is [0.85,0.90, 0.95]

  • search_range (tuple) – It works like Python's arange function.

  • risk_level_priority (dict) – risk level priorities. Ex: {1: 'HR', 2: 'MR', 3: 'RR'}

  • tail (str) – tail to use for identifying outliers. Valid inputs are LEFT, RIGHT or BOTH. Default is RIGHT

Returns:

return thresholds

Examples:
>>> asc.compute_initial_thresholds(features=['TRANS_BASE_AMT','TRANS_CT'], outlier_method='robust_zscore', technique='percentile',
>>>                       robust_zscore_cut_off=3, perc_list=[0.85,0.90,0.95])
compute_jump_thresholds(df=None, feature=None, risk_level_priority=None, range_limit=None)

This API computes the thresholds for a given feature. The thresholds for Jurisdictions are computed based on the highest peaks within the range defined by the user. If range_limit is None, the thresholds for risk levels will be the highest peaks within 0.85-0.90, 0.90-0.95 and 0.95-0.99 for HR, MR and RR respectively.

Parameters:
  • df (dataframe) – input dataframe

  • feature (str) – name of the feature

  • range_limit (tuple) – It works like Python's arange function. Default is (85,100,0.1).

  • risk_level_priority (dict) – risk level priorities. Ex: {1: 'HR', 2: 'MR', 3: 'RR'}

Returns:

return thresholds

Examples:
>>> asc.compute_jump_thresholds(df=input_df, feature='TRANS_BASE_AMT', range_limit=(85,100,0.1))
compute_jumps(df=None, feature=None, perc_range_values=None)

It is an internal API to calculate the jumps for amount and count features.

Parameters:
  • df (dataframe) – input dataframe

  • feature (str) – name of the feature

  • perc_range_values (list) – values of feature at different percentile.

Returns:

return list of slope values

Examples:
>>> asc.compute_jumps(df=df, feature='TRANS_BASE_AMT', perc_range_values=[342424,5454646,5463737,6545454])
compute_percentile_thresholds(df=None, feature=None, perc_list=None, risk_level_priority=None)

This API computes the thresholds for a given feature. The thresholds for Jurisdictions are computed based on the perc_list passed by the user. If None is passed, the thresholds for risk levels will be set at the 0.85, 0.90 and 0.95 percentiles for HR, MR and RR respectively.

Parameters:
  • df (dataframe) – input dataframe

  • feature (str) – name of the feature

  • perc_list (list) – percentile for risk level thresholds. Default is [0.85,0.90, 0.95]

  • risk_level_priority (dict) – risk level priorities. Ex: {1: 'HR', 2: 'MR', 3: 'RR'}

Returns:

return thresholds

Examples:
>>> asc.compute_percentile_thresholds(df=input_df, feature='TRANS_BASE_AMT', perc_list=[0.85,0.90, 0.95])
configure_new_scenario(scenario_id=None)

This API will create the new scenario folder as a placeholder where the user can place the scenario focus and threshold set id notebooks. It also puts an entry for the new scenario in the table ASC_SCENARIO_MASTER.

Parameters:

scenario_id – unique id of the scenario. Ex: ‘23413145’

Returns:

API response in json format.

Examples:
>>> aif.configure_new_scenario(scenario_id='116000052')
create_definition(save_with_new_version=False, cleanup_results=False, version=None)

This API creates a unique definition using the Model Group for a given notebook. It internally calls the AIF create_definition method.

Parameters:
  • save_with_new_version – Boolean flag with options True/False. It helps create a history of models/outputs for a given definition; any version can be chosen at a later point in time.

  • cleanup_results – Boolean flag with options True/False. When set to True, deletes all the outputs due to previous executions.

  • version – when multiple versions of the definition are created, version is supplied to pick the required version. The default value None means the MAX version of the definition.

Returns:

Return successful message on completion, and proper error message on failure.

Examples:
>>> ofs_asc.create_definition( save_with_new_version = False,
>>>                        cleanup_results = False,
>>>                        version = None )
Definition creation successful...
True
delete_exec_status_from_db(run_date=None, run_flag=True)

This API deletes the entry from the table asc_scenario_execution_status for a given run_date.

This API is for internal use; it is called inside the execute_scenario API with asc.run_flag=True.

Parameters:
  • run_date – run date of a given scenario

  • run_flag – rerun/update flag

Returns:

None

download_csv(user_thresholds=None, threshold_set_id=None, title='Download', filename='scenario_thresholds.csv')

This API allows the user to download the recommended thresholds to CSV file using HTML button.

Parameters:
  • user_thresholds (dict) – key-value pairs of feature and thresholds

  • threshold_set_id (str) – threshold_set_id for which thresholds are tuned

  • title (str) – title of HTML button

  • filename (str) – csv filename which will hold the recommended thresholds

Returns:

display download button

Examples:
>>> tshld={'HR_Min_Trans_Ct': 2,
>>>     'HR_Min_Trans_Amt': 250088.11,
>>>     'MR_Min_Trans_Ct': 2,
>>>     'MR_Min_Trans_Amt': 250880.41,
>>>     'RR_Min_Trans_Ct': 2,
>>>     'RR_Min_Trans_Amt': 255344.37}
>>> asc.download_csv(user_thresholds=tshld, threshold_set_id=11600048)
execute_scenario(parallel=True)

This API executes multiple instances of the scenario notebook in parallel for different fic_mis_dates. The scenario data is stored in the table fcc_am_event_details.

Parameters:

parallel – if True, execute notebooks in parallel across JOB_IDs and sequentially across run dates.

Returns:

Display successful message.
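
Example (a minimal sketch; it assumes the scenario execution parameters and run dates have already been set on the asc instance):
>>> asc.execute_scenario(parallel=True)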

export_import_threshold_notebook(scenario_id=None, objective_dict=None, debug=True)

This API will export threshold notebook from AMLScenarios-Converted folder and import it to the scenario definition folder with other templates.

Parameters:
  • scenario_id – id of the scenario. Ex: 116000031

  • objective_dict – hierarchical folder structure in dict format

  • debug – It should be True for debugging purposes. It also displays the objective_id and zip filename used for the imported scenario.

Returns:

API response flag

Examples:
>>> objective_dict = {
>>>     "1": {"name": "ASC",
>>>           "desc": "Automatic Scenario Calibration"},
>>>     "2": {"name": "Analysis",
>>>           "desc": "Prod and PreProd Analysis"},
>>>     "3": {"name": "Sig Cash Prod",
>>>           "desc": "sig cash scenario for production analysis"}}
>>>
>>> asc.export_import_threshold_notebook(scenario_id=116000031, objective_dict=objective_dict)
>>>
generate_expression(tunable_parameters=None)

It converts the tunable parameters passed by a user into a format accepted by the operator_overloading class.

Parameters:

tunable_parameters (str) – tunable parameters passed by a user. Logical & and | operators should be used to create an expression. Ex: ‘Large_Cash_Transaction_Amt & (Large_Cash_Transaction_Cnt | (Large_Cash_Transaction_Amt & Large_Cash_Transaction_Cnt))’

Returns:

expression

Examples:
>>> ofs_asc.generate_expression(tunable_parameters = 'Large_Cash_Transaction_Amt & (Large_Cash_Transaction_Cnt | (Large_Cash_Transaction_Amt & Large_Cash_Transaction_Cnt))')
'LARGE_CASH_TRANSACTION_AMT & (LARGE_CASH_TRANSACTION_CNT | (LARGE_CASH_TRANSACTION_AMT & LARGE_CASH_TRANSACTION_CNT))'
generate_threshold_id(base_threshold_set_id=None)

It calls the AML procedure which creates a new threshold set id from an existing base threshold set id. The mapping between the cloned threshold set id and the original threshold set id is stored in the asc_runid_lookup table.

Parameters:

base_threshold_set_id – threshold set id which needs to be cloned

Returns:

new threshold set id
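
Example (a minimal sketch; the base threshold set id is illustrative):
>>> new_threshold_set_id = asc.generate_threshold_id(base_threshold_set_id='116000048')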

get_bar_plot(group_by='RUN_DATE', figsize=(14, 8))

It shows the bar plot comparison for prod alerts vs recommended alerts

Parameters:
  • group_by (str) – aggregate alerts either by RUN_DATE or SEGMENT_ID

  • figsize (tuple) – figure size

Returns:

display bar plot

Examples:
>>> asc.get_bar_plot(group_by='RUN_DATE')
get_cross_segment_parameter_analysis(segments_include=None, segments_exclude=None, select_features=None, figsize=(14, 6), title=None)

Function to compare distribution of alerts and effective alerts for specified parameters across specified segments.

Parameters:
  • segments_include (list) – list of segments to include

  • segments_exclude (list) – list of segments to exclude

  • select_features (list) – list of features to be plotted

  • figsize (tuple) – size of plot

  • title (str) – title of chart

Returns:

plot
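
Example (a minimal sketch; segment and feature names are illustrative):
>>> asc.get_cross_segment_parameter_analysis(segments_include=['AMEA_HR','AMEA_MR'], select_features=['TRANS_BASE_AMT','TRANS_CT'], title='Cross segment comparison')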

get_cross_tab(segments_include=None, segments_exclude=None, feature_name=None, bins=None, round_digits=0)

Utility function to get a cross tab of parameter and effective flag

Parameters:
  • segments_include (list) – list of segments to include

  • segments_exclude (list) – list of segments to exclude

  • feature_name (str) – name of the feature being analyzed

  • bins (numpy array or list) – array specifying the bins values should be bucketed into; if None, deciles are used as intervals

  • round_digits (int) – number specifying how intervals should be rounded. To be used when deciles are used to bin the parameter values

Returns:

dataframe
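
Example (a minimal sketch; the feature name and bins are illustrative):
>>> cross_tab_pdf = asc.get_cross_tab(segments_include=['AMEA_HR'], feature_name='TRANS_BASE_AMT', bins=[0, 50000, 100000, 250000], round_digits=0)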

get_data(tag='BTL', tunable_parameters=None, segments_include=None, segments_exclude=None, is_pre_prod_analysis=False, n_samples=1000)

It retrieves the actual data for the tunable parameters passed by the user. Depending on the tag, either BTL or ATL data will be retrieved for analysis.

Parameters:
  • tag (str) – ‘BTL’ tag for BTL analysis and ‘ATL’ tag for ATL analysis. Default is ‘BTL’

  • tunable_parameters (str) – (mandatory) Parameters to be tuned, passed using logical & and | operators

  • segments_include (list) – segments to be included

  • segments_exclude (list) – segments to be excluded

  • is_pre_prod_analysis (bool) – True for preprod analysis

  • n_samples (int) – number of samples for finding out parameter datatypes. It is suggested when the volume is huge, so datatypes can be tested on a few records.

Returns:

dataframe

Examples:
>>> tunable_parameters='Large_Cash_Transaction_Amt & (Large_Cash_Transaction_Cnt | (Large_Cash_Transaction_Amt & Large_Cash_Transaction_Cnt))'
>>> ofs_asc.get_data(tunable_parameters=tunable_parameters, segments_include=['AMEA_HR','AMEA_MR'])
get_definition_mapped_scenario()

Get the scenario name for which the definition was created, from the asc_definition_master table.

Param:

None

Returns:

scenario name

Examples:
>>> asc.get_definition_mapped_scenario()
get_definition_versions(is_scenario_execution=False)

Get available versions for the current definition.

Param:

None

Returns:

list of available versions for current definition.

Example:
>>> data_pdf = asc.get_definition_versions()
get_density_plots(segments_include=None, segments_exclude=None, select_features=None, figsize=(14, 6), title=None)

Function to get density plots for multiple parameters

Parameters:
  • segments_include (list) – list of segments to include

  • segments_exclude (list) – list of segments to exclude

  • select_features (list) – List of feature names(strings) to be plotted

  • figsize (tuple) – size of plot

  • title (str) – title of chart

Returns:

plot
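
Example (a minimal sketch; segment and feature names are illustrative):
>>> asc.get_density_plots(segments_include=['AMEA_HR'], select_features=['TRANS_BASE_AMT','TRANS_CT'], title='Parameter densities')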

get_effectiveness_trend(segments_include=None, segments_exclude=None, feature_name=None, bins=None, figsize=(14, 6), title=None, **kwargs)

Function to get percentage effective alerts trend plot for a parameter

Parameters:
  • segments_include (list) – list of segments to include

  • segments_exclude (list) – list of segments to exclude

  • feature_name (str) – feature to be analyzed

  • bins (numpy array or list) – array specifying the bins values should be bucketed into; if None, deciles are used as intervals

  • figsize (tuple) – size of plot

  • title (str) – title of chart

  • **kwargs

    Keyword arguments:

    • round_digits (int) –

      number of digits to which bin limits are to be rounded

    • ax (int) –

      axis on which the plot is to be placed. Used only when creating multi grid plots

Returns:

Plot
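
Example (a minimal sketch; the feature name and bins are illustrative):
>>> asc.get_effectiveness_trend(segments_include=['AMEA_HR'], feature_name='TRANS_BASE_AMT', bins=[0, 50000, 100000, 250000], round_digits=0)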

Function to get percentage effective alerts trend plots for multiple parameters

Parameters:
  • segments_include (list) – list of segments to include

  • segments_exclude (list) – list of segments to exclude

  • select_features (list) – List of feature names(strings) to be plotted

  • bins (numpy array or list) – array specifying the bins values should be bucketed into; if None, deciles are used as intervals

  • figsize (tuple) – size of plot

  • title (str) – title of chart

  • **kwargs

    Keyword arguments:

    • round_digits (int) –

      number of digits to which bin limits are to be rounded

Returns:

Plot

get_event_trxns(trxn_type=None, trxn_identifier=None, temp_table_name=None)

This API retrieves the raw transactions for the selected transaction type and transaction identifier. The user can pass multiple trxn types and a trxn identifier for which the transactions dataframe will be retrieved. A separate dataframe is retrieved for each transaction type.

Parameters:
  • trxn_type (list or str) – str or list of trxn types. Valid trxn types are ‘MI’,’CASH’,’WIRE’,’BACK_OFFICE’,’FOCAL_ENTITY’

  • trxn_identifier (str) – set of event ids for which transaction data is required. Possible inputs are ‘Threshold Set 1’, ‘Threshold Set 2’, ‘Intersection’, ‘Union’

  • temp_table_name (str) – intermediate table name which will store the sql results. If None, it will be created internally.

Returns:

dict of dataframes with key as trxn_type

Examples:
>>> asc.get_event_trxns(trxn_type='CASH', trxn_identifier='Intersection')
>>>
>>> asc.get_event_trxns(trxn_type=['CASH','MI'], trxn_identifier='Threshold Set 1')
output:
{'CASH_TRXN' : df1, 'MI_TRXN' : df2}
get_event_trxns_all(trxn_type=None, trxn_identifier=None, parallel=False)

This API retrieves the raw transactions for the selected transaction type and transaction identifier. The user can pass multiple trxn types and a trxn identifier for which the transactions dataframe will be retrieved. A separate dataframe is retrieved for each transaction type.

Parameters:
  • trxn_type (list or str) – str or list of trxn types. Valid trxn types are ‘MI’,’CASH’,’WIRE’,’BACK_OFFICE’,’FOCAL_ENTITY’

  • trxn_identifier (str) – set of event ids for which transaction data is required. Possible inputs are ‘Threshold Set 1’, ‘Threshold Set 2’, ‘Intersection’, ‘Union’

  • parallel (bool) – Use of multithreading for multiple values of trxn_identifier. Default is False.

Returns:

dict of dataframes with key as trxn_type

Examples:
>>> asc.get_event_trxns_all(trxn_type='CASH', trxn_identifier='Intersection')
output:
{'CASH_TRXN_INTERSECTION' : df1}
>>> asc.get_event_trxns_all(trxn_type=['CASH','MI'], trxn_identifier=['Threshold Set 1 only','Threshold Set 2 only'])
output:
{'CASH_TRXN_THRESHOLD SET 1 ONLY' : df1, 'MI_TRXN_THRESHOLD SET 1 ONLY' : df2, 'CASH_TRXN_THRESHOLD SET 2 ONLY' : df3, 'MI_TRXN_THRESHOLD SET 2 ONLY' : df4}
get_frequency_table_1D(segments_include=None, segments_exclude=None, feature_name=None, bins=None, plot=False, figsize=(8, 6), **kwargs)

Function to return a frequency table and optionally convert to a heat map

Parameters:
  • segments_include (list) – list of segments to include

  • segments_exclude (list) – list of segments to exclude

  • feature_name (str) – feature to be analyzed

  • bins (numpy array or list) – array specifying the bins values should be bucketed into; if None, deciles are used as intervals

  • plot (bool) – specifying whether to plot a heat map

  • figsize (tuple) – size of plot

  • **kwargs

    Keyword arguments:

    • cmap (str) –

      any accepted python color map

    • round_digits (int) –

      number of digits to which bin limits are to be rounded

    • ax (int) –

      axis on which to place the plot. Used only for multigrid plots

Returns:

dataframe and Plot(optional)
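
Example (a minimal sketch; the feature name and color map are illustrative):
>>> freq_pdf = asc.get_frequency_table_1D(segments_include=['AMEA_HR'], feature_name='TRANS_BASE_AMT', plot=True, cmap='Blues')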

get_frequency_table_2D(segments_include=None, segments_exclude=None, select_features=None, bins=None, plot=False, figsize=(14, 8), **kwargs)

Function to return a frequency table and optionally convert to a heat map

Parameters:
  • segments_include (list) – list of segments to include

  • segments_exclude (list) – list of segments to exclude

  • select_features (list) – List of feature names(strings) to be plotted

  • bins (numpy array or list) – array specifying the bins values should be bucketed into; if None, deciles are used as intervals

  • plot (bool) – specifying whether to plot a heat map

  • figsize (tuple) – size of plot

  • **kwargs

    Keyword arguments:

    • cmap (str) –

      any accepted python color map

    • round_digits (int) –

      number of digits to which bin limits are to be rounded

Returns:

dataframe and Plot(optional)
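
Example (a minimal sketch; the pair of feature names is illustrative):
>>> freq_pdf = asc.get_frequency_table_2D(segments_include=['AMEA_HR'], select_features=['TRANS_BASE_AMT','TRANS_CT'], plot=True, cmap='Blues')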

get_frequency_tables_1D(segments_include=None, segments_exclude=None, select_features=None, bins=None, figsize=(8, 6), plot=True, title=None, **kwargs)

Function to get frequency tables and optionally heatmaps for multiple parameters

Parameters:
  • segments_include (list) – list of segments to include

  • segments_exclude (list) – list of segments to exclude

  • select_features (list) – features to be analyzed

  • bins (list of lists or arrays) – list of arrays specifying the bins values should be bucketed into; if None, deciles are used as intervals

  • figsize (tuple) – size of plot

  • plot (bool) – specifying whether to plot a heat map

  • title (str) – title for plot

  • **kwargs

    Keyword arguments:

    • cmap (str) –

      any accepted python color map

    • round_digits (int) –

      number of digits to which bin limits are to be rounded

Returns:

list of dataframes and Plot(optional)
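
Example (a minimal sketch; feature names are illustrative):
>>> freq_tables = asc.get_frequency_tables_1D(select_features=['TRANS_BASE_AMT','TRANS_CT'], plot=True, title='Parameter frequencies')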

get_investigation_status(oracle_ecm=True, test_ecm_conn=None)

Function to return the investigation status for the samples collected and samples reviewed at ECM. This API connects to ECM to get the disposition status for the samples and updates the event dispositions in the table ASC_INVESTIGATED_ENTITIES.

Parameters:
  • test_ecm_conn (str) – ECM connection string for debug purposes

  • oracle_ecm (bool) – Use Oracle ECM to get the disposition status for samples. Default is True

Returns:

dataframe
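
Example (a minimal sketch using the default Oracle ECM connection):
>>> status_pdf = asc.get_investigation_status(oracle_ecm=True)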

get_overall_summary(segments_include=None, segments_exclude=None)

Function to get segment wise summary of ATL Data

Parameters:
  • segments_include (list) – list of segments to include

  • segments_exclude (list) – list of segments to exclude

Returns:

Dataframe with summary of alerts
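
Example (a minimal sketch; segment names are illustrative):
>>> summary_pdf = asc.get_overall_summary(segments_include=['AMEA_HR','AMEA_MR'])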

get_samples(strata_include=None, strata_exclude=None)

It takes random samples from each strata equal to the calculated sample_size.

Parameters:
  • strata_include (list) – strata to be included

  • strata_exclude (list) – strata to be excluded

Returns:

Returns successful message

Examples:
>>> ofs_asc.get_samples(strata_include = ['AMEA_HR_1','AMEA_MR_1'], strata_exclude = None)
get_scenario_job_ids(scenario_id=None)

This API returns the available JOB_IDs in KDD_JOB table for a given scenario

Parameters:

scenario_id – scenario id. Ex: ‘116000031’

Returns:

list of scenarios job ids

Example:
>>> get_scenario_job_ids(scenario_id='116000031')
get_skewness_kurtosis(features_include=None, features_exclude=None, plot=True, figsize=(12, 10), sample_frac=None)

Function to return the data distribution for features, also showing the skewness and kurtosis.

Parameters:
  • features_include (list) – list of features to include

  • features_exclude (list) – list of features to exclude

  • plot (bool) – specifying whether to plot or not

  • figsize (tuple) – size of plot

  • sample_frac (float) – fraction of sample to plot
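
Example (a minimal sketch; feature names and the sample fraction are illustrative):
>>> asc.get_skewness_kurtosis(features_include=['TRANS_BASE_AMT','TRANS_CT'], plot=True, sample_frac=0.1)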

get_threshold_names(scenario_id='116000031', threshold_set_id=None)

This API loads the threshold names from KDD* table

Parameters:
  • scenario_id – scenario id. Ex: ‘116000031’

  • threshold_set_id – (optional) threshold set id Ex: ‘116000048’

Returns:

return dataframe

Example:
>>> data_pdf = asc.get_threshold_names(scenario_id='116000031', threshold_set_id='116000048')
get_threshold_set_ids(scenario_id=None, threshold_set_id=None)

This API loads the threshold values for tunable parameters from KDD threshold table

Parameters:
  • scenario_id – scenario id. Ex: ‘116000031’

  • threshold_set_id – (optional) threshold set id Ex: ‘116000048’

Returns:

return thresholds for tunable parameters along with threshold set ids.

Example:
>>> data_pdf = asc.get_threshold_set_ids(scenario_id='116000031', threshold_set_id='116000048')
get_trxns_summary(versions_list=None, summary_of=None)

It shows the transaction summary by transaction type for the versions passed by the user.

Parameters:
  • versions_list (list) – list of versions for which trxn summary to be retrieved

  • summary_of (str) – get trxn summary of either Transaction or focal_entity. Default is Transaction

Returns:

dict of dataframes for different trxn types

Examples:
>>> ofs_asc.get_trxns_summary(versions_list=[0,3], summary_of='TRANSACTION')
>>>
>>> ofs_asc.get_trxns_summary(versions_list=[0,3], summary_of='FOCAL_ENTITY')
Output :
Transactions summary is successfully fetched.....
Fetched data for trans_type ['CASH_TRXN']
hyper_geometric(N, Pt=0.005, Pe=0, Pw=0.95)

Sample size calculation from the hypergeometric distribution.

Parameters:
  • N – Strata size

  • Pt – Tolerable suspicious event rate

  • Pe – Expected interesting event rate

  • Pw – Power or reliability

Returns:

sample_size
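
Example (a minimal sketch; the strata size and rates are illustrative):
>>> sample_size = asc.hyper_geometric(N=5000, Pt=0.005, Pe=0, Pw=0.95)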

identify_atl_btl_population()

This API calls the SQL procedure p_asc_insert_atl_data. It inserts all ATL alerts from the KDD tables for the given scenario and run dates.

Param:

None

Returns:

None

import_analysis_templates(scenario=None, definition_name=None, analysis='prod', debug=False)

This API will create the objectives in MMG, taking the scenario folder name as an input, and also imports the analysis drafts below into their respective objectives.

  • For Prod Scenarios
    • BTL Analysis

    • ATL Analysis

    • ASC Scenario Execution

    • Impact Analysis

  • For Pre-Prod Scenario
    • ASC Scenario Execution

    • Pre-Prod Analysis

Parameters:
  • scenario – name of the scenario

  • definition_name – folder name which will be created under Home/ASC/Analysis. Ex: ‘Sig Cash 2’

  • analysis – type of analysis whether prod or preprod. Default is prod

  • debug – It should be True for debugging purposes. It also displays the objective_id and zip filename used for the imported scenario.

Returns:

API response in json format.

Examples:
>>> aif.import_analysis_templates(definition_name='Sig Cash 2')
>>>
>>> aif.import_analysis_templates(definition_name='RMF', analysis='preprod')
insert_exec_status_to_db(execution_status_df=None, threshold_set_id=None, job_id=None)

This API inserts an entry into the table asc_scenario_execution_status for each scenario executed through the execute_scenario API, to track the triggered scenario information. This API is for internal use and is called inside the execute_scenario API.

Parameters:
  • execution_status_df – dataframe containing notebook instance related information such as run dates, scenario_code, objectiveId, modelid, notebookid, instanceid, model_status and api_status

  • threshold_set_id – Threshold set id for which scenario gets executed.

  • job_id – job id for which scenario gets executed.

Returns:

None

investigate_samples(strata_include=None, strata_exclude=None)

It pushes the selected samples to the DB tables ASC_EVENT_SAMPLE and ASC_INVESTIGATED_ENTITIES.

Parameters:
  • strata_include (list) – strata to be included

  • strata_exclude (list) – strata to be excluded

Returns:

Returns successful message

Examples:
>>> ofs_asc.investigate_samples(strata_include = None, strata_exclude = ['AMEA_HR_0','AMEA_MR_0','AMEA_RR_0'])
iqr_outliers(df=None, feature=None, iqr_cut_off=3, tail='RIGHT')

It removes outliers based on the inter-quartile range (IQR).

Parameters:
  • df (dataframe) – input dataframe

  • feature (str) – name of the feature

  • iqr_cut_off (int) – cut-off multiplier for the IQR method. Default is 3

  • tail (str) – tail to use for identifying outliers. Valid inputs are LEFT, RIGHT or BOTH. Default is RIGHT

Returns:

return dataframe without outliers

Examples:
>>> asc.iqr_outliers(df=input_df, feature='TRANS_BASE_AMT', iqr_cut_off=3)
isolation_forest_outliers(df=None, feature=None, anomaly_proportion=0.1)

It is based on the isolation forest technique (an ensemble of random trees) and identifies anomalies in the data. It removes the outliers, marking them -1 and non-outliers 1.

Parameters:
  • df (dataframe) – input dataframe

  • feature (str) – name of the feature

  • anomaly_proportion (float) – proportion of data points marked as anomalies

Returns:

return dataframe without outliers

Examples:
>>> asc.isolation_forest_outliers(df=input_df, feature='TRANS_BASE_AMT')
load_asc_event_master()

It calls the SQL procedure p_asc_load_event_master, which loads data into the ASC_EVENT_MASTER table for the list of fic_mis_dates given by the class variable self.run_dates.

Param:

None

Returns:

status message for each fic_mis_date

load_asc_investigated_entities()

Load ATL alerts into the ASC_INVESTIGATED_ENTITIES table.

load_object(version=None)

Loads the object saved using self.save_object()

Parameters:

version – when multiple versions of the definition are created, version is supplied to pick the required version. The default value None means the MAX version of the definition.

Returns:

valid python object on successful execution.

Example:
>>> data_pdf = ofs_asc.load_object()
percent_outliers(df=None, feature=None, outliers_proportion=0.05, tail='RIGHT')

It removes the outliers above the percentile cut-off for a univariate feature.

Parameters:
  • df (dataframe) – input dataframe

  • feature (str) – name of the feature

  • outliers_proportion (float) – threshold value. Default is 5%

  • tail (str) – tail to use for identifying outliers. Valid inputs are LEFT, RIGHT or BOTH. Default is RIGHT

Returns:

return dataframe without outliers

Examples:
>>> asc.percent_outliers(df=input_df, feature='TRANS_BASE_AMT', outliers_proportion=0.05)
percentile_method(df_stratification=None, perc_list=None)

It splits the input population into strata based on the percentile cut-offs supplied in perc_list.

Parameters:
  • df_stratification – input dataframe

  • perc_list – percentile list on which data will be split

Returns:

stratified dataframe
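
Example (a minimal sketch; in the normal flow this method is passed to perform_stratification rather than called directly):
>>> stratified_pdf = asc.percentile_method(df_stratification=input_df, perc_list=[0.8, 0.6, 0.4, 0.2])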

perform_stratification(stratification_method=<function asc.percentile_method>, **kwargs)

It stratifies the input population (grouped by JURISDICTION and RISK_LEVEL) using stratified sampling and assigns a strata number to each group within the chosen segment. The default method is percentile_method: each tunable parameter is converted to a percentile feature and, based on the cut-off percentiles passed in perc_list, each segment is split into as many strata as there are values in perc_list.

Parameters:
  • stratification_method

    It takes a function as an input. The default method is percentile_method, which converts each tunable parameter into a percentile and then passes it to the expression. The expression evaluates the final percentile, which is then used to split the population into different strata.

    • A user defined method for stratification is also acceptable. Ex: stratification_method=stratification_on_amount

    • In the user defined method, the first parameter should always be the input dataframe.

    • The user defined method should always return the input dataframe with STRATA as one more column.

  • kwargs

    Keyword arguments:

    • perc_list
      • It is only applicable when stratification_method=percentile_method

      • List of cut-off percentiles based on which strata will be created.

      • Strata numbers will be assigned from left to right of the list.

      • The order of strata will be strata 1, strata 2, strata 3 and so on.

      • Strata 1 will be closest to the ATL line and the last strata will be at the bottom of the BTL data.

Returns:

successful message

Examples:
>>> #inbuilt method
>>> asc.perform_stratification(stratification_method=asc.percentile_method, perc_list=[0.8,0.6,0.4,0.2])
>>>
>>> #User-defined method
>>> def stratification_on_amount(evented_data_pdf, split_on_amount=[200000, 100000, 50000]):
>>>     evented_data_pdf['STRATA'] = len(split_on_amount)+1
>>>     df_filter=evented_data_pdf.copy()
>>>     strata_no = 1
>>>     for i in split_on_amount:
>>>         df = df_filter[df_filter['STRATA'] == len(split_on_amount) + 1]
>>>         ls_event_ids = df[df['TRANS_BASE_AMT'] >= i]['EVENT_ID'].to_list()
>>>         df_filter.loc[df_filter['EVENT_ID'].isin(ls_event_ids), 'STRATA'] = strata_no
>>>         evented_data_pdf.loc[evented_data_pdf['EVENT_ID'].isin(ls_event_ids), 'STRATA'] = strata_no
>>>         strata_no = strata_no + 1
>>>     return evented_data_pdf
>>>
>>> #Calling the user defined method
>>> asc.perform_stratification(stratification_method=stratification_on_amount, split_on_amount=[200000, 100000, 50000])
robust_zscore_outliers(df=None, feature=None, nstdev=3, tail='RIGHT')

It is similar to zscore with some changes in parameters, and works well for skewed populations. Since the mean and standard deviation are heavily influenced by outliers, it uses the median and the absolute deviation from the median instead. It is also called the Median Absolute Deviation (MAD) method.

Parameters:
  • df (dataframe) – input dataframe

  • feature (str) – name of the feature

  • nstdev (int) – number of standard deviations. Default is 3

  • tail (str) – tail to use for identifying outliers. Valid inputs are LEFT, RIGHT or BOTH. Default is RIGHT

Returns:

return dataframe without outliers

Examples:
>>> asc.robust_zscore_outliers(df=input_df, feature='TRANS_BASE_AMT', nstdev=2.5)
save_object(description=None)

Save Python objects permanently; load them whenever needed.

Parameters:

description – description for the object to be saved

Returns:

Paragraph execution will show success, else failure.

Example:
>>> data_pdf = pd.DataFrame({'COL1':[1,2,3], 'COL2':[4,5,6]})
>>> ofs_asc.save_object(value=data_pdf)
scenario_post_processing()
It calls multiple methods in sequence:
  • asc_runid_lookup() : insert run_id and map it to defintion id and version in ASC_RUNID_LOOKUP table

  • load_asc_investigated_entities : load ATL alerts for given run_id in ASC_INVESTIGATED_ENTITIES

  • load_asc_event_master() : load the events into ASC_EVENT_MASTER table for given run_id

  • identify_atl_btl_population() : Segregate ATL alerts from BTL in ASC_EVENT_MASTER table

Param:

None

Returns:

None

show_btl_thresholds(df=None, segment=None, strata=None)

It displays the stored thresholds from BTL analysis.

Parameters:
  • df (dataframe) – input summary dataframe from the BTL analysis

  • segment (str) – select segment to display

  • strata (str) – select strata number to display

Returns:

Returns BTL thresholds

Examples:
>>> asc.show_btl_thresholds(df = df_summary, segment='AMEA_HR', strata='1')
show_event_volume(group_by='ALL', tag=None)

This API displays the alerts volume for different group_by inputs. The table ASC_EVENT_MASTER is used as the input for aggregating the data.

Parameters:
  • group_by

    input value on which alerts volume will be aggregated. It takes the following inputs:

    • RUN_DATE : Aggregate alerts on fic_mis_date

    • SEGMENT_ID : Aggregate alerts on Jurisdiction and Risk_Level

    • EVENT_TAG : Aggregate alerts by event tag i.e. ATL or BTL

    • JURISDICTION : Aggregate alerts by jurisdiction

    • JURISDICTION_AND_RUN_DATE : Aggregate alerts by jurisdiction and run date

    • ALL : Aggregate alerts by all inputs together, i.e. by SEGMENT_ID, RUN_DATE, EVENT_TAG

  • tag – Filter ‘ATL’ or ‘BTL’. If None, it will show both ATL and BTL alerts.

Returns:

dataframe showing alerts volume by different inputs.
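
Example (a minimal sketch):
>>> asc.show_event_volume(group_by='SEGMENT_ID', tag='BTL')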

show_event_volume_comparison(versions_list=None, group_by='RUN_DATE', tag=None)

This API is a wrapper around show_event_volume, taking an additional parameter versions_list which must be supplied by the user for comparing alert volumes.

Parameters:
  • versions_list – list of versions

  • group_by

    input value on which alerts volume will be aggregated. It takes the following inputs:

    • RUN_DATE : Aggregate alerts on fic_mis_date. Default

    • SEGMENT_ID : Aggregate alerts on Jurisdiction and Risk_Level

    • EVENT_TAG : Aggregate alerts by event tag i.e. ATL or BTL

    • JURISDICTION : Aggregate alerts by jurisdiction

    • JURISDICTION_AND_RUN_DATE : Aggregate alerts by jurisdiction and run date

    • ALL : Aggregate alerts by all inputs together, i.e. by SEGMENT_ID, RUN_DATE, EVENT_TAG

  • tag – Filter ‘ATL’ or ‘BTL’. If None, it will show both ATL and BTL alerts.

Returns:

list of dataframes for supplied versions

Examples:
>>>
>>> asc.show_event_volume_comparison(versions_list=[0,3], group_by='RUN_DATE')
>>>
show_execution_status(debug=False, show_status='current')

It shows the execution status of the scenarios. The status can be one of these:

  • RUNNING

  • FAILED

  • COMPLETED

Parameters:
  • debug – True for debugging the error for particular job_id

  • show_status – Show the scenario status for current executions only, or for current and all previous executions for the given definition id and version.

Returns:

dataframe
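
Example (a minimal sketch):
>>> asc.show_execution_status(debug=False, show_status='current')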

show_initial_thresholds(features=None, segments=None)

It shows the computed thresholds in a dataframe and also provides an option to filter the thresholds based on selected segments and features.

Parameters:
  • features (list) – name of the features

  • segments (list) – name of the segments

Returns:

return dataframe

Examples:
>>> asc.show_initial_thresholds(features=['TRANS_BASE_AMT','TRANS_CT'], segments=['AMEA'])
show_investigation_summary()

This API returns the summary of percentage effective alerts for each segment and strata.

Param:

None

Returns:

dataframe
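
Example (a minimal sketch):
>>> summary_pdf = asc.show_investigation_summary()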

show_prod_alerts_volume(run_dates=None, group_by='RUN_DATE')

This API shows the statistics for production alerts

Parameters:
  • run_dates (list) – run dates for which alerts to be fetched

  • group_by (str) – aggregate alerts either by RUN_DATE or SEGMENT_ID

Returns:

Returns alerts statistics

Examples:
>>> asc.show_prod_alerts_volume(run_dates = ['20221007','20221014','20221022'], group_by='RUN_DATE')
show_sample_size(strata_include=None, strata_exclude=None)

It shows the sample size calculated for each of the segments.

Parameters:
  • strata_include (list) – strata to be included

  • strata_exclude (list) – strata to be excluded

Returns:

dataframe showing sample size for each strata

Examples:
>>> ofs_asc.show_sample_size(strata_include = ['AMEA_HR_1','AMEA_MR_1'], strata_exclude = None)
show_samples(strata_include=None, strata_exclude=None)

It shows the selected samples for each strata.

Parameters:
  • strata_include (list) – strata to be included

  • strata_exclude (list) – strata to be excluded

Returns:

Returns dataframe showing selected samples

Examples:
>>> ofs_asc.show_samples(strata_include = ['AMEA_HR_1','AMEA_MR_1'], strata_exclude = None)
show_scenario_bindings()

Returns the binding names for the currently selected scenario, which the user can choose from when setting up the tunable parameters.

Param:

None

Returns:

list of scenario binding names
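
Example (a minimal sketch):
>>> bindings = asc.show_scenario_bindings()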

show_scenario_execution_parameters()

This API finds all the possible run dates for the date ranges provided by the user and displays all parameters set by the user.

Param:

None

Returns:

Display parameters set by a user.
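
Example (a minimal sketch):
>>> asc.show_scenario_execution_parameters()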

show_scenarios()

This API shows the available scenarios along with their unique scenario ids.

Param:

None

Returns:

return dataframe

Example:
>>> asc.show_scenarios()
show_stratification_summary()

It displays the strata number assigned to each group along with the event population within each strata.

Param:

None

Returns:

dataframe showing strata number assigned to each segment

Examples:
>>> ofs_asc.show_stratification_summary()
show_venn_diagram(trxn_type=None, figsize=(16, 10), title=None)

It displays the Venn diagram of the transaction summary for different transaction types and threshold set ids. The user can view the common transactions between the different threshold set ids.

Parameters:
  • trxn_type (list or str) – str or list of trxn types. Valid trxn types are ‘MI’,’CASH’,’WIRE’,’BACK_OFFICE’

  • figsize (tuple) – size of the figure. Default is (16,10)

  • title (str) – title of the figure. If None, the title is auto-generated based on whether the aggregate type was TRANSACTION or FOCUS

Returns:

venn diagram for trxn types

Examples:
>>> asc.show_venn_diagram(trxn_type='CASH', figsize=(16,10))
>>>
>>> asc.show_venn_diagram(trxn_type=['CASH','MI'], title='Transactions overview')
show_version_summary()

Show available versions and threshold set ids for the given definition name.

Param:

None

Returns:

version details along with threshold set id

Example:
>>> data_pdf = asc.show_version_summary()
update_dispositions_from_ecm(scenario_id=None, run_dates=None, oracle_ecm=True, test_ecm_conn=None)

This API connects to Oracle ECM and updates dispositions in ASC_INVESTIGATED_ENTITIES for a scenario and run dates. An ECM connection named ASC_ECM must have been created in CS before calling this API.

Parameters:
  • scenario_id – valid scenario id

  • run_dates (str) – list of comma separated run dates

  • test_ecm_conn (str) – ECM connection string for debug purposes

  • oracle_ecm (bool) – True for Oracle ECM. Dispositions are to be updated either manually or by some other way for third party ECM if flag is False.

Returns:

successful message
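
Example (a minimal sketch; the scenario id and run dates are illustrative):
>>> asc.update_dispositions_from_ecm(scenario_id='116000031', run_dates='20221007,20221014', oracle_ecm=True)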

update_kdd_cal(script_type='start', ficmisdate=None, run_batch_name='DLY')

zscore_outliers(df=None, feature=None, nstdev=3, tail='RIGHT')

It removes the outliers which are away from the mean by some number of standard deviations.

Parameters:
  • df (dataframe) – input dataframe

  • feature (str) – name of the feature

  • nstdev (int) – number of standard deviations. Default is 3

  • tail (str) – tail to use for identifying outliers. Valid inputs are LEFT, RIGHT or BOTH. Default is RIGHT

Returns:

return dataframe without outliers

Examples:
>>> asc.zscore_outliers(df=input_df, feature='TRANS_BASE_AMT', nstdev=2.5)
class operator_overloading(operator_value)

Bases: object

Module contents