ofs_aif.ofs_asc package

Submodules

ofs_aif.ofs_asc.asc module

class asc(connect_with_default_workspace=True)

Bases: ofs_aif.aif.aif, ofs_aif.aif_utility.aif_utility

This class contains the Python methods for the automatic scenario calibration (ASC) use case.

add_model_groups(model_group_name=None)

Creates the segmentation (model group) for AMLES.

Parameters: model_group_name – unique name for the model group. Only alphanumeric characters, underscores, hyphens and spaces are allowed.
Returns: a success message once the model groups are created in the AIF system.

Examples:

>>> import pandas as pd
>>> input_pdf = pd.DataFrame({'MODEL_GROUP_NAME'   : ["Sig Cash 1"],
>>>                         'ENTITY_NAME'         : ["ASC"],
>>>                         'ATTRIBUTE_NAME'      : ["ASC"],
>>>                         'ATTRIBUTE_VALUE'     : ["ASC"],
>>>                         'LABEL_FILTER'        : ["ASC"],
>>>                         'FEATURE_TYPE_FILTER' : ["ASC"]
>>>                         })
>>>
>>> ofs_asc.add_model_groups(input_pdf)

asc_cleanup4rerun()

asc_load_atl_bindings()

Loads ATL alerts into asc_investigation_entities, and the related binding data for those events into asc_atl_event_bindings.

asc_runid_lookup()

calculate_sample_size(sample_size_method=<function asc.hyper_geometric>, **kwargs)

Calculates the sample size for each stratum, using the hypergeometric distribution by default.

Parameters

sample_size_method –
Method to get the sample size. The default method is hyper_geometric. Valid options are:
- sample_size_method=hyper_geometric
  - Computes the sample size from the hypergeometric distribution
- sample_size_method=function
  - A user-defined method can be passed instead. Ex: sample_size_method=proportionate_sampling
  - The first parameter of the user-defined method must always be the strata population.
  - The user-defined method must always return the sample size as its output.
kwargs –
dict of tunable parameters.
- hyper_params
  - It is only applicable when sample_size_method = asc.hyper_geometric
  - Keys of dict should be a strata number and values should be tunable parameters.
  - Example: hyper_params={1: {'Pt': 0.005, 'Pe': 0}, 2: {'Pt': 0.005, 'Pe': 0}, 3: {'Pt': 0.005, 'Pe': 0}}

    Pe: Expected interesting event rate.

    Pt: Tolerable suspicious event rate.

    Pw: Power of the test. Default is 95%.
- Any number of parameters can be passed to a user-defined sample_size_method

Returns

dataframe

Examples:

>>> asc.calculate_sample_size(sample_size_method=asc.hyper_geometric)
>>>
>>> asc.calculate_sample_size(sample_size_method=asc.hyper_geometric, hyper_params={1 : {'Pt' : 0.005, 'Pe' : 0}, 2 : {'Pt' : 0.005, 'Pe' : 0}, 3 : {'Pt' : 0.005, 'Pe' : 0}})
>>>
>>> asc.calculate_sample_size(hyper_params={1 : {'Pt' : 0.005, 'Pe' : 0}, 2 : {'Pt' : 0.005, 'Pe' : 0}, 3 : {'Pt' : 0.005, 'Pe' : 0}})
>>>
>>> # User-defined method for computing sample size
>>> # (first argument: strata population; must return the sample size)
>>>
>>> def proportionate_sampling(strata_population, sample_proportion=0.10):
>>>     sample_size = int(round(strata_population * sample_proportion))
>>>     return sample_size
>>>
>>> asc.calculate_sample_size(sample_size_method=proportionate_sampling, sample_proportion=0.20)
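The calling contract above (the method receives the strata population as its first argument and returns a sample size, with optional per-strata parameters keyed by strata number) can be illustrated with a minimal sketch. The dispatcher calculate_sample_sizes, the strata_populations dict and the rounding up with math.ceil are hypothetical and not taken from the package.

```python
import math
from typing import Callable, Dict, Optional


def calculate_sample_sizes(
    strata_populations: Dict[int, int],
    sample_size_method: Callable[..., float],
    hyper_params: Optional[Dict[int, dict]] = None,
    **kwargs,
) -> Dict[int, int]:
    """Hypothetical dispatcher: call sample_size_method once per stratum."""
    sizes = {}
    for strata, population in strata_populations.items():
        # Shared keyword arguments first, then per-strata overrides keyed by strata number
        params = dict(kwargs)
        if hyper_params is not None:
            params.update(hyper_params.get(strata, {}))
        # Contract: first argument is the strata population, result is the sample size
        sizes[strata] = math.ceil(sample_size_method(population, **params))
    return sizes


def tenth_of_population(strata_population, divisor=10):
    # Example user-defined method (hypothetical): integer share of the population
    return strata_population // divisor


print(calculate_sample_sizes({1: 1000, 2: 250}, tenth_of_population,
                             hyper_params={2: {"divisor": 5}}))
# {1: 100, 2: 50}
```

Merging per-strata overrides over shared keyword arguments is a design assumption; the package documents hyper_params only for the hyper_geometric method.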

create_definition(save_with_new_version=False, cleanup_results=False, version=None)

Creates a unique definition using the model group for a given notebook. Internally, it calls the AIF create_definition method.

Parameters

save_with_new_version – Boolean flag (True/False). When True, a new version is saved, building a history of models/outputs for the definition; any version can be selected later.
cleanup_results – Boolean flag (True/False). When True, deletes all outputs from previous executions.
version – when multiple versions of the definition exist, selects the required version. Default is None, which selects the latest (maximum) version.

Returns

A success message on completion, or an error message on failure.

Examples:

>>> ofs_asc.create_definition( save_with_new_version = False,
>>>                        cleanup_results = False,
>>>                        version = None )
Definition creation successful...
True

download_csv(user_thresholds=None, threshold_set_id=None, title='Download', filename='scenario_thresholds.csv')

execute_scenario(parallel=True)

Executes multiple instances of the scenario notebook, one for each fic_mis_date. The scenario data is stored in the table fcc_am_event_details.

Parameters: parallel – if True, executes the notebooks in parallel; otherwise, sequentially.
Returns: Displays a success message.
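Running one notebook per fic_mis_date, either concurrently or one after another, can be sketched with the standard library's concurrent.futures. The run_notebook callable stands in for whatever actually executes the scenario notebook; it, run_for_dates and the date strings are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_for_dates(run_dates: List[str], run_notebook: Callable[[str], str],
                  parallel: bool = True) -> Dict[str, str]:
    """Run run_notebook once per fic_mis_date and collect the results by date."""
    if parallel:
        # One worker per date (at least one); pool.map preserves input order
        with ThreadPoolExecutor(max_workers=max(1, len(run_dates))) as pool:
            results = list(pool.map(run_notebook, run_dates))
    else:
        results = [run_notebook(d) for d in run_dates]
    return dict(zip(run_dates, results))


print(run_for_dates(["20230131", "20230228"], lambda d: "COMPLETED " + d))
```

Threads are assumed here because each run is expected to spend most of its time waiting on a remote notebook service rather than computing locally.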

generate_expression(tunable_parameters=None)

Converts the tunable parameters passed by the user into the format accepted by the operator_overloading class.

Parameters: tunable_parameters (str) – tunable parameters passed by a user. Logical & and | operators should be used to create an expression. Ex: ‘Large_Cash_Transaction_Amt & (Large_Cash_Transaction_Cnt | (Large_Cash_Transaction_Amt & Large_Cash_Transaction_Cnt))’
Returns: expression

Examples:

>>> ofs_asc.generate_expression(tunable_parameters = 'Large_Cash_Transaction_Amt & (Large_Cash_Transaction_Cnt | (Large_Cash_Transaction_Amt & Large_Cash_Transaction_Cnt))')
'LARGE_CASH_TRANSACTION_AMT & (LARGE_CASH_TRANSACTION_CNT | (LARGE_CASH_TRANSACTION_AMT & LARGE_CASH_TRANSACTION_CNT))'
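Judging only from the example above, part of the conversion is upper-casing each parameter name while leaving the & and | operators, parentheses and spaces intact. A minimal sketch of that step follows; normalize_expression is a hypothetical name, and the package may do more than this.

```python
import re


def normalize_expression(tunable_parameters: str) -> str:
    """Upper-case parameter names, leaving &, |, parentheses and spaces intact."""
    if not tunable_parameters:
        raise ValueError("tunable_parameters must be a non-empty string")
    # A parameter name is a letter or underscore followed by word characters
    return re.sub(r"[A-Za-z_]\w*", lambda m: m.group(0).upper(), tunable_parameters)


expr = ("Large_Cash_Transaction_Amt & (Large_Cash_Transaction_Cnt | "
        "(Large_Cash_Transaction_Amt & Large_Cash_Transaction_Cnt))")
print(normalize_expression(expr))
```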

generate_threshold_id(base_threshold_set_id=None)

get_bar_plot(group_by='RUN_DATE', figsize=(14, 8))

get_cross_segment_parameter_analysis(segments_include=None, segments_exclude=None, select_features=None, figsize=(14, 6), title=None)

Compares the distribution of alerts and effective alerts for the specified parameters across the specified segments.

Parameters

segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude
select_features (list) – list of features to be plotted
figsize (tuple) – size of plot
title (str) – title of chart

Returns

plot

get_cross_tab(segments_include=None, segments_exclude=None, feature_name=None, bins=None, round_digits=0)

Utility function that returns a cross tab of a parameter against the effective flag.

Parameters

segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude
feature_name (str) – name of the feature being analyzed
bins (numpy array or list) – bin edges into which the values are bucketed; if None, deciles are used as intervals
round_digits (int) – number of digits to which the interval limits are rounded; used when deciles are used to bin the parameter values

Returns

dataframe
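One way to build such a cross tab with pandas is sketched below: decile edges from numpy.quantile when no bins are given, pandas.cut for bucketing, and pandas.crosstab for the counts. The function cross_tab, the column name EFFECTIVE_FLAG and the edge-widening step are assumptions, not taken from the package.

```python
import numpy as np
import pandas as pd


def cross_tab(df, feature_name, flag_col="EFFECTIVE_FLAG", bins=None, round_digits=0):
    """Cross-tabulate binned values of feature_name against an effective flag."""
    values = df[feature_name].to_numpy(dtype=float)
    if bins is None:
        # Decile edges, rounded to round_digits and de-duplicated
        # (assumes the feature is not constant, so at least two edges remain)
        edges = np.unique(np.round(np.quantile(values, np.linspace(0, 1, 11)), round_digits))
        # Widen the outer edges so rounding never drops the minimum or maximum
        edges[0] = min(edges[0], values.min())
        edges[-1] = max(edges[-1], values.max())
        bins = edges
    binned = pd.cut(df[feature_name], bins=bins, include_lowest=True)
    return pd.crosstab(binned, df[flag_col])


demo = pd.DataFrame({"TRANS_AMT": np.arange(100), "EFFECTIVE_FLAG": np.arange(100) % 2})
print(cross_tab(demo, "TRANS_AMT"))
```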

get_data(tag='BTL', tunable_parameters=None, segments_include=None, segments_exclude=None)

Retrieves the actual data for the tunable parameters passed by the user. Depending on tag, either BTL or ATL data is retrieved for analysis.

Parameters

tag (str) – ‘BTL’ tag for BTL analysis and ‘ATL’ tag for ATL analysis. Default is ‘BTL’
tunable_parameters (str) – (mandatory) parameters to be tuned, combined using the logical & and | operators
segments_include (list) – segments to be included
segments_exclude (list) – segments to be excluded

Returns

dataframe

Examples:

>>> tunable_parameters='Large_Cash_Transaction_Amt & (Large_Cash_Transaction_Cnt | (Large_Cash_Transaction_Amt & Large_Cash_Transaction_Cnt))'
>>> ofs_asc.get_data(tunable_parameters=tunable_parameters, segments_include=['AMEA_HR','AMEA_MR'])

get_definition_versions()

Returns the available versions of the current definition.

Param: None
Returns: list of available versions for current definition.

Example:

>>> data_pdf = asc.get_definition_versions()

get_density_plots(segments_include=None, segments_exclude=None, select_features=None, figsize=(14, 6), title=None)

Creates density plots for multiple parameters.

Parameters

segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude
select_features (list) – list of feature names (strings) to be plotted
figsize (tuple) – size of plot
title (str) – title of chart

Returns

plot

get_effectiveness_trend(segments_include=None, segments_exclude=None, feature_name=None, bins=None, figsize=(14, 6), title=None, **kwargs)

Creates a trend plot of the percentage of effective alerts for a single parameter (feature_name).

Parameters

segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude
feature_name (str) – feature to be analyzed
bins (numpy array or list) – bin edges into which the values are bucketed; if None, deciles are used as intervals
figsize (tuple) – size of plot
title (str) – title of chart
**kwargs –
Keyword arguments:
- round_digits (int) –
  number of digits to which bin limits are to be rounded
- ax (int) –
  axis on which the plot is to be placed. Used only when creating multi grid plots

Returns

Plot

get_effectiveness_trends(segments_include=None, segments_exclude=None, select_features=None, bins=None, figsize=(8, 6), title=None, **kwargs)

Creates trend plots of the percentage of effective alerts for multiple parameters.

Parameters

segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude
select_features (list) – list of feature names (strings) to be plotted
bins (numpy array or list) – bin edges into which the values are bucketed; if None, deciles are used as intervals
figsize (tuple) – size of plot
title (str) – title of chart
**kwargs –
Keyword arguments:
- round_digits (int) –
  number of digits to which bin limits are to be rounded

Returns

Plot

get_frequency_table_1D(segments_include=None, segments_exclude=None, feature_name=None, bins=None, plot=False, figsize=(8, 6), **kwargs)

Returns a frequency table for one parameter and optionally renders it as a heat map.

Parameters

segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude
feature_name (str) – feature to be analyzed
bins (numpy array or list) – bin edges into which the values are bucketed; if None, deciles are used as intervals
plot (bool) – whether to plot a heat map
figsize (tuple) – size of plot
**kwargs –
Keyword arguments:
- cmap (str) –
  any accepted Python colormap
- round_digits (int) –
  number of digits to which bin limits are to be rounded
- ax (int) –
  axis on which to place the plot. Used only for multigrid plots

Returns

dataframe and plot (optional)

get_frequency_table_2D(segments_include=None, segments_exclude=None, select_features=None, bins=None, plot=False, figsize=(14, 8), **kwargs)

Returns a frequency table for two parameters and optionally renders it as a heat map.

Parameters

segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude
select_features (list) – list of feature names (strings) to be plotted
bins (numpy array or list) – bin edges into which the values are bucketed; if None, deciles are used as intervals
plot (bool) – whether to plot a heat map
figsize (tuple) – size of plot
**kwargs –
Keyword arguments:
- cmap (str) –
  any accepted Python colormap
- round_digits (int) –
  number of digits to which bin limits are to be rounded

Returns

dataframe and plot (optional)

get_frequency_tables_1D(segments_include=None, segments_exclude=None, select_features=None, bins=None, figsize=(8, 6), plot=True, title=None, **kwargs)

Returns frequency tables and, optionally, heat maps for multiple parameters.

Parameters

segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude
select_features (list) – features to be analyzed
bins (list of lists or arrays) – list of bin-edge arrays into which the values are bucketed; if None, deciles are used as intervals
figsize (tuple) – size of plot
plot (bool) – whether to plot a heat map
title (str) – title for plot
**kwargs –
Keyword arguments:
- cmap (str) –
  any accepted Python colormap
- round_digits (int) –
  number of digits to which bin limits are to be rounded

Returns

list of dataframes and plot (optional)

get_investigation_status(oracle_ecm=True, test_ecm_conn=None)

Returns the investigation status of the samples collected and the samples reviewed in ECM. This API connects to ECM to get the disposition status of the samples and updates the event dispositions in the table ASC_INVESTIGATED_ENTITIES.

Parameters

oracle_ecm (bool) – use Oracle ECM to get the disposition status of the samples. Default is True
test_ecm_conn (str) – ECM connection string for debug purposes

Returns

dataframe

get_overall_summary(segments_include=None, segments_exclude=None)

Returns a segment-wise summary of the ATL data.

Parameters

segments_include (list) – list of segments to include
segments_exclude (list) – list of segments to exclude

Returns

Dataframe with summary of alerts

get_samples(strata_include=None, strata_exclude=None)

Draws random samples from each stratum; the number of samples equals the calculated sample size.

Parameters

strata_include (list) – strata to be included
strata_exclude (list) – strata to be excluded

Returns

successful message

Examples:

>>> ofs_asc.get_samples(strata_include = ['AMEA_HR_1','AMEA_MR_1'], strata_exclude = None)
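As a minimal sketch of the sampling step, assume the population is a pandas DataFrame with a STRATA column holding names such as 'AMEA_HR_1', and that the calculated sample sizes are available as a dict keyed by strata name (both hypothetical). pandas.DataFrame.sample can then draw the rows per stratum; draw_samples is a hypothetical name.

```python
import pandas as pd


def draw_samples(population, sample_sizes, strata_include=None, strata_exclude=None,
                 strata_col="STRATA", seed=None):
    """Randomly draw sample_sizes[strata] rows from each selected stratum."""
    mask = pd.Series(True, index=population.index)
    if strata_include is not None:
        mask &= population[strata_col].isin(strata_include)
    if strata_exclude is not None:
        mask &= ~population[strata_col].isin(strata_exclude)
    parts = []
    for strata, group in population[mask].groupby(strata_col):
        # Never ask for more rows than the stratum holds
        n = min(int(sample_sizes.get(strata, 0)), len(group))
        parts.append(group.sample(n=n, random_state=seed))
    # Return an empty frame with the same columns if nothing was selected
    return pd.concat(parts) if parts else population.iloc[0:0]


pop = pd.DataFrame({"STRATA": ["AMEA_HR_1"] * 10 + ["AMEA_MR_1"] * 5, "EVENT_ID": range(15)})
print(draw_samples(pop, {"AMEA_HR_1": 3, "AMEA_MR_1": 10}, seed=0))
```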

get_threshold_set_ids(scenario='Sig Cash', focus='Customer', threshold_set_id=None)

Returns the threshold set IDs available for the given scenario and focus.

Parameters

scenario (str) – scenario name
focus (str) – focal entity
threshold_set_id – threshold set ID

Returns

available threshold set IDs

Example:

>>> data_pdf = asc.get_threshold_set_ids()

hyper_geometric(N, Pt=0.005, Pe=0, Pw=0.95)

Calculates the sample size from the hypergeometric distribution.

Parameters

N – Strata size
Pt – Tolerable suspicious event rate
Pe – Expected interesting event rate
Pw – Power or reliability

Returns

sample_size
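The docstring does not spell out the formula. A common formulation from attribute (discovery) sampling, shown here as an assumption rather than the package's code: suppose the stratum holds ceil(Pt * N) suspicious events, and choose the smallest n for which a random sample of n would contain at most floor(Pe * n) of them with probability no greater than 1 - Pw. With Pe = 0 and Pw = 0.95, this is the smallest sample that misses every suspicious event with probability at most 5%. The names prob_at_most and hypergeometric_sample_size are hypothetical.

```python
import math


def prob_at_most(N, K, n, k):
    """P(X <= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    total = math.comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one
    return sum(math.comb(K, x) * math.comb(N - K, n - x)
               for x in range(0, min(k, K, n) + 1)) / total


def hypergeometric_sample_size(N, Pt=0.005, Pe=0.0, Pw=0.95):
    """Smallest n that detects more than floor(Pe*n) suspicious events with power Pw."""
    K = math.ceil(Pt * N)  # suspicious events in the stratum at the tolerable rate
    if K == 0:
        return 0
    for n in range(1, N + 1):
        allowed = math.floor(Pe * n)  # suspicious events tolerated in the sample
        if prob_at_most(N, K, n, allowed) <= 1 - Pw:
            return n
    return N  # fall back to a full review of the stratum


print(hypergeometric_sample_size(1000))
```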

identify_atl_btl_population(oracle_ecm=True, test_ecm_conn=None)

Calls the SQL procedures p_asc_clean_atl and p_asc_insert_atl_data in sequence. The first procedure removes ATL alerts from the ASC_EVENT_MASTER table for the scenario and run dates; the second inserts all possible ATL alerts from the KDD tables for the same scenario and run dates.

Parameters

oracle_ecm (bool) – use Oracle ECM to get the disposition status of the samples. Default is True
test_ecm_conn (str) – ECM connection string for debug purposes

Returns

None

import_analysis_templates(definition_name=None)

Creates the objectives in MMG, taking the scenario folder name as input, and imports the following analysis drafts into the respective objectives:

BTL Analysis

ATL Analysis

ASC Scenario Execution

Impact Analysis

Parameters: definition_name – folder name which will be created under Home/ASC/Analysis. Ex: ‘Sig Cash 2’
Returns: API response in json format.

Examples:

>>> aif.import_analysis_templates(definition_name='Sig Cash 2')

investigate_samples(strata_include=None, strata_exclude=None)

Pushes the selected samples to the database table ASC_EVENT_SAMPLE.

Parameters

strata_include (list) – strata to be included
strata_exclude (list) – strata to be excluded

Returns

successful message

Examples:

>>> ofs_asc.investigate_samples(strata_include = None, strata_exclude = ['AMEA_HR_0','AMEA_MR_0','AMEA_RR_0'])

load_asc_event_master()

Calls the SQL procedure p_asc_load_event_master, which loads data into the ASC_EVENT_MASTER table for each fic_mis_date in the list held by the class variable self.run_dates.

Param: None
Returns: status message for each fic_mis_date

load_object(version=None)

Loads the object saved using self.save_object()

Parameters: version – when multiple versions of the definition exist, selects the required version. Default is None, which selects the latest (maximum) version.
Returns: the saved Python object on successful execution.

Example:

>>> data_pdf = ofs_asc.load_object()

percentile_method(df_stratification=None, perc_list=None)

Default stratification method: splits the input data into strata at the percentile cut-offs given in perc_list.

Parameters

df_stratification – input dataframe
perc_list – percentile list on which data will be split

Returns

stratified dataframe
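The percentile split described for perform_stratification can be sketched as follows, assuming a single column (score_col, hypothetical) already holds the combined value produced by the expression. Percentile ranks come from pandas.Series.rank(pct=True); rows at or above the highest cut-off go to strata 1, and rows below the lowest cut-off fall into the last strata. The function percentile_strata is hypothetical.

```python
import pandas as pd


def percentile_strata(df, score_col, perc_list):
    """Assign a STRATA column from the percentile rank of score_col."""
    out = df.copy()
    pct = out[score_col].rank(pct=True)        # percentile rank in (0, 1]
    cutoffs = sorted(perc_list, reverse=True)  # e.g. [0.8, 0.6, 0.4, 0.2]
    out["STRATA"] = len(cutoffs) + 1           # default: bottom of the BTL data
    # Apply the lowest cut-off first so higher cut-offs overwrite with smaller numbers
    for strata_no, cut in reversed(list(enumerate(cutoffs, start=1))):
        out.loc[pct >= cut, "STRATA"] = strata_no
    return out


demo = pd.DataFrame({"SCORE": range(1, 11)})
print(percentile_strata(demo, "SCORE", [0.8, 0.6, 0.4, 0.2])["STRATA"].tolist())
# [5, 4, 4, 3, 3, 2, 2, 1, 1, 1]
```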

perform_stratification(stratification_method=<function asc.percentile_method>, **kwargs)

Stratifies the input population (grouped by JURISDICTION and RISK_LEVEL) and assigns a strata number to each group within the chosen segment. The default method, percentile_method, converts each tunable parameter into a percentile feature and then splits each segment into strata at the cut-off percentiles passed in perc_list.

Parameters

stratification_method – function used to create the strata. The default is percentile_method, which converts each tunable parameter into a percentile and passes it to the expression; the expression evaluates the final percentile, which is then used to split the population into strata.
- A user-defined stratification method is also accepted. Ex: stratification_method=stratification_on_amount.
- The first parameter of a user-defined method must always be the input dataframe.
- A user-defined method must always return the input dataframe with an additional STRATA column.
kwargs –
Keyword arguments:
- perc_list
  - Applicable only when stratification_method=percentile_method
  - List of cut-off percentiles on which the strata are created.
  - Strata numbers are assigned from left to right of the list: strata 1, strata 2, strata 3, and so on.
  - Strata 1 is closest to the ATL line, and the last strata is at the bottom of the BTL data.

Returns

successful message

Examples:

>>> # Built-in method
>>> asc.perform_stratification(stratification_method=asc.percentile_method, perc_list=[0.8,0.6,0.4,0.2])
>>>
>>> # User-defined method
>>> def stratification_on_amount(evented_data_pdf, split_on_amount=[200000, 100000, 50000]):
>>>     evented_data_pdf['STRATA'] = len(split_on_amount)+1
>>>     df_filter=evented_data_pdf.copy()
>>>     strata_no = 1
>>>     for i in split_on_amount:
>>>         df = df_filter[df_filter['STRATA'] == len(split_on_amount) + 1]
>>>         ls_event_ids = df[df['TRANS_BASE_AMT'] >= i]['EVENT_ID'].to_list()
>>>         df_filter.loc[df_filter['EVENT_ID'].isin(ls_event_ids), 'STRATA'] = strata_no
>>>         evented_data_pdf.loc[evented_data_pdf['EVENT_ID'].isin(ls_event_ids), 'STRATA'] = strata_no
>>>         strata_no = strata_no + 1
>>>     return evented_data_pdf
>>>
>>> #Calling the user defined method
>>> asc.perform_stratification(stratification_method=stratification_on_amount, split_on_amount=[200000, 100000, 50000])

save_object(description=None)

Saves Python objects permanently so that they can be loaded whenever needed.

Parameters: description – description for the object to be saved
Returns: the paragraph execution shows success, or failure otherwise.

Example:

>>> import pandas as pd
>>> data_pdf = pd.DataFrame({'COL1':[1,2,3], 'COL2':[4,5,6]})
>>> ofs_asc.save_object(value=data_pdf)

scenario_post_processing()

Calls the load_asc_event_master method.

Param: None
Returns: None

show_btl_thresholds(df=None, segment=None, strata=None)

show_event_volume(group_by='ALL', tag=None)

Displays the alert volume for the given group_by input. The data is aggregated from the ASC_EVENT_MASTER table.

Parameters

group_by – input value on which the alert volume is aggregated. Accepted values:
- RUN_DATE : aggregate alerts on fic_mis_date
- SEGMENT_ID : aggregate alerts on Jurisdiction and Risk_Level
- EVENT_TAG : aggregate alerts by event tag, i.e. ATL or BTL
- ALL : aggregate alerts by all inputs together, i.e. by SEGMENT_ID, RUN_DATE, EVENT_TAG
tag – filter, ‘ATL’ or ‘BTL’. If None, both ATL and BTL alerts are shown.

Returns

dataframe showing alerts volume by different inputs.
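Assuming the ASC_EVENT_MASTER rows have been read into a pandas DataFrame with RUN_DATE, SEGMENT_ID and EVENT_TAG columns (the column layout is an assumption), the aggregation can be sketched with DataFrame.groupby. GROUP_KEYS and event_volume are hypothetical names.

```python
import pandas as pd

# group_by value -> columns to aggregate on
GROUP_KEYS = {
    "RUN_DATE": ["RUN_DATE"],
    "SEGMENT_ID": ["SEGMENT_ID"],
    "EVENT_TAG": ["EVENT_TAG"],
    "ALL": ["SEGMENT_ID", "RUN_DATE", "EVENT_TAG"],
}


def event_volume(events: pd.DataFrame, group_by: str = "ALL", tag=None) -> pd.DataFrame:
    """Count events per group, optionally keeping only ATL or BTL rows."""
    if group_by not in GROUP_KEYS:
        raise ValueError(f"group_by must be one of {sorted(GROUP_KEYS)}")
    if tag is not None:
        events = events[events["EVENT_TAG"] == tag]
    return events.groupby(GROUP_KEYS[group_by]).size().reset_index(name="EVENT_COUNT")


events = pd.DataFrame({"RUN_DATE": ["2023-01-31", "2023-01-31", "2023-02-28"],
                       "SEGMENT_ID": ["AMEA_HR", "AMEA_MR", "AMEA_HR"],
                       "EVENT_TAG": ["ATL", "BTL", "BTL"]})
print(event_volume(events, "EVENT_TAG"))
```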

show_execution_status()

Shows the execution status of the scenarios. The status is one of:

RUNNING

FAILED

COMPLETED

Param: None
Returns: dataframe

show_investigation_summary()

Returns a summary of the percentage of effective alerts for each segment and stratum.

Param: None
Returns: dataframe
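A sketch of the summary computation, assuming the investigated samples are a pandas DataFrame with SEGMENT_ID, STRATA and a 0/1 IS_EFFECTIVE column (all hypothetical names); investigation_summary is likewise hypothetical.

```python
import pandas as pd


def investigation_summary(investigated: pd.DataFrame) -> pd.DataFrame:
    """Percentage of effective alerts per SEGMENT_ID and STRATA."""
    summary = (
        investigated.groupby(["SEGMENT_ID", "STRATA"])["IS_EFFECTIVE"]
        .agg(SAMPLES="count", EFFECTIVE="sum")  # reviewed samples, effective samples
        .reset_index()
    )
    summary["PCT_EFFECTIVE"] = 100.0 * summary["EFFECTIVE"] / summary["SAMPLES"]
    return summary


inv = pd.DataFrame({"SEGMENT_ID": ["AMEA_HR"] * 6,
                    "STRATA": [1, 1, 1, 1, 2, 2],
                    "IS_EFFECTIVE": [1, 0, 1, 1, 0, 0]})
print(investigation_summary(inv))
```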

show_prod_alerts_volume(run_dates=None, group_by='RUN_DATE')

show_sample_size(strata_include=None, strata_exclude=None)

Shows the calculated sample size for each stratum.

Parameters

strata_include (list) – strata to be included
strata_exclude (list) – strata to be excluded

Returns

dataframe showing sample size for each strata

Examples:

>>> ofs_asc.show_sample_size(strata_include = ['AMEA_HR_1','AMEA_MR_1'], strata_exclude = None)

show_samples(strata_include=None, strata_exclude=None)

Shows the selected samples for each stratum.

Parameters

strata_include (list) – strata to be included
strata_exclude (list) – strata to be excluded

Returns

dataframe showing the selected samples

Examples:

>>> ofs_asc.show_samples(strata_include = ['AMEA_HR_1','AMEA_MR_1'], strata_exclude = None)

show_scenario_and_focus()

Displays the available scenarios and focal entities for analysis.

Param: None
Returns: dataframe

Examples:

>>> ofs_asc.show_scenario_and_focus()

show_scenario_bindings()

Returns the binding names for the currently selected scenario, from which the user can choose the tunable parameters.

Param: None
Returns: list of scenario binding names

show_scenario_execution_parameters()

Finds all possible run dates for the date ranges provided by the user and displays all parameters set by the user.

Param: None
Returns: Displays the parameters set by the user.

show_stratification_summary()

Displays the strata number assigned to each group, along with the event population within each stratum.

Param: None
Returns: dataframe showing strata number assigned to each segment

Examples:

>>> ofs_asc.show_stratification_summary()

update_dispositions_from_ecm(scenario_id=None, run_dates=None, test_ecm_conn=None)

Connects to Oracle ECM and updates dispositions in ASC_INVESTIGATED_ENTITIES for a scenario and run dates. A connection to ECM must have been created in CS before calling this API.

Parameters

scenario_id – valid scenario ID
run_dates (str) – comma-separated list of run dates
test_ecm_conn (str) – ECM connection string for debug purposes

Returns

successful message

class operator_overloading(operator_value)

Bases: object
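The class is documented only by its base. Because generate_expression builds expressions with & and |, one plausible design overloads Python's __and__ and __or__. The sketch below assumes, purely for illustration, that & takes the minimum and | the maximum of two percentile values; the class name PercentileOperand and these semantics are hypothetical. Python gives & higher precedence than |, so unparenthesised expressions follow the usual AND-before-OR convention.

```python
class PercentileOperand:
    """Hypothetical stand-in: wraps a value; & -> min, | -> max."""

    def __init__(self, value):
        self.value = value

    def __and__(self, other):
        return PercentileOperand(min(self.value, other.value))

    def __or__(self, other):
        return PercentileOperand(max(self.value, other.value))

    def __repr__(self):
        return f"PercentileOperand({self.value!r})"


amt, cnt = PercentileOperand(0.9), PercentileOperand(0.3)
print(amt & (cnt | (amt & cnt)))  # PercentileOperand(0.3)
print(amt | cnt & amt)            # & binds first: max(0.9, min(0.3, 0.9))
```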

Module contents