ofs_aif.scenario_models package

Subpackages

ofs_aif.scenario_models.red_flags package

Submodules

ofs_aif.scenario_models.event_processing module

class event_processing(connect_with_default_workspace=True)

Bases: ofs_aif.aif.aif

convert_to_sql_format(filters)

create_event(ficmisdate=None, model_group_name=None, model_group_scenario_name=None, batch_run_id=None)

Creates an event in the Compliance Studio schema based on the AIF output score for an entity.

Parameters: ficmisdate – Input date for which the event is to be created.
Returns: Success/Failure status based on the output of the underlying SQL procedure.

ofs_aif.scenario_models.scenario_models module

class scenario_models(connect_with_default_workspace=True)

Bases: ofs_aif.supervised.supervised

The scenario_models class is a special use case of supervised learning for anti-money laundering (AML) scenarios.

Stage2_data_prep(frequency=None, lookback=None, date_range=None, model_group_name=None, model_group_scenario_name=None, return_labels=False)

Parameters

frequency – Run frequency of the scenarios.
lookback – Lookback period for the scenario.
date_range – From and To Date for the in-time (Model Build) data set, in YYYYMM format as a numeric data type.
model_group_name – Model group for which the validation has to be performed.

Returns

Data frame containing the modeling dataset (Stage_2_pdf).

add_model_groups(meta_data_df)

A wrapper around the AIF add_model_groups method.

add_to_modeling_dataset(pandas_df_list=None, key_var='ENTITY_ID', label_var='SAR_FLG', overwrite=True, reset=False, osot=False, is_validation=False)

Merges all input data frames and creates the modeling dataset.

Parameters

pandas_df_list – List of transformed data frames.
key_var – Key variable used to merge the results. All input data frames are expected to contain the key column; otherwise the merge will fail.
label_var – Name of the target/label variable, if present in the input/resultant data frame.
overwrite – Boolean flag with options True/False. If True, existing columns are overwritten with values from the new resultant frame; otherwise they are left as they are.
reset – Boolean flag with options True/False. If True, resets the modeling data to None.
osot – Boolean flag with options True/False. True for OSOT (out-time) data.
is_validation – Boolean flag with options True/False. True for validation data.

Note: The overwrite flag overwrites only those modeling-data columns that are part of the current API call; columns not part of the current API call are not overwritten. The reset flag deletes all modeling data created before the current API call.

Returns: True/False. True on successful operation, in which case the modeling data is created and saved inside the class object.

Example:

>>> aif.add_to_modeling_dataset( [ ts_pdf, jbm_pdf, graph_embeddings_pdf, average_deposit_pdf, NB_PDF ], overwrite = True )
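The overwrite behavior described in the Note can be illustrated with plain pandas. The following is a minimal sketch, not the library's implementation: the helper name merge_on_key is hypothetical, and merging each frame onto the first with a left join is an assumption.

```python
from functools import reduce

import pandas as pd


def merge_on_key(frames, key_var="ENTITY_ID", overwrite=True):
    """Merge a list of data frames on key_var (hypothetical helper, illustration only)."""

    def merge_two(left, right):
        # Columns (other than the key) that both frames carry.
        shared = [c for c in right.columns if c in left.columns and c != key_var]
        if overwrite:
            # Drop the old copies so the incoming frame's values win.
            left = left.drop(columns=shared)
        else:
            # Keep the existing columns as they are; ignore the incoming copies.
            right = right.drop(columns=shared)
        # Assumption: left join onto the accumulated frame; fail fast on duplicate keys.
        return left.merge(right, on=key_var, how="left", validate="one_to_one")

    return reduce(merge_two, frames)
```

With overwrite=True, a column present in both frames takes the later frame's values; with overwrite=False, the earlier values are kept and only new columns are added.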

aggregate_base_features(from_date=None, to_date=None, frequency=None, last_run_date=None, look_back=None, focus=None, model_group_name=None, model_name=None, filters=None, prod_flag=None, include_full_lookback=None, fic_mis_date=None, batch_run_id=None)

Aggregates the base features into the table ml4aml_sm_base_features for the given parameters.

Parameters

from_date – Start date for historic data lookup, in DD-Mon-YYYY format.
to_date – End date for historic data lookup, in DD-Mon-YYYY format.
frequency – The frequency of the scenario execution, in days. Example: 1 (Daily), 7 (Weekly), 14 (Bi-weekly), 30/31 (Monthly), or any other number the user wants.
last_run_date – The last run date within the from_date to to_date range; it must exactly match a scenario run date. DD-Mon-YYYY format.
look_back – The lookback period for the scenario. Example: 30
focus – The model entity name provided in the Admin notebook data frame while creating the model group. Options: CUSTOMER or ACCOUNT
model_group_name – Name of the model group for which base feature aggregation is to be performed.
model_name – Name of the model used while importing the model template in the Admin notebook.
filters – Scenario-specific parameters that give additional control over base feature aggregation. Format: Param1 : Value1 ~ Param2 : Value2a | Value2b | Value2c
prod_flag – Flag indicating a training or scoring scenario. Options: N or Y. For sandbox/historic training scenarios, set prod_flag to N.
include_full_lookback – Flag indicating whether the lookback should consider data beyond from_date when aggregating base features. Options: Y or N
fic_mis_date – AAI FIC MIS Date used in the batch execution.
batch_run_id – AAI Batch Run ID for the execution.

Returns

dataframe

Examples:

>>> sm.aggregate_base_features(from_date = '01-Jan-2015',
...                            to_date = '31-Dec-2016',
...                            frequency = 7,
...                            last_run_date = '09-May-2016',
...                            look_back = 30,
...                            focus = 'CUSTOMER',
...                            model_group_name = 'MODELGROUP1',
...                            model_name = 'RMF_LRT',
...                            filters = 'PRIMARY_CUST_FL:Y~INCLUDE_B2B_TRNFR_FL:Y~INCLUDE_TRUSTED_TRANS_FL:Y~INCL_RLTD_PARTIES:Y~RPTNG_CURR_FL:N~MIN_HRG_RISK_LVL:10~INCL_SEC_PARTY_FL:Y~EFFCTV_RISK_CUTOFF_LVL:10~ACTVTY_RISK_CUTOFF_LVL:10~INCLD_ACCT_HLDR_TYP_CD:CR~MANTAS_BUSINESS_ACCT_TYPES:RBK|RBR~FUNC_CURR_FL:Y~INCL_WIRE_TRXN_PRDCT_TYPE_LST:EFT-ACH|EFT-TREASURY|EFT-FEDWIRE|EFT-SWIFT|EFT-OTHER|EST~INCL_MI_TRXN_PRDCT_TYPE_LST:CASH-EQ-CASHIER-CHECK|CASH-EQ-CERT-CHECK|CASH-EQ-MONEY-ORDER|CASH-EQ-TRAVELERS-CHECK|CASH-EQ-OTHER|CASH-LETTER|CHECK|PAPER-OTHER|CHECK-ACH~INCL_CASH_TRXN_PRDCT_TYPE_LST:DEBIT-CARD|SVC|CREDIT-CARD|CURRENCY|PHYS~INCL_BO_TRXN_PRDCT_TYPE_LST:JOURNAL~LRF_DIGITS:4~MIN_TRANS_ROUND_AMT:10~MAX_TRANS_ROUND_AMT:100000000~MIN_INDIVIDUAL_TRANS_AMT:10~DEGREE_OF_PARALLELISM:8',
...                            prod_flag = 'N',
...                            include_full_lookback = 'N',
...                            fic_mis_date = '2023-12-21',
...                            batch_run_id = 'SM_Aggregate_Base_Features_SB_2024-01-30_111111111111_1'
...                            )
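Filter strings follow the Param1 : Value1 ~ Param2 : Value2a | Value2b format described above. A minimal sketch of splitting such a string into a dictionary, for example to inspect a long filter before passing it in; parse_filters is a hypothetical helper, not part of ofs_aif, and convert_to_sql_format performs its own, separate conversion.

```python
def parse_filters(filters: str) -> dict:
    """Split 'P1:V1~P2:V2a|V2b' into {'P1': ['V1'], 'P2': ['V2a', 'V2b']} (hypothetical helper)."""
    parsed = {}
    for pair in filters.split("~"):
        if not pair.strip():
            continue  # tolerate a trailing or doubled '~'
        # Split on the first ':' only, so a value may itself contain ':'.
        name, _, value = pair.partition(":")
        # '|' separates multiple values for the same parameter.
        parsed[name.strip()] = [v.strip() for v in value.split("|")]
    return parsed
```

Every value is returned as a list, so single-valued and multi-valued parameters can be handled the same way.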

annual_model_validation(model_group_name=None, model_name=None, focus=None, fic_mis_date=None, performance_metrics_list='Kappa Curve~F1 Curve~PR Curve~ROC Curve~Prediction Density~Confusion Matrix:Kappa~AUC Change~PSI', model_id_list='Deployed', from_date=None, to_date=None, printrs=None)

Parameters

model_group_name – Model group for which the validation has to be performed
model_group_scenario_name – Model group Scenario for which the validation has to be performed
fic_mis_date – AAI FIC MIS Date used in the batch execution in the format YYYY-MM-DD
performance_metrics_list –
List of performance metrics on which the model is evaluated, supplied as a ~-separated string (see the default value). Available metrics:
- Kappa Curve
- F1 Curve
- PR Curve
- ROC Curve
- Prediction Density
- Confusion Matrix:Kappa
- AUC Change
- PSI
model_id_list – List of model IDs to evaluate. Default 'Deployed'.
printrs – Boolean (True/False). If True, prints the AUC Change and PSI results.

Returns

Plot or metric, depending on the performance metrics chosen.
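PSI is commonly computed as the sum over bins of (a_i − e_i) · ln(a_i / e_i), where e_i and a_i are the shares of baseline and current scores falling in bin i. A minimal numpy sketch of that standard formula follows; it is not necessarily how annual_model_validation computes PSI, and the quantile binning of the baseline, the 10-bin default and the eps floor for empty bins are assumptions.

```python
import numpy as np


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between baseline (expected) and current (actual) scores."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Interior bin edges from the baseline's quantiles (n_bins - 1 edges).
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1)[1:-1])
    # searchsorted maps every value to a bin index in 0..n_bins-1.
    e_counts = np.bincount(np.searchsorted(edges, expected, side="right"), minlength=n_bins)
    a_counts = np.bincount(np.searchsorted(edges, actual, side="right"), minlength=n_bins)
    # Convert counts to proportions and floor empty bins so the log stays finite.
    e_frac = np.clip(e_counts / expected.size, eps, None)
    a_frac = np.clip(a_counts / actual.size, eps, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))
```

Each term is non-negative, so identical distributions give 0 and larger values indicate a larger population shift.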

convert_to_sql_format(filters)

A utility that converts a user-supplied filter string into an SQL-formatted string.

create_base_features(lookback=None, frequency=None, scenario_id_ls=None, is_investigated=None, date_range=None, date_range_osot=None, batch_run_id='Scoring Batch')

Fetches the base features from the table ml4aml_sm_base_features for the given model group, frequency, lookback, focus and date ranges.

Parameters

lookback – (optional) Lookback period. If None, it is taken from the table ML4AML_SM_BASE_FEAT_PARAMS.
frequency – (optional) Run frequency. If None, it is taken from the table ML4AML_SM_BASE_FEAT_PARAMS.
scenario_id_ls – List of scenario IDs. Ex: [116000031]
is_investigated –
(optional) Type of alerts returned.
- 'INVESTIGATED' : reviewed alerts
- 'UNINVESTIGATED' : unreviewed alerts
- 'ALL' : both reviewed and unreviewed alerts
Default is 'INVESTIGATED'. For training, reviewed customers are always fetched.
date_range – From and To Date for the in-time (Model Build) data set, in YYYYMM format as a numeric data type.
date_range_osot – From and To Date for the OSOT validation data set, in YYYYMM format as a numeric data type.
batch_run_id – Batch run ID needed during scoring. Default 'Scoring Batch'.

Returns

dataframe

Examples:

>>> sm.create_base_features(lookback=30,
...                         frequency=7,
...                         scenario_id_ls=[116000031,116000079],
...                         date_range=[201605,201706],
...                         date_range_osot=[201706,201706])
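date_range and date_range_osot take YYYYMM values as plain numbers. A minimal sketch of that encoding and a range check, assuming both endpoints are inclusive (the single-month OSOT range [201706, 201706] in the example suggests this); to_yyyymm and in_date_range are hypothetical helpers, not part of ofs_aif.

```python
from datetime import date


def to_yyyymm(d: date) -> int:
    """Encode a date as a YYYYMM integer, e.g. date(2016, 5, 9) -> 201605."""
    return d.year * 100 + d.month


def in_date_range(d: date, date_range) -> bool:
    """True if d's month lies within [from, to], both YYYYMM integers (assumed inclusive)."""
    start, end = date_range
    # YYYYMM integers sort in calendar order, so plain comparison is enough.
    return start <= to_yyyymm(d) <= end
```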

create_definition(model_group_name=None, model_group_scenario_name=None, save_with_new_version=False, cleanup_results=False, version=None)

Creates a unique definition for a given notebook using the model group. Internally, it calls the AIF create_definition method.

Parameters

save_with_new_version – Boolean flag with options True/False. It helps create a history of models/outputs for a given definition; any version can be chosen at a later point in time.
cleanup_results – Boolean flag with options True/False. When set to True, deletes all outputs from previous executions.
version – When multiple versions of a definition exist, selects the required version. The default, None, selects the MAX version of the definition.

Returns

A success message on completion, or an error message on failure.

Examples:

>>> sm.create_definition( save_with_new_version = False,
...                       cleanup_results = False,
...                       version = None )
Definition creation successful...
True

create_modeling_dataset(X=None, key_var='ENTITY_ID', label_var='SAR_FLG', osot=False, is_validation=False)

Converts new AMLES data into modeling data by applying all the transformations recorded during the training process.

Parameters

X – Input pandas data frame.
key_var – Identity column. Default 'ENTITY_ID'
label_var – Target variable. Default 'SAR_FLG'
osot – Boolean True/False. Set to True when running on out-time (OSOT) data.
is_validation – Boolean True/False. Set to True for the validation use case.

Returns

dataframe

Example:

>>> sm.create_modeling_dataset(osot=True)
>>> sm.create_modeling_dataset(is_validation=True)

data_quality_data_preparation(model_group_name=None, model_name=None, focus=None, fic_mis_date=None)

Parameters

model_group_name – Model group name that was created in the ADMIN notebook.
model_name – Model name that describes the scenario. Ex: RMF, RMF_LRT
fic_mis_date – AAI FIC MIS Date used in the batch execution, in the format YYYY-MM-DD.

Returns

Object of the class DataQualityCheck from the module ofs_auto_ml.utils.data_quality_checks.

get_base_feature_params(model_group_name=None, model_group_scenario_name=None)

Parameters

model_group_name – Model group for which the validation has to be performed.
model_group_scenario_name – Model group scenario for which the validation has to be performed.

Returns

frequency, lookback, cutoff

get_base_features(osot=False, is_validation=False)

Gets the modeling data in the current session.

Parameters

osot – True for the OSOT (out-time) dataset. Class variable sm.B_DF_OSOT will be set.
is_validation – True for the validation dataset. Class variable sm.B_DF_VAL will be set.

Returns

modeling data as pandas data frame.

Example:

>>> aif.get_base_features(osot=True)
>>> aif.get_base_features(is_validation=True)

get_calendar_date_range(fic_mis_date=None, lookback=None)

Generates a date range for a given fic_mis_date and lookback.

Parameters

fic_mis_date – AAI FIC MIS Date, in the format YYYY-MM-DD.
lookback – Lookback period.

Returns

List of date ranges.

Examples:

>>> sm.get_calendar_date_range(fic_mis_date = '2024-02-01', lookback=14)
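One plausible reading of a lookback window is the lookback days ending on fic_mis_date. The sketch below computes such a window with the standard library; the helper name lookback_window and the inclusive-end convention are assumptions, and the actual API returns a list of date ranges whose exact boundaries may differ.

```python
from datetime import date, timedelta


def lookback_window(fic_mis_date: str, lookback: int):
    """Return (start, end) covering `lookback` days that end on fic_mis_date (hypothetical helper)."""
    end = date.fromisoformat(fic_mis_date)  # expects YYYY-MM-DD
    # Assumption: the window includes `end` and spans exactly `lookback` calendar days.
    start = end - timedelta(days=lookback - 1)
    return start, end
```

For the example above, lookback_window('2024-02-01', 14) gives 19 January 2024 through 1 February 2024.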

get_modeling_dataset(osot=False, is_validation=False)

Gets the modeling data in the current session.

Parameters

osot – True for the OSOT (out-time) dataset. Class variable sm.B_DF_OSOT will be set.
is_validation – True for the validation dataset. Class variable sm.B_DF_VAL will be set.

Returns

pandas data frame.

Example:

>>> aif.get_modeling_dataset(osot=True)
>>> aif.get_modeling_dataset(is_validation=True)

get_non_behavioral_data(model_group_name=None, frequency=None, lookback=None, key_var='V_ENTITY_CD')

Retrieves the non-behavioral features for customers.

Parameters

model_group_name – (optional) Model group name. Ex: SG_RETAIL_MM
lookback – (optional) Lookback period. If None, it is taken from the table ML4AML_SM_BASE_FEAT_PARAMS.
frequency – (optional) Run frequency. If None, it is taken from the table ML4AML_SM_BASE_FEAT_PARAMS.
key_var – (optional) Unique identifier in the customer table. Default 'V_ENTITY_CD'

Returns

DataFrame

Examples:

>>> sm.get_non_behavioral_data()

get_scenario_features(X=None, key_var='ENTITY_ID', label_var='SAR_FLG')

Calculates the red flag features and appends them to the base features dataframe.

Parameters

X – (optional) Input dataframe.
key_var – (optional) Unique key identifier in the dataframe.
label_var – (optional) Target variable. Default is 'SAR_FLG'

Returns

dataframe

Example:

>>> sm.get_scenario_features()

import_model_template(meta_data_df=None, model_name=None, overwrite=False)

Creates the objectives in MMG from the model group metadata and imports the model drafts into the respective objectives.

Parameters

meta_data_df – The same metadata data frame used when adding model groups.
model_name – String that describes the scenario. Ex: RMF, RMF_LRT
overwrite – If True, existing model templates are overwritten.

Returns

API response in JSON format.

Examples:

>>> sm.import_model_template(meta_data_df=pdf)

merge(pandas_df_list, key_var='ENTITY_ID', label_var='SAR_FLG', overwrite=True, osot=False, is_validation=False)

Customized merge used to concatenate multiple dataframes for the Scenario Modeling use case.

Parameters

pandas_df_list – List of dataframes to merge.
key_var – Identity column. Default 'ENTITY_ID'
label_var – Target variable. Default 'SAR_FLG'
overwrite – Boolean True/False. If True, overwrites existing columns' data.
osot – Boolean True/False. Set to True when running on out-time (OSOT) data.
is_validation – Boolean True/False. Set to True for the validation use case.

Returns

Resulting dataframe.

monthly_model_validation(fic_mis_date=None, model_group_name=None, model_name=None, focus=None, model_id='Deployed', monitoring_technique=['CDBD', 'MD'], n_bins=9, N=5, No_SD=2)

Parameters

fic_mis_date – AAI FIC MIS Date used in the batch execution, in the format YYYY-MM-DD.
model_group_name – Model group name that was created in the ADMIN notebook.
model_name – Model name that describes the scenario. Ex: RMF, RMF_LRT
model_id – List of model IDs to evaluate. Default 'Deployed'.
monitoring_technique – List of monitoring techniques: 'CDBD' (Confidence Distribution Batch Detection) and/or 'MD' (Margin Density).
n_bins – Number of bins to use.
N – Number of bootstrap samples on which to estimate thresholds.
No_SD – Threshold setting to use.

Returns

Output indicating whether drift is observed.
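As a rough illustration of how a margin-density (MD) check can combine N bootstrap samples with a No_SD threshold, the sketch below flags drift when the share of scores near the decision threshold moves more than No_SD standard deviations away from its bootstrap mean on reference data. It is a generic sketch of the technique, not the library's implementation; the helper names, the 0.5 decision threshold, the 0.1 margin width and the bootstrap scheme are all assumptions.

```python
import numpy as np


def margin_density(scores, threshold=0.5, margin=0.1):
    """Fraction of scores inside the uncertainty band around the decision threshold."""
    scores = np.asarray(scores, dtype=float)
    return float(np.mean(np.abs(scores - threshold) <= margin))


def md_drift(reference_scores, current_scores, N=5, No_SD=2,
             threshold=0.5, margin=0.1, seed=0):
    """Return True if the current margin density drifts from the reference (illustrative)."""
    rng = np.random.default_rng(seed)
    ref = np.asarray(reference_scores, dtype=float)
    # Estimate the spread of the reference margin density from N bootstrap resamples.
    boot = [margin_density(rng.choice(ref, size=ref.size, replace=True), threshold, margin)
            for _ in range(N)]
    mu, sd = float(np.mean(boot)), float(np.std(boot))
    current = margin_density(current_scores, threshold, margin)
    # Flag drift when the current value falls outside mean +/- No_SD standard deviations.
    return abs(current - mu) > No_SD * sd
```

On realistic score distributions the bootstrap standard deviation is non-zero; in the degenerate case where it is zero, any change in margin density is flagged.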

predict(X=None, key_column='ENTITY_ID', model_group_name=None, model_group_scenario_name=None, frequency=None, lookback=None, fic_mis_date=None, batch_run_id=None, threshold=None, return_score=False, debug=False, write_db_output=True, get_percentiles=True, get_deciles=True, btl_sample_count=0, n_top_contrib=None, unknown_data_score=-999)

Tests scoring interactively by connecting to a production-like schema before scheduling it as a batch process in production. The same sandbox can also be used for scoring; in that case, the sandbox schema must contain the scoring-related input and output tables. For testing, set all run-time parameters expected by the scoring batch in a Studio paragraph.

Parameters

X – Stage 2 transformed new data as a pandas data frame. Default is None.
key_column – Identity column. Default 'ENTITY_ID'
model_group_name – Name of the deployed model group.
model_group_scenario_name – Name of the deployed model group scenario. Always None for AMLES.
fic_mis_date – AAI FIC MIS Date used in the batch execution.
batch_run_id – AAI Batch Run ID for the execution.
threshold – Threshold used to generate events for ECM. Default 0.7
return_score – Boolean flag. If True, the scoring result is returned to the caller as a pandas data frame. Default is False, which is the production use case.
debug – Boolean (True/False). If True, debug mode is on.
write_db_output – Boolean (True/False). If True, writes to the output tables SM_EVENT_SCORE_DETAILS and SM_EVENT_SCORE.
get_percentiles – Boolean (True/False). If True, the percentile score is calculated and returned.
get_deciles – Boolean (True/False). If True, decile buckets are calculated and returned.
btl_sample_count – Number of BTL (below the threshold) random samples to add to the ATL (above the threshold) records. If an integer, that many samples are taken from the BTL population and appended to ATL. Ex: 10. If a fraction, the sample count is that fraction of the ATL population, and that many samples are collected from the BTL population. Ex: 0.2
unknown_data_score – Default score assigned for missing categories. Default is -999
n_top_contrib – Int. Number of top contributing features. Ex: 5

Returns

Returns output scores as pandas data frame.

Examples:

>>> from datetime import date
>>> score_pdf_list = sm.predict(X = Stage_2_OSOT_pdf,
...                             key_column = 'ENTITY_ID',
...                             model_group_name = 'SG_RETAIL_MM',
...                             model_group_scenario_name = 'RMF_CUSTOMER',
...                             fic_mis_date = date.today(),
...                             batch_run_id = 'RRF_ICC_BATCH_123',
...                             threshold = 0.5,
...                             return_score = True,
...                             debug = True,
...                             write_db_output = True,
...                             btl_sample_count = 10)
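The btl_sample_count rule described above can be sketched in pandas. This is an illustration under stated assumptions, not the library's code: ATL rows are taken to be those scoring at or above threshold, the score column is assumed to be named SCORE, a fractional count is rounded up, and select_atl_with_btl_sample is a hypothetical helper.

```python
import math

import pandas as pd


def select_atl_with_btl_sample(scores: pd.DataFrame, threshold=0.7, btl_sample_count=0,
                               score_col="SCORE", seed=42):
    """Return ATL rows plus a random BTL sample (hypothetical helper, illustration only)."""
    # Assumption: ATL means score >= threshold; BTL means score < threshold.
    atl = scores[scores[score_col] >= threshold]
    btl = scores[scores[score_col] < threshold]
    if isinstance(btl_sample_count, float) and 0 < btl_sample_count < 1:
        # Fraction: the sample size is that fraction of the ATL population (rounded up).
        n = math.ceil(btl_sample_count * len(atl))
    else:
        # Integer: take that many BTL rows.
        n = int(btl_sample_count)
    n = min(n, len(btl))  # cannot sample more BTL rows than exist
    btl_sample = btl.sample(n=n, random_state=seed)
    return pd.concat([atl, btl_sample])
```

Fixing random_state keeps the BTL sample reproducible between test runs.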

sar_extraction(mode='FILE', if_exists='OVERWRITE', ecm_datastore_name=None, processing_batch=None, from_date=None, to_date=None)

save_base_feat_params(model_group_name=None, model_name=None, focus=None, frequency=None, lookback=None, filter=None, threshold_cut_off=0.7)

Inserts batch parameter details into the table ML4AML_SM_BASE_FEAT_PARAMS for reuse during training and production.

Parameters

model_group_name – Model group name that was created in the ADMIN notebook.
model_name – Model name that describes the scenario. Ex: RMF, RMF_LRT
focus – Entity type or segment name. Ex: CUSTOMER, ACCOUNT
frequency – Run frequency of the scenarios.
lookback – Lookback period for the scenario.
filter – Filter parameters to apply to the tables.
threshold_cut_off – Scoring threshold cut-off value. Ex: 0.9

Returns

None

Examples:

>>> sm.save_base_feat_params(model_group_name = 'GROUP1', model_name='RMF', focus='CUSTOMER',
...                          frequency=7, lookback=30,
...                          filter='PRIMARY_CUST_FL:Y~INCLUDE_B2B_TRNFR_FL:Y~INCLUDE_TRUSTED_TRANS_FL:Y~INCL_RLTD_PARTIES:Y~RPTNG_CURR_FL:N~MIN_HRG_RISK_LVL:10~INCL_SEC_PARTY_FL:Y~EFFCTV_RISK_CUTOFF_LVL:10~ACTVTY_RISK_CUTOFF_LVL:10~INCLD_ACCT_HLDR_TYP_CD:CR~MANTAS_BUSINESS_ACCT_TYPES:RBK|RBR~FUNC_CURR_FL:Y~INCL_WIRE_TRXN_PRDCT_TYPE_LST:EFT-ACH|EFT-TREASURY|EFT-FEDWIRE|EFT-SWIFT|EFT-OTHER|EST~INCL_MI_TRXN_PRDCT_TYPE_LST:CASH-EQ-CASHIER-CHECK|CASH-EQ-CERT-CHECK|CASH-EQ-MONEY-ORDER|CASH-EQ-TRAVELERS-CHECK|CASH-EQ-OTHER|CASH-LETTER|CHECK|PAPER-OTHER|CHECK-ACH~INCL_CASH_TRXN_PRDCT_TYPE_LST:DEBIT-CARD|SVC|CREDIT-CARD|CURRENCY|PHYS~INCL_BO_TRXN_PRDCT_TYPE_LST:JOURNAL~LRF_DIGITS:4~MIN_TRANS_AMT:10~MAX_TRANS_AMT:100000000~MIN_INDIVIDUAL_TRANS_AMT:10')
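Long filter strings like the one above are easy to mistype. A minimal sketch of assembling one from a dictionary, assuming the same ~ and | separators shown in the example; build_filters is a hypothetical helper, not part of ofs_aif.

```python
def build_filters(params: dict) -> str:
    """Join {'P1': 'V1', 'P2': ['V2a', 'V2b']} into 'P1:V1~P2:V2a|V2b' (hypothetical helper)."""
    parts = []
    for name, value in params.items():
        # A list or tuple of values becomes a '|'-separated multi-value entry.
        if isinstance(value, (list, tuple)):
            value = "|".join(str(v) for v in value)
        parts.append(f"{name}:{value}")
    # Parameters are separated by '~', in dictionary insertion order.
    return "~".join(parts)
```

The result can then be passed as the filter argument, for example filter=build_filters({'PRIMARY_CUST_FL': 'Y', 'MANTAS_BUSINESS_ACCT_TYPES': ['RBK', 'RBR']}).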

set_threshold_cut_off(threshold_cut_off=None)

Sets the scoring threshold cut-off in the table ML4AML_SM_BASE_FEAT_PARAMS.

Parameters: threshold_cut_off – threshold cut off value
Returns: None

Examples:

>>> sm.set_threshold_cut_off(threshold_cut_off=0.8)

show_data_volume(model_group_name=None, model_group_scenario_name=None, focus=None, lookback=None, frequency=None)

Shows the month-wise data volume count for the given scenarios, aggregated on the scenario run date.

Parameters

model_group_name – (optional) Model group name. Ex: SG_RETAIL_MM
model_group_scenario_name – (optional) Model group scenario name, formed as model_name + '_' + focus. Ex: RMF_CUSTOMER
focus – (optional) entity type. Ex: CUSTOMER or ACCOUNT
lookback – (optional) lookback period. If None, it will be taken from table ML4AML_SM_BASE_FEAT_PARAMS
frequency – (optional) run frequency. If None, it will be taken from table ML4AML_SM_BASE_FEAT_PARAMS

Returns

DataFrame showing the data count for each month

Examples:

>>> sm.show_data_volume()
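The month-wise aggregation described for show_data_volume amounts to grouping rows by the calendar month of the run date and counting them. A minimal pandas sketch, assuming a data frame with a hypothetical RUN_DATE column; it is not the library's query.

```python
import pandas as pd


def monthly_volume(alerts: pd.DataFrame, date_col: str = "RUN_DATE") -> pd.DataFrame:
    """Count rows per calendar month of date_col (hypothetical column name)."""
    # Convert run dates to monthly periods, e.g. 2016-05-09 -> 2016-05.
    months = pd.to_datetime(alerts[date_col]).dt.to_period("M")
    # Group by month and count rows; sort_index keeps months in calendar order.
    counts = alerts.groupby(months).size().sort_index()
    return counts.rename("COUNT").reset_index()
```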

show_red_flag_features(scenario_id_ls=None)

Shows all available out-of-the-box red flag features configured for the given scenarios.

Parameters: scenario_id_ls – List of scenario IDs.
Returns: dataframe showing available red flags

Example:

>>> sm.show_red_flag_features(scenario_id_ls='116000079')

show_scenarios()

Shows the available scenarios along with their unique scenario IDs.

Parameters: None
Returns: dataframe

Example:

>>> sm.show_scenarios()
