ofs_aif.scenario_models package¶
Subpackages¶
- ofs_aif.scenario_models.red_flags package
- Submodules
- ofs_aif.scenario_models.red_flags.CCPTRNSFRFREQ module
- ofs_aif.scenario_models.red_flags.CCPTRXNCT module
- ofs_aif.scenario_models.red_flags.CDBAL module
- ofs_aif.scenario_models.red_flags.CDBALTRXNSPIKE module
- ofs_aif.scenario_models.red_flags.CDHRG module
- ofs_aif.scenario_models.red_flags.CDR module
- ofs_aif.scenario_models.red_flags.CDRROUNDED module
- ofs_aif.scenario_models.red_flags.CDS module
- ofs_aif.scenario_models.red_flags.CUSTNWAVGINDTRXN module
- ofs_aif.scenario_models.red_flags.DAVGTRANSNW module
- ofs_aif.scenario_models.red_flags.KYCR module
- Module contents
Submodules¶
ofs_aif.scenario_models.event_processing module¶
- class event_processing(connect_with_default_workspace=True)¶
Bases:
ofs_aif.aif.aif
- convert_to_sql_format(filters)¶
- create_event(ficmisdate=None, model_group_name=None, model_group_scenario_name=None, batch_run_id=None)¶
Creates an event in the Compliance Studio schema based on the AIF output score for an entity.
- Parameters
ficmisdate – Input date for which the event is to be created.
- Returns
Success/Failure status based on output of underlying SQL procedure.
ofs_aif.scenario_models.scenario_models module¶
- class scenario_models(connect_with_default_workspace=True)¶
Bases:
ofs_aif.supervised.supervised
The class scenario_models is a special use case of supervised learning for anti-money laundering scenarios.
- Stage2_data_prep(frequency=None, lookback=None, date_range=None, model_group_name=None, model_group_scenario_name=None, return_labels=False)¶
- Parameters
frequency – frequency of run scenarios
lookback – lookback period for a scenario
date_range – From and To Date for in-time (Model Build) data set in YYYYMM format as numeric data type.
model_group_name – Model group for which the validation has to be performed
- Returns
Data frame containing the modeling dataset Stage_2_pdf
- add_model_groups(meta_data_df)¶
A wrapper around the AIF add_model_groups method.
- add_to_modeling_dataset(pandas_df_list=None, key_var='ENTITY_ID', label_var='SAR_FLG', overwrite=True, reset=False, osot=False, is_validation=False)¶
Merge all input data frames and create modeling dataset.
- Parameters
pandas_df_list – list of transformed data frames.
key_var – key variable to be used for merging the results. All input data frames are expected to contain the key column, else the merge will fail.
label_var – Name of the Target/Label variable if present in the input/resultant data frame.
overwrite – Boolean flag with Options as True/False. If True, keeps overwriting existing column from the new resultant frame. Else will leave as it is.
reset – Boolean flag with Options as True/False. If True, resets modeling data to None.
osot – Boolean flag with Options as True/False. True for osot data.
is_validation – Boolean flag with Options as True/False. True for validation data.
- Note:
overwrite flag overwrites only those columns of modeling data, which are part of current API call. Columns not part of the current API call will not be overwritten. reset flag deletes all the modeling data prior to current API call.
- Returns
True/False. True on successful operation. On successful operation, modeling data is created and saved inside the class object.
- Example:
>>> aif.add_to_modeling_dataset( [ ts_pdf, jbm_pdf, graph_embeddings_pdf, average_deposit_pdf, NB_PDF ], overwrite = True )
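The overwrite behaviour described in the note can be illustrated with plain pandas. This is a minimal sketch under stated assumptions, not the library's implementation: merge_frames is a hypothetical helper, the column names AVG_AMT and RISK are made up, and dropping the clashing column before an outer merge is only one assumed way to realise "overwrite".

```python
from functools import reduce

import pandas as pd


def merge_frames(frames, key_var="ENTITY_ID", overwrite=True):
    """Outer-merge data frames on key_var (hypothetical helper, not an AIF API)."""
    for frame in frames:
        if key_var not in frame.columns:
            raise KeyError(f"{key_var!r} missing from an input data frame")

    def merge_two(current, new):
        # Columns present on both sides, apart from the merge key.
        clashing = [c for c in new.columns if c in current.columns and c != key_var]
        if overwrite:
            current = current.drop(columns=clashing)  # values from the newer frame win
        else:
            new = new.drop(columns=clashing)  # existing values are left as they are
        return current.merge(new, on=key_var, how="outer")

    return reduce(merge_two, frames)


# Made-up frames sharing the key column and one clashing feature column.
ts_pdf = pd.DataFrame({"ENTITY_ID": [1, 2], "AVG_AMT": [10.0, 20.0]})
nb_pdf = pd.DataFrame({"ENTITY_ID": [1, 2], "AVG_AMT": [11.0, 21.0], "RISK": [5, 6]})
merged = merge_frames([ts_pdf, nb_pdf], overwrite=True)
kept = merge_frames([ts_pdf, nb_pdf], overwrite=False)
```

With overwrite=True the AVG_AMT values come from nb_pdf; with overwrite=False the values from ts_pdf are kept and only the new RISK column is added.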
- aggregate_base_features(from_date=None, to_date=None, frequency=None, last_run_date=None, look_back=None, focus=None, model_group_name=None, model_name=None, filters=None, prod_flag=None, include_full_lookback=None, fic_mis_date=None, batch_run_id=None)¶
This API is used to aggregate the base features into the table ml4aml_sm_base_features for the given parameters.
- Parameters
from_date – Start date for Historic Data lookup in DD-Mon-YYYY format
to_date – End date for Historic Data lookup in DD-Mon-YYYY format
frequency – The frequency of the scenario execution. Example: 1 (Daily), 7 (Weekly), 14 (Bi-weekly), 30/31 (Monthly), or any number the user wants
last_run_date – The last run date within the from_date and to_date range which exactly matches the scenario run date, in DD-Mon-YYYY format
look_back – The lookback period for the scenario. Example: 30
focus – The model entity name provided in the Admin notebook dataframe while creating the model group. Options: CUSTOMER or ACCOUNT
model_group_name – Name of the Model Group for which Base Feature Aggregation is to be created
model_name – Name of the Model used while importing the model template using the Admin Notebook
filters – Scenario-specific parameters which give additional control over the base feature aggregation. Format: Param1 : Value1 ~ Param2 : Value2a | Value2b | Value2c
prod_flag – Flag to indicate Training/Scoring scenario. Options: N or Y. For sandbox/historic training scenarios, prod_flag should be set to N
include_full_lookback – Flag to indicate whether the lookback should consider data beyond the from_date when aggregating base features. Options: Y or N
fic_mis_date – AAI FIC MIS Date used in the batch execution
batch_run_id – AAI Batch Run ID for the execution
- Returns
dataframe
- Examples:
>>> sm.aggregate_base_features(from_date = '01-Jan-2015',
>>> to_date = '31-Dec-2016',
>>> frequency = 7,
>>> last_run_date = '09-May-2016',
>>> look_back = 30,
>>> focus = 'CUSTOMER',
>>> model_group_name = 'MODELGROUP1',
>>> model_name = 'RMF_LRT',
>>> filters = 'PRIMARY_CUST_FL:Y~INCLUDE_B2B_TRNFR_FL:Y~INCLUDE_TRUSTED_TRANS_FL:Y~INCL_RLTD_PARTIES:Y~RPTNG_CURR_FL:N~MIN_HRG_RISK_LVL:10~INCL_SEC_PARTY_FL:Y~EFFCTV_RISK_CUTOFF_LVL:10~ACTVTY_RISK_CUTOFF_LVL:10~INCLD_ACCT_HLDR_TYP_CD:CR~MANTAS_BUSINESS_ACCT_TYPES:RBK|RBR~FUNC_CURR_FL:Y~INCL_WIRE_TRXN_PRDCT_TYPE_LST:EFT-ACH|EFT-TREASURY|EFT-FEDWIRE|EFT-SWIFT|EFT-OTHER|EST~INCL_MI_TRXN_PRDCT_TYPE_LST:CASH-EQ-CASHIER-CHECK|CASH-EQ-CERT-CHECK|CASH-EQ-MONEY-ORDER|CASH-EQ-TRAVELERS-CHECK|CASH-EQ-OTHER|CASH-LETTER|CHECK|PAPER-OTHER|CHECK-ACH~INCL_CASH_TRXN_PRDCT_TYPE_LST:DEBIT-CARD|SVC|CREDIT-CARD|CURRENCY|PHYS~INCL_BO_TRXN_PRDCT_TYPE_LST:JOURNAL~LRF_DIGITS:4~MIN_TRANS_ROUND_AMT:10~MAX_TRANS_ROUND_AMT:100000000~MIN_INDIVIDUAL_TRANS_AMT:10~DEGREE_OF_PARALLELISM:8',
>>> prod_flag = 'N',
>>> include_full_lookback = 'N',
>>> fic_mis_date = '2023-12-21',
>>> batch_run_id = 'SM_Aggregate_Base_Features_SB_2024-01-30_111111111111_1'
>>> )
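The filters string is easier to review once it is split into its parts. The sketch below reads the documented format (parameters separated by ~, name and values separated by :, alternative values separated by |) into a dictionary. parse_filters is a hypothetical helper written for illustration; it is not part of ofs_aif and makes no claim about how the library parses the string internally.

```python
def parse_filters(filters: str) -> dict[str, list[str]]:
    """Split 'Param1:Value1~Param2:Value2a|Value2b' into {name: [values]}.

    Hypothetical helper for inspecting a filters string; not an ofs_aif API.
    """
    parsed = {}
    for item in filters.split("~"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing or doubled '~'
        # Split on the first ':' only, in case a value itself contains one.
        name, sep, values = item.partition(":")
        if not sep:
            raise ValueError(f"Missing ':' in filter item: {item!r}")
        parsed[name.strip()] = [v.strip() for v in values.split("|")]
    return parsed


# A shortened version of the filters string used in the example.
example = "PRIMARY_CUST_FL:Y~MANTAS_BUSINESS_ACCT_TYPES:RBK|RBR~LRF_DIGITS:4"
parsed = parse_filters(example)
```

All values stay strings; converting numeric ones such as LRF_DIGITS is left to the caller.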
- annual_model_validation(model_group_name=None, model_name=None, focus=None, fic_mis_date=None, performance_metrics_list='Kappa Curve~F1 Curve~PR Curve~ROC Curve~Prediction Density~Confusion Matrix:Kappa~AUC Change~PSI', model_id_list='Deployed', from_date=None, to_date=None, printrs=None)¶
- Parameters
model_group_name – Model group for which the validation has to be performed
model_group_scenario_name – Model group Scenario for which the validation has to be performed
fic_mis_date – AAI FIC MIS Date used in the batch execution in the format YYYY-MM-DD
performance_metrics_list –
List of performance metrics on which the Model has to be evaluated. Available Metrics:
Kappa Curve
F1 Curve
PR Curve
ROC Curve
Prediction Density
Confusion Matrix:Kappa
AUC Change
PSI
model_id_list – List of Model ids for which Model has to be evaluated.
printrs – (True/False) to print the output. If set to True, will print AUC Change and PSI.
- Returns
Plot or metric, depending on the type of performance metric chosen
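PSI is one of the metrics listed above. As a rough illustration, the standard population stability index sums (actual share - expected share) * ln(actual share / expected share) over score bins. The sketch below assumes scores in [0, 1] and equal-width bins; the function name, the binning and the eps floor are assumptions made for this example, and the library may compute PSI differently.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between two score samples using equal-width bins over [0, 1].

    Illustrative only: assumes every score lies in [0, 1].
    """
    def bin_shares(scores):
        counts = [0] * n_bins
        for s in scores:
            # Clamp so that a score of exactly 1.0 falls in the last bin.
            idx = min(int(s * n_bins), n_bins - 1)
            counts[idx] += 1
        total = len(scores)
        # The eps floor avoids log(0) and division by zero for empty bins.
        return [max(c / total, eps) for c in counts]

    e = bin_shares(expected)
    a = bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


# Made-up score samples: a build-time baseline and a shifted recent sample.
baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
recent = [0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 0.9, 0.9]
psi_same = population_stability_index(baseline, baseline)
psi_shifted = population_stability_index(baseline, recent)
```

Identical distributions give a PSI of zero, and the value grows as the recent score distribution moves away from the baseline.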
- convert_to_sql_format(filters)¶
A utility that converts user-passed filter string to SQL formatted string
- create_base_features(lookback=None, frequency=None, scenario_id_ls=None, is_investigated=None, date_range=None, date_range_osot=None, batch_run_id='Scoring Batch')¶
It fetches the base features from the table ml4aml_sm_base_features for the given model group, frequency, lookback, focus and date ranges.
- Parameters
lookback – (optional) lookback period. If None, it will be taken from table ML4AML_SM_BASE_FEAT_PARAMS
frequency – (optional) run frequency. If None, it will be taken from table ML4AML_SM_BASE_FEAT_PARAMS
scenario_id_ls – list of scenario ids. Ex: [116000031]
is_investigated – (optional) Input describes the type of alerts returned.
'INVESTIGATED' : reviewed alerts
'UNINVESTIGATED' : unreviewed alerts
'ALL' : both reviewed and unreviewed alerts
Default is INVESTIGATED. For training, reviewed customers will always be fetched.
date_range – From and To Date for in-time (Model Build) data set in YYYYMM format as numeric data type.
date_range_osot – From and To Date for OSOT Validation data set in YYYYMM format as numeric data type
batch_run_id – Batch run id needed during scoring.
- Returns
dataframe
- Examples:
>>> sm.create_base_features(lookback=30,
>>> frequency=7,
>>> scenario_id_ls=[116000031,116000079],
>>> date_range=[201605,201706],
>>> date_range_osot=[201706,201706])
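Because date_range and date_range_osot are numeric YYYYMM pairs, it can help to check which months a pair covers before calling the API. The helper below is a hypothetical sketch, not part of ofs_aif; it assumes both ends of the range are inclusive, which the documentation does not state.

```python
def months_in_range(date_range):
    """Expand [start, end] in numeric YYYYMM form into every month covered.

    Hypothetical helper; treats both ends as inclusive (an assumption).
    """
    start, end = date_range
    for value in (start, end):
        if not 1 <= value % 100 <= 12:
            raise ValueError(f"{value} is not a valid YYYYMM value")
    if start > end:
        raise ValueError("date_range start must not be after its end")

    year, month = divmod(start, 100)
    months = []
    while year * 100 + month <= end:
        months.append(year * 100 + month)
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return months


in_time = months_in_range([201605, 201706])
osot = months_in_range([201706, 201706])
```

For the values used in the example, in_time spans 14 months and osot contains the single month 201706.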
- create_definition(model_group_name=None, model_group_scenario_name=None, save_with_new_version=False, cleanup_results=False, version=None)¶
API creates a unique definition using the Model Group for a given Notebook. Internally calls the AIF create_definition method.
- Parameters
save_with_new_version – Boolean flag with options True/False. It helps create a history of models/outputs for a given definition. Any version can be chosen at a later point in time.
cleanup_results – Boolean flag with options True/False. When set to True, deletes all the outputs of previous executions.
version – when multiple versions of the definition are created, version is supplied to pick the required version of the definition. The default value None means the MAX version of the definition.
- Returns
Returns a success message on completion, and an appropriate error message on failure.
- Examples:
>>> sm.create_definition( save_with_new_version = False,
>>> cleanup_results = False,
>>> version = None )
Definition creation successful...
True
- create_modeling_dataset(X=None, key_var='ENTITY_ID', label_var='SAR_FLG', osot=False, is_validation=False)¶
This API converts any new AMLES data into modelling data by applying all the transformations recorded during training process for unsupervised.
- Parameters
X – input pandas data frame.
key_var – Identity column. Default “ENTITY_ID”
label_var – target variable. Default ‘SAR_FLG’
osot – Boolean True/False. If running for out-time data (OSOT), it should be True.
is_validation – Boolean True/False. If running for validation use case, it should be True
- Returns
dataframe
- Example:
>>> sm.create_modeling_dataset(osot=True)
>>> sm.create_modeling_dataset(is_validation=True)
- data_quality_data_preparation(model_group_name=None, model_name=None, focus=None, fic_mis_date=None)¶
- Parameters
model_group_name – model group name which was created in ADMIN notebook
model_group_scenario_name – model name which describes the scenario. Ex: RMF, RMF_LRT
fic_mis_date – AAI FIC MIS Date used in the batch execution in the format YYYY-MM-DD
- Returns
Object of the class DataQualityCheck from module ofs_auto_ml.utils.data_quality_checks.
- get_base_feature_params(model_group_name=None, model_group_scenario_name=None)¶
- Parameters
model_group_name – Model group for which the validation has to be performed
model_group_scenario_name – Model group Scenario for which the validation has to be performed
- Returns
frequency, lookback, cutoff
- get_base_features(osot=False, is_validation=False)¶
Get modeling data in current session
- Parameters
osot – True for OSOT (out-time) dataset. Class variable sm.B_DF_OSOT will be set
is_validation – True for validation dataset. Class variable sm.B_DF_VAL will be set
- Returns
modeling data as pandas data frame.
- Example:
>>> aif.get_base_features(osot=True)
>>> aif.get_base_features(is_validation=True)
- get_calendar_date_range(fic_mis_date=None, lookback=None)¶
It generates a date range for a given fic_mis_date and lookback.
- Parameters
fic_mis_date – fic_mis_date
lookback – lookback period.
- Returns
list of date ranges
- Examples:
>>> sm.get_calendar_date_range(fic_mis_date = '2024-02-01', lookback=14)
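The exact contents of the returned list are not described here. One plausible reading, sketched below with the standard library only, is the run of calendar days that ends on fic_mis_date and spans lookback days. calendar_date_range is a hypothetical stand-in written for this example; the real method may return a different shape, such as from/to pairs.

```python
from datetime import date, timedelta


def calendar_date_range(fic_mis_date: str, lookback: int) -> list[str]:
    """List the `lookback` calendar days ending on fic_mis_date (inclusive).

    Hypothetical stand-in; the inclusive window is an assumption.
    """
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    end = date.fromisoformat(fic_mis_date)  # expects YYYY-MM-DD
    start = end - timedelta(days=lookback - 1)
    return [(start + timedelta(days=i)).isoformat() for i in range(lookback)]


window = calendar_date_range("2024-02-01", 14)
```

Under this assumption, the call from the example covers 2024-01-19 through 2024-02-01.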
- get_modeling_dataset(osot=False, is_validation=False)¶
Get modeling data in current session
- Parameters
osot – True for OSOT (out-time) dataset. Class variable sm.B_DF_OSOT will be set
is_validation – True for validation dataset. Class variable sm.B_DF_VAL will be set
- Returns
pandas data frame.
- Example:
>>> aif.get_modeling_dataset(osot=True)
>>> aif.get_modeling_dataset(is_validation=True)
- get_non_behavioral_data(model_group_name=None, frequency=None, lookback=None, key_var='V_ENTITY_CD')¶
It retrieves the non-behavioral features for customer.
- Parameters
model_group_name – (optional) model group name. Ex: SG_RETAIL_MM
lookback – (optional) lookback period. If None, it will be taken from table ML4AML_SM_BASE_FEAT_PARAMS
frequency – (optional) run frequency. If None, it will be taken from table ML4AML_SM_BASE_FEAT_PARAMS
key_var – (optional) Unique identifier in customer table.
- Returns
DataFrame
- Examples:
>>> sm.get_non_behavioral_data()
- get_scenario_features(X=None, key_var='ENTITY_ID', label_var='SAR_FLG')¶
It calculates the red flag features and appends them to the base features dataframe.
- Parameters
X – (optional) input dataframe.
key_var – (optional) Unique key Identifier in dataframe
label_var – (optional) Target Variable. Default is ‘SAR_FLG’
- Returns
dataframe
- Example:
>>> sm.get_scenario_features()
- import_model_template(meta_data_df=None, model_name=None, overwrite=False)¶
This API will create the objectives in MMG by taking model group metadata as an input and also imports model drafts to respective objectives.
- Parameters
meta_data_df – Same as the one created for adding model groups.
model_name – String which describes the scenarios. Ex: RMF, RMF_LRT
overwrite – If True Model Templates will be overwritten.
- Returns
API response in json format.
- Examples:
>>> sm.import_model_template(meta_data_df=pdf)
- merge(pandas_df_list, key_var='ENTITY_ID', label_var='SAR_FLG', overwrite=True, osot=False, is_validation=False)¶
Customized merge used to concatenate multiple dataframes for the Scenario Modeling use case.
- Parameters
pandas_df_list – list of dataframes to merge
key_var – Identity column. Default “ENTITY_ID”
label_var – target variable. Default ‘SAR_FLG’
overwrite – Boolean True/False. Overwrite existing column’s data if True.
osot – Boolean True/False. If running for out-time data (OSOT), it should be True.
is_validation – Boolean True/False. If running for validation use case, it should be True
- Returns
resulting dataframe
- monthly_model_validation(fic_mis_date=None, model_group_name=None, model_name=None, focus=None, model_id='Deployed', monitoring_technique=['CDBD', 'MD'], n_bins=9, N=5, No_SD=2)¶
- Parameters
fic_mis_date – AAI FIC MIS Date used in the batch execution in the format YYYY-MM-DD
model_group_name – model group name which was created in ADMIN notebook
model_group_scenario_name – model name which describes the scenario. Ex: RMF, RMF_LRT
model_id – List of Model ids for which Model has to be evaluated.
monitoring_technique – List of monitoring techniques. Confidence Distribution Batch Detection Method or Margin density
n_bins – No of bins to be used
N – No of bootstrap samples on which to estimate thresholds
No_SD – Threshold setting to be used
- Returns
Output indicating whether drift was observed or not
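To give a feel for how N and No_SD might interact, the sketch below implements one common formulation of margin-density drift detection: measure the share of scores close to the decision boundary, estimate its spread on bootstrap resamples of a reference set, and flag drift when a new batch exceeds the mean by No_SD standard deviations. Everything here is an assumption made for illustration (the 0.5 boundary, the margin width, the function names, the one-sided test); it is not the library's implementation of the MD technique.

```python
import random
import statistics


def margin_density(scores, margin=0.1):
    """Share of scores that fall within `margin` of the 0.5 decision boundary."""
    return sum(abs(s - 0.5) <= margin for s in scores) / len(scores)


def drift_detected(reference, batch, n_bootstrap=5, no_sd=2, margin=0.1, seed=0):
    """Flag drift when the batch margin density exceeds mean + no_sd * std.

    Illustrative sketch; the threshold rule is an assumption.
    """
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    densities = []
    for _ in range(n_bootstrap):
        # Bootstrap: resample the reference scores with replacement.
        sample = rng.choices(reference, k=len(reference))
        densities.append(margin_density(sample, margin))
    threshold = statistics.mean(densities) + no_sd * statistics.pstdev(densities)
    return margin_density(batch, margin) > threshold


# Made-up scores: a confident reference set and an uncertain new batch.
reference_scores = [0.05, 0.10, 0.90, 0.95]
uncertain_batch = [0.45, 0.50, 0.55]
flag = drift_detected(reference_scores, uncertain_batch)
```

A batch whose scores cluster around the boundary raises the margin density, which is the signal this technique looks for.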
- predict(X=None, key_column='ENTITY_ID', model_group_name=None, model_group_scenario_name=None, frequency=None, lookback=None, fic_mis_date=None, batch_run_id=None, threshold=None, return_score=False, debug=False, write_db_output=True, get_percentiles=True, get_deciles=True, btl_sample_count=0, n_top_contrib=None, unknown_data_score=- 999)¶
Tests scoring interactively by connecting to a production-like schema before scheduling it as a batch process in real production. The same sandbox can also be used for scoring; in this case the sandbox schema should have the scoring-related input and output tables. All run-time parameters expected during the scoring batch should be set in a studio paragraph for testing purposes.
- Parameters
X – Stage 2 transformed new data as pandas data frame. default is None
key_column – Identity column
model_group_name – Name of the deployed model group.
model_group_scenario_name – Name of the deployed model group scenario. Always None for AMLES
fic_mis_date – AAI FIC MIS Date used in the batch execution.
batch_run_id – AAI Batch Run ID for the execution
threshold – Threshold to generate events for ECM. default 0.7
return_score – Boolean flag. If set to True scoring result is returned as pandas data frame to the caller. Default is False, and which is real production use case.
debug – Boolean(True/False). If set to True, debug mode is on
write_db_output – Boolean(True/False). If set to True, write to output table SM_EVENT_SCORE_DETAILS and SM_EVENT_SCORE
get_percentiles – Boolean(True/False). If set to True, percentile score will be calculated and returned
get_deciles – Boolean(True/False). If set to True, decile buckets will be calculated and returned
btl_sample_count – Number of BTL (below the threshold) random samples to be added to ATL.
If an integer, that many samples are taken from the BTL population and appended to ATL. Ex: 10
If a fraction, the sample count is that fraction of the ATL population, and that many samples are collected from the BTL population. Ex: 0.2
unknown_data_score – For missing categories, default score to be set. Default is -999
n_top_contrib – Int. Number of top contributing features. Ex: 5
- Returns
Returns output scores as pandas data frame.
- Examples:
>>> score_pdf_list = self.predict(X = Stage_2_OSOT_pdf,
>>> key_column = 'ENTITY_ID',
>>> model_group_name = 'SG_RETAIL_MM',
>>> model_group_scenario_name = 'RMF_CUSTOMER',
>>> fic_mis_date = date.today(),
>>> batch_run_id = 'RRF_ICC_BATCH_123',
>>> threshold = 0.5,
>>> return_score = True,
>>> debug = True,
>>> write_db_output=True,
>>> btl_sample_count = 10)
Returns output scores as pandas data frame
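The two modes of btl_sample_count (absolute count versus fraction of the ATL population) can be sketched in plain Python. split_atl_btl is a hypothetical helper and not the scoring code itself. It assumes that ATL means scores at or above the threshold, that a fractional count is truncated to an integer, and that the count is capped at the size of the BTL population; none of these details are confirmed by the description above.

```python
import random


def split_atl_btl(scores, threshold=0.7, btl_sample_count=0, seed=None):
    """Return (ATL keys, sampled BTL keys) for a {entity_id: score} mapping.

    Hypothetical helper illustrating the integer and fraction modes.
    """
    atl = sorted(k for k, s in scores.items() if s >= threshold)
    btl = sorted(k for k, s in scores.items() if s < threshold)

    if isinstance(btl_sample_count, float) and 0 < btl_sample_count < 1:
        # Fraction mode: sample size is a share of the ATL population.
        wanted = int(btl_sample_count * len(atl))
    else:
        # Integer mode: sample size is given directly.
        wanted = int(btl_sample_count)

    wanted = max(0, min(wanted, len(btl)))  # never ask for more than exist
    sampled = random.Random(seed).sample(btl, wanted)
    return atl, sorted(sampled)


# Made-up scores: four entities above the threshold and four below.
scores = {"A": 0.9, "B": 0.8, "C": 0.75, "D": 0.71,
          "E": 0.3, "F": 0.2, "G": 0.1, "H": 0.05}
atl_keys, btl_by_count = split_atl_btl(scores, btl_sample_count=3, seed=1)
_, btl_by_fraction = split_atl_btl(scores, btl_sample_count=0.5, seed=1)
```

Here the integer mode draws three BTL entities, while the fraction mode draws half the size of the ATL population, which is two.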
- sar_extraction(mode='FILE', if_exists='OVERWRITE', ecm_datastore_name=None, processing_batch=None, from_date=None, to_date=None)¶
- save_base_feat_params(model_group_name=None, model_name=None, focus=None, frequency=None, lookback=None, filter=None, threshold_cut_off=0.7)¶
Inserts batch parameter details into the table ML4AML_SM_BASE_FEAT_PARAMS for reuse during training and production.
- Parameters
model_group_name – model group name which was created in ADMIN notebook
model_name – model name which describes the scenario. Ex: RMF, RMF_LRT
focus – entity type or segment name. Ex: CUSTOMER, ACCOUNT
frequency – frequency of run scenarios
lookback – lookback period for a scenario
filter – filter parameters which are to be applied on tables
threshold_cut_off – scoring threshold cut off value. Ex: 0.9
- Returns
None
- Examples:
>>> sm.save_base_feat_params(model_group_name = 'GROUP1', model_name='RMF', focus='CUSTOMER',
>>> frequency=7, lookback=30,
>>> filter='PRIMARY_CUST_FL:Y~INCLUDE_B2B_TRNFR_FL:Y~INCLUDE_TRUSTED_TRANS_FL:Y~INCL_RLTD_PARTIES:Y~RPTNG_CURR_FL:N~MIN_HRG_RISK_LVL:10~INCL_SEC_PARTY_FL:Y~EFFCTV_RISK_CUTOFF_LVL:10~ACTVTY_RISK_CUTOFF_LVL:10~INCLD_ACCT_HLDR_TYP_CD:CR~MANTAS_BUSINESS_ACCT_TYPES:RBK|RBR~FUNC_CURR_FL:Y~INCL_WIRE_TRXN_PRDCT_TYPE_LST:EFT-ACH|EFT-TREASURY|EFT-FEDWIRE|EFT-SWIFT|EFT-OTHER|EST~INCL_MI_TRXN_PRDCT_TYPE_LST:CASH-EQ-CASHIER-CHECK|CASH-EQ-CERT-CHECK|CASH-EQ-MONEY-ORDER|CASH-EQ-TRAVELERS-CHECK|CASH-EQ-OTHER|CASH-LETTER|CHECK|PAPER-OTHER|CHECK-ACH~INCL_CASH_TRXN_PRDCT_TYPE_LST:DEBIT-CARD|SVC|CREDIT-CARD|CURRENCY|PHYS~INCL_BO_TRXN_PRDCT_TYPE_LST:JOURNAL~LRF_DIGITS:4~MIN_TRANS_AMT:10~MAX_TRANS_AMT:100000000~MIN_INDIVIDUAL_TRANS_AMT:10')
- set_threshold_cut_off(threshold_cut_off=None)¶
Sets the threshold cut off for scoring in the table ML4AML_SM_BASE_FEAT_PARAMS.
- Parameters
threshold_cut_off – threshold cut off value
- Returns
None
- Examples:
>>> sm.set_threshold_cut_off(threshold_cut_off=0.8)
- show_data_volume(model_group_name=None, model_group_scenario_name=None, focus=None, lookback=None, frequency=None)¶
It shows the month wise data volume count for the given scenarios. The aggregation is done on scenario run date.
- Parameters
model_group_name – (optional) model group name. Ex: SG_RETAIL_MM
model_group_scenario_name – (optional) model group scenario name. Combination of model_name+’_’+focus. Ex: RMF_CUSTOMER
focus – (optional) entity type. Ex: CUSTOMER or ACCOUNT
lookback – (optional) lookback period. If None, it will be taken from table ML4AML_SM_BASE_FEAT_PARAMS
frequency – (optional) run frequency. If None, it will be taken from table ML4AML_SM_BASE_FEAT_PARAMS
- Returns
DataFrame showing the data count for each month
- Examples:
>>> sm.show_data_volume()
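The month-wise count keyed on scenario run date amounts to a simple group-and-count. The sketch below shows the idea with the standard library; monthly_volume and the sample dates are made up for illustration, and the real method reads from the database and returns a DataFrame rather than a dictionary.

```python
from collections import Counter
from datetime import date


def monthly_volume(run_dates):
    """Count records per calendar month of the run date (hypothetical helper)."""
    counts = Counter(d.strftime("%Y-%m") for d in run_dates)
    return dict(sorted(counts.items()))  # chronological order


# Made-up weekly run dates.
runs = [date(2016, 5, 2), date(2016, 5, 9), date(2016, 6, 6)]
volume = monthly_volume(runs)
```

Months with very low counts in such a summary are worth checking before choosing date_range and date_range_osot.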
- show_red_flag_features(scenario_id_ls=None)¶
It shows all available out-of-the-box red flag features configured for a scenario.
- Parameters
scenario_id_ls – list of scenarios.
- Returns
dataframe showing available red flags
- Example:
>>> sm.show_red_flag_features(scenario_id_ls='116000079')
- show_scenarios()¶
This API shows available scenarios along with unique scenario Id
- Param
None
- Returns
dataframe
- Example:
>>> sm.show_scenarios()