ofs_aif.scenario_models package

Subpackages

Submodules

ofs_aif.scenario_models.event_processing module

class event_processing(connect_with_default_workspace=True)

Bases: ofs_aif.aif.aif

convert_to_sql_format(filters)
create_event(ficmisdate=None, model_group_name=None, model_group_scenario_name=None, batch_run_id=None)

This API creates an event in the Compliance Studio schema based on the AIF output score for an entity.

Parameters

ficmisdate – Input date for which the event is to be created.

Returns

Success/Failure status based on output of underlying SQL procedure.

ofs_aif.scenario_models.scenario_models module

class scenario_models(connect_with_default_workspace=True)

Bases: ofs_aif.supervised.supervised

The scenario_models class is a special use case of supervised learning for anti-money laundering (AML) scenarios.

Stage2_data_prep(frequency=None, lookback=None, date_range=None, model_group_name=None, model_group_scenario_name=None, return_labels=False)
Parameters
  • frequency – Run frequency of the scenarios.

  • lookback – Lookback period for a scenario.

  • date_range – From and To dates for the in-time (model build) data set, in YYYYMM format as a numeric data type.

  • model_group_name – Model group for which the validation has to be performed.

Returns

Data frame containing the modeling dataset (Stage_2_pdf).

add_model_groups(meta_data_df)

A wrapper around the AIF add_model_groups method.

add_to_modeling_dataset(pandas_df_list=None, key_var='ENTITY_ID', label_var='SAR_FLG', overwrite=True, reset=False, osot=False, is_validation=False)

Merge all input data frames and create modeling dataset.

Parameters
  • pandas_df_list – list of transformed data frames.

  • key_var – Key variable used for merging the results. All input data frames are expected to contain the key column; otherwise, the merge will fail.

  • label_var – Name of the target/label variable, if present in the input/resultant data frame.

  • overwrite – Boolean flag (True/False). If True, existing columns are overwritten with the values from the new resultant frame; otherwise, they are left as they are.

  • reset – Boolean flag (True/False). If True, resets the modeling data to None.

  • osot – Boolean flag (True/False). True for OSOT (out-time) data.

  • is_validation – Boolean flag (True/False). True for validation data.

Note:

The overwrite flag overwrites only those columns of the modeling data that are part of the current API call; columns that are not part of the current call are not overwritten. The reset flag deletes all modeling data created before the current API call.

Returns

True/False. True on successful operation, in which case the modeling data is created and saved inside the class object.

Example:
>>> aif.add_to_modeling_dataset( [ ts_pdf, jbm_pdf, graph_embeddings_pdf, average_deposit_pdf, NB_PDF ], overwrite = True )
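The overwrite and reset rules in the note above can be pictured with a small, stdlib-only model. This is a hypothetical sketch, not the ofs_aif implementation: the modeling dataset is assumed to be a dict mapping each key_var value to a row of columns, rows for previously unseen keys are assumed to be added (outer-join style), and `add_frames` is an illustrative name.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]
Table = Dict[object, Row]  # key_var value -> that entity's columns


def add_frames(
    modeling: Optional[Table],
    frames: List[List[Row]],
    key_var: str = "ENTITY_ID",
    overwrite: bool = True,
    reset: bool = False,
) -> Table:
    """Hypothetical model of the overwrite/reset semantics, not ofs_aif code."""
    # reset=True starts from an empty dataset, discarding earlier calls.
    if reset or modeling is None:
        result: Table = {}
    else:
        # Copy rows so the caller's dataset is not mutated.
        result = {key: dict(row) for key, row in modeling.items()}
    for frame in frames:
        for row in frame:
            if key_var not in row:
                # Every input frame must carry the key column.
                raise KeyError(f"missing key column {key_var!r}")
            target = result.setdefault(row[key_var], {key_var: row[key_var]})
            for col, value in row.items():
                # Only columns in this call can be overwritten; columns that
                # came from earlier calls and are absent here stay untouched.
                if overwrite or col not in target:
                    target[col] = value
    return result


base = add_frames(None, [[{"ENTITY_ID": 1, "TS_SCORE": 0.1, "JBM_SCORE": 5}]])
updated = add_frames(base, [[{"ENTITY_ID": 1, "TS_SCORE": 0.9}]])
print(updated[1])  # {'ENTITY_ID': 1, 'TS_SCORE': 0.9, 'JBM_SCORE': 5}
```

Note how JBM_SCORE survives the second call: it was not part of that call, so overwrite=True leaves it alone, whereas reset=True would have dropped it.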
aggregate_base_features(from_date=None, to_date=None, frequency=None, last_run_date=None, look_back=None, focus=None, model_group_name=None, model_name=None, filters=None, prod_flag=None, include_full_lookback=None, fic_mis_date=None, batch_run_id=None)

This API is used to aggregate the base features to the table ml4aml_sm_base_features for the given parameters.

Parameters
  • from_date – Start date for the historic data lookup, in DD-Mon-YYYY format.

  • to_date – End date for the historic data lookup, in DD-Mon-YYYY format.

  • frequency – The frequency of the scenario execution. Example: 1 (Daily), 7 (Weekly), 14 (Bi-weekly), 30/31 (Monthly), or any other number of days the user wants.

  • last_run_date – The last run date within the from_date to to_date range that exactly matches the scenario run date, in DD-Mon-YYYY format.

  • look_back – The lookback period for the scenario. Example: 30

  • focus – The model entity name provided in the Admin notebook data frame while creating the model group. Options: CUSTOMER or ACCOUNT

  • model_group_name – Name of the model group for which the base feature aggregation is to be created.

  • model_name – Name of the model used while importing the model template using the Admin notebook.

  • filters – Scenario-specific parameters that give additional control over the base feature aggregation. Format: Param1 : Value1 ~ Param2 : Value2a | Value2b | Value2c.

  • prod_flag – Flag to indicate a training or scoring scenario. Options: N or Y. For sandbox/historic training scenarios, prod_flag should be set to N.

  • include_full_lookback – Flag to indicate whether the lookback should consider data earlier than the from_date when aggregating base features. Options: Y or N

  • fic_mis_date – AAI FIC MIS Date used in the batch execution.

  • batch_run_id – AAI Batch Run ID for the execution.

Returns

dataframe

    Examples:

    >>> sm.aggregate_base_features(from_date = '01-Jan-2015',
    >>>                        to_date = '31-Dec-2016',
    >>>                        frequency = 7,
    >>>                        last_run_date = '09-May-2016',
    >>>                        look_back = 30,
    >>>                        focus = 'CUSTOMER',
    >>>                        model_group_name = 'MODELGROUP1',
    >>>                        model_name = 'RMF_LRT',
    >>>                        filters = 'PRIMARY_CUST_FL:Y~INCLUDE_B2B_TRNFR_FL:Y~INCLUDE_TRUSTED_TRANS_FL:Y~INCL_RLTD_PARTIES:Y~RPTNG_CURR_FL:N~MIN_HRG_RISK_LVL:10~INCL_SEC_PARTY_FL:Y~EFFCTV_RISK_CUTOFF_LVL:10~ACTVTY_RISK_CUTOFF_LVL:10~INCLD_ACCT_HLDR_TYP_CD:CR~MANTAS_BUSINESS_ACCT_TYPES:RBK|RBR~FUNC_CURR_FL:Y~INCL_WIRE_TRXN_PRDCT_TYPE_LST:EFT-ACH|EFT-TREASURY|EFT-FEDWIRE|EFT-SWIFT|EFT-OTHER|EST~INCL_MI_TRXN_PRDCT_TYPE_LST:CASH-EQ-CASHIER-CHECK|CASH-EQ-CERT-CHECK|CASH-EQ-MONEY-ORDER|CASH-EQ-TRAVELERS-CHECK|CASH-EQ-OTHER|CASH-LETTER|CHECK|PAPER-OTHER|CHECK-ACH~INCL_CASH_TRXN_PRDCT_TYPE_LST:DEBIT-CARD|SVC|CREDIT-CARD|CURRENCY|PHYS~INCL_BO_TRXN_PRDCT_TYPE_LST:JOURNAL~LRF_DIGITS:4~MIN_TRANS_ROUND_AMT:10~MAX_TRANS_ROUND_AMT:100000000~MIN_INDIVIDUAL_TRANS_AMT:10~DEGREE_OF_PARALLELISM:8',
    >>>                        prod_flag = 'N',
    >>>                        include_full_lookback = 'N',
    >>>                        fic_mis_date = '2023-12-21',
    >>>                        batch_run_id = 'SM_Aggregate_Base_Features_SB_2024-01-30_111111111111_1'
    >>>                        )
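How frequency, last_run_date, look_back and include_full_lookback interact is not spelled out above. The sketch below shows one plausible reading, purely as an assumption: run dates fall every `frequency` days aligned on last_run_date within from_date..to_date, and each run looks back `look_back` days, clipped to from_date when include_full_lookback is 'N'. The helper names are hypothetical and not part of ofs_aif.

```python
from datetime import date, datetime, timedelta
from typing import List, Tuple


def parse_dmy(text: str) -> date:
    """Parse DD-Mon-YYYY, e.g. '09-May-2016' (%b expects English month names)."""
    return datetime.strptime(text, "%d-%b-%Y").date()


def scenario_run_dates(from_date: str, to_date: str, last_run_date: str, frequency: int) -> List[date]:
    """Assumed schedule: every `frequency` days, aligned on last_run_date."""
    start, end, anchor = parse_dmy(from_date), parse_dmy(to_date), parse_dmy(last_run_date)
    if not start <= anchor <= end:
        raise ValueError("last_run_date must fall within from_date..to_date")
    step = timedelta(days=frequency)
    dates = []
    current = anchor
    while current >= start:  # walk back from the anchor to from_date
        dates.append(current)
        current -= step
    current = anchor + step
    while current <= end:  # walk forward from the anchor to to_date
        dates.append(current)
        current += step
    return sorted(dates)


def lookback_window(run_date: date, look_back: int, from_date: str,
                    include_full_lookback: str = "N") -> Tuple[date, date]:
    """Assumed window: look_back days ending on run_date, clipped to from_date when 'N'."""
    window_start = run_date - timedelta(days=look_back)
    if include_full_lookback == "N":
        window_start = max(window_start, parse_dmy(from_date))
    return window_start, run_date


runs = scenario_run_dates("01-May-2016", "20-May-2016", "09-May-2016", frequency=7)
print([d.isoformat() for d in runs])  # ['2016-05-02', '2016-05-09', '2016-05-16']
```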
    

annual_model_validation(model_group_name=None, model_name=None, focus=None, fic_mis_date=None, performance_metrics_list='Kappa Curve~F1 Curve~PR Curve~ROC Curve~Prediction Density~Confusion Matrix:Kappa~AUC Change~PSI', model_id_list='Deployed', from_date=None, to_date=None, printrs=None)
Parameters
  • model_group_name – Model group for which the validation has to be performed

  • model_group_scenario_name – Model group Scenario for which the validation has to be performed

  • fic_mis_date – AAI FIC MIS Date used in the batch execution in the format YYYY-MM-DD

  • performance_metrics_list

    List of performance metrics on which the Model has to be evaluated. Available Metrics:

    • Kappa Curve

    • F1 Curve

    • PR Curve

    • ROC Curve

    • Prediction Density

    • Confusion Matrix:Kappa

    • AUC Change

    • PSI

  • model_id_list – List of model IDs for which the model has to be evaluated.

  • printrs – True/False flag to print the output. If set to True, AUC Change and PSI are printed.

Returns

Plot or metric, depending on the types of performance metrics chosen.
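PSI, one of the metrics listed above, is commonly defined as PSI = Σ (a_i − e_i) · ln(a_i / e_i), where e_i and a_i are the shares of the baseline and current scores in bin i. The sketch below computes that standard formula with equal-width score bins on [0, 1]; the binning scheme, the empty-bin floor, and the function names are assumptions, and the library's own computation may differ.

```python
import math
from typing import List, Sequence


def bin_shares(scores: Sequence[float], n_bins: int = 10) -> List[float]:
    """Share of scores falling into each of n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for s in scores:
        # Clamp so that 1.0 lands in the last bin and stray values stay in range.
        idx = min(max(int(s * n_bins), 0), n_bins - 1)
        counts[idx] += 1
    return [c / len(scores) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    total = 0.0
    for e, a in zip(bin_shares(expected, n_bins), bin_shares(actual, n_bins)):
        e, a = max(e, eps), max(a, eps)  # floor empty bins to avoid log(0)
        total += (a - e) * math.log(a / e)
    return total


baseline = [0.1] * 50 + [0.9] * 50
current = [0.1] * 90 + [0.9] * 10
print(round(psi(baseline, current), 3))  # 0.879
```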

convert_to_sql_format(filters)

A utility that converts a user-passed filter string to an SQL-formatted string.
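The exact SQL that convert_to_sql_format produces is not documented here. As an illustration only, the filter format documented for aggregate_base_features (Param1 : Value1 ~ Param2 : Value2a | Value2b | Value2c) can be parsed into a parameter-to-values mapping. `parse_filters` is a hypothetical helper, not part of ofs_aif.

```python
from typing import Dict, List


def parse_filters(filters: str) -> Dict[str, List[str]]:
    """Split 'P1:V1~P2:V2a|V2b' into {'P1': ['V1'], 'P2': ['V2a', 'V2b']}."""
    parsed: Dict[str, List[str]] = {}
    for item in filters.split("~"):  # '~' separates parameters
        if not item.strip():
            continue  # tolerate a stray or trailing '~'
        name, sep, values = item.partition(":")  # first ':' splits name from values
        if not sep:
            raise ValueError(f"expected 'PARAM:VALUE', got {item!r}")
        parsed[name.strip()] = [v.strip() for v in values.split("|")]  # '|' separates values
    return parsed


print(parse_filters("PRIMARY_CUST_FL:Y~MANTAS_BUSINESS_ACCT_TYPES:RBK|RBR"))
# {'PRIMARY_CUST_FL': ['Y'], 'MANTAS_BUSINESS_ACCT_TYPES': ['RBK', 'RBR']}
```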

create_base_features(lookback=None, frequency=None, scenario_id_ls=None, is_investigated=None, date_range=None, date_range_osot=None, batch_run_id='Scoring Batch')

Fetches the base features from the table ml4aml_sm_base_features for the given model group, frequency, lookback, focus, and date ranges.

Parameters
  • lookback – (optional) lookback period. If None, it will be taken from table ML4AML_SM_BASE_FEAT_PARAMS

  • frequency – (optional) run frequency. If None, it will be taken from table ML4AML_SM_BASE_FEAT_PARAMS

  • scenario_id_ls – List of scenario IDs. Ex: [116000031]

  • is_investigated

    (optional) Type of alerts to be returned:

    • 'INVESTIGATED' : reviewed alerts

    • 'UNINVESTIGATED' : unreviewed alerts

    • 'ALL' : both reviewed and unreviewed alerts

    Default is 'INVESTIGATED'. For training, reviewed customers are always fetched.

  • date_range – From and To dates for the in-time (model build) data set, in YYYYMM format as a numeric data type.

  • date_range_osot – From and To dates for the OSOT validation data set, in YYYYMM format as a numeric data type.

  • batch_run_id – Batch run ID required during scoring.

Returns

dataframe

Examples:
>>> sm.create_base_features(lookback=30,
>>>                 frequency=7,
>>>                 scenario_id_ls=[116000031,116000079],
>>>                 date_range=[201605,201706],
>>>                 date_range_osot=[201706,201706])
>>>
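The date_range and date_range_osot arguments are pairs of YYYYMM integers, such as [201605, 201706]. Assuming both bounds are inclusive (the text above does not say), the months a range covers can be enumerated as follows. Both helpers are hypothetical and only illustrate the format.

```python
from datetime import date
from typing import List, Sequence


def to_yyyymm(d: date) -> int:
    """Numeric YYYYMM for a date, e.g. 9 May 2016 -> 201605."""
    return d.year * 100 + d.month


def months_in_range(date_range: Sequence[int]) -> List[int]:
    """All YYYYMM months from date_range[0] to date_range[1], inclusive."""
    start, end = date_range
    if start > end:
        raise ValueError("date_range must be [from, to] with from <= to")
    year, month = divmod(start, 100)
    months = []
    while year * 100 + month <= end:
        months.append(year * 100 + month)
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return months


print(len(months_in_range([201605, 201706])))  # 14
```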
create_definition(model_group_name=None, model_group_scenario_name=None, save_with_new_version=False, cleanup_results=False, version=None)

Creates a unique definition using the model group for a given notebook. Internally, it calls the AIF create_definition method.

Parameters
  • save_with_new_version – Boolean flag with options True/False. It helps create a history of models/outputs for a given definition; any version can be chosen at a later point in time.

  • cleanup_results – Boolean flag with options True/False. When set to True, deletes all the outputs from previous executions.

  • version – When multiple versions of the definition exist, version selects the required one. The default value, None, selects the MAX (latest) version of the definition.

Returns

A success message on completion, and an appropriate error message on failure.

Examples:
>>> sm.create_definition( save_with_new_version = False,
>>>                        cleanup_results = False,
>>>                        version = None )
Definition creation successful...
True
create_modeling_dataset(X=None, key_var='ENTITY_ID', label_var='SAR_FLG', osot=False, is_validation=False)

This API converts any new AMLES data into modeling data by applying all the transformations recorded during the training process.

Parameters
  • X – input pandas data frame.

  • key_var – Identity column. Default 'ENTITY_ID'

  • label_var – Target variable. Default 'SAR_FLG'

  • osot – Boolean True/False. If running for out-time data (OSOT), it should be True.

  • is_validation – Boolean True/False. If running for validation use case, it should be True

Returns

dataframe

Example:
>>> sm.create_modeling_dataset(osot=True)
>>> sm.create_modeling_dataset(is_validation=True)
data_quality_data_preparation(model_group_name=None, model_name=None, focus=None, fic_mis_date=None)
Parameters
  • model_group_name – model group name which was created in ADMIN notebook

  • model_group_scenario_name – model name which describes the scenario. Ex: RMF, RMF_LRT

  • fic_mis_date – AAI FIC MIS Date used in the batch execution in the format YYYY-MM-DD

Returns

An object of the class DataQualityCheck from the module ofs_auto_ml.utils.data_quality_checks.

get_base_feature_params(model_group_name=None, model_group_scenario_name=None)
Parameters
  • model_group_name – Model group for which the validation has to be performed

  • model_group_scenario_name – Model group Scenario for which the validation has to be performed

Returns

frequency, lookback, cutoff

get_base_features(osot=False, is_validation=False)

Gets the modeling data in the current session.

Parameters
  • osot – True for the OSOT (out-time) dataset. The class variable sm.B_DF_OSOT will be set.

  • is_validation – True for the validation dataset. The class variable sm.B_DF_VAL will be set.

Returns

Modeling data as a pandas data frame.

Example:
>>> aif.get_base_features(osot=True)
>>> aif.get_base_features(is_validation=True)
get_calendar_date_range(fic_mis_date=None, lookback=None)

Generates a date range for a given fic_mis_date and lookback.

Parameters
  • fic_mis_date – FIC MIS date in the format YYYY-MM-DD.

  • lookback – Lookback period.

Returns

list of date ranges

Examples:
>>> sm.get_calendar_date_range(fic_mis_date = '2024-02-01', lookback=14)
get_modeling_dataset(osot=False, is_validation=False)

Gets the modeling data in the current session.

Parameters
  • osot – True for the OSOT (out-time) dataset. The class variable sm.B_DF_OSOT will be set.

  • is_validation – True for the validation dataset. The class variable sm.B_DF_VAL will be set.

Returns

pandas data frame.

Example:
>>> aif.get_modeling_dataset(osot=True)
>>> aif.get_modeling_dataset(is_validation=True)
get_non_behavioral_data(model_group_name=None, frequency=None, lookback=None, key_var='V_ENTITY_CD')

Retrieves the non-behavioral features for customers.

Parameters
  • model_group_name – (optional) model group name. Ex: SG_RETAIL_MM

  • lookback – (optional) lookback period. If None, it will be taken from table ML4AML_SM_BASE_FEAT_PARAMS

  • frequency – (optional) run frequency. If None, it will be taken from table ML4AML_SM_BASE_FEAT_PARAMS

  • key_var – (optional) Unique identifier in the customer table.

Returns

DataFrame

Examples:
>>> sm.get_non_behavioral_data()
get_scenario_features(X=None, key_var='ENTITY_ID', label_var='SAR_FLG')

Calculates the red flag features and appends them to the base features data frame.

Parameters
  • X – (optional) input dataframe.

  • key_var – (optional) Unique key identifier in the data frame.

  • label_var – (optional) Target Variable. Default is ‘SAR_FLG’

Returns

dataframe

Example:
>>> sm.get_scenario_features()
import_model_template(meta_data_df=None, model_name=None, overwrite=False)

Creates the objectives in MMG from the model group metadata and imports the model drafts into the respective objectives.

Parameters
  • meta_data_df – Same as the one created for adding model groups.

  • model_name – String which describes the scenarios. Ex: RMF, RMF_LRT

  • overwrite – If True Model Templates will be overwritten.

Returns

API response in json format.

Examples:
>>> sm.import_model_template(meta_data_df=pdf)
merge(pandas_df_list, key_var='ENTITY_ID', label_var='SAR_FLG', overwrite=True, osot=False, is_validation=False)

Customized merge used for concatenating multiple data frames in the Scenario Modeling use case.

Parameters
  • pandas_df_list – list of dataframes to merge

  • key_var – Identity column. Default 'ENTITY_ID'

  • label_var – Target variable. Default 'SAR_FLG'

  • overwrite – Boolean True/False. If True, existing columns' data is overwritten.

  • osot – Boolean True/False. If running for out-time data (OSOT), it should be True.

  • is_validation – Boolean True/False. If running for validation use case, it should be True

Returns

Resulting data frame.

monthly_model_validation(fic_mis_date=None, model_group_name=None, model_name=None, focus=None, model_id='Deployed', monitoring_technique=['CDBD', 'MD'], n_bins=9, N=5, No_SD=2)
Parameters
  • fic_mis_date – AAI FIC MIS Date used in the batch execution in the format YYYY-MM-DD

  • model_group_name – model group name which was created in ADMIN notebook

  • model_group_scenario_name – Model name which describes the scenario. Ex: RMF, RMF_LRT

  • model_id – List of model IDs for which the model has to be evaluated.

  • monitoring_technique – List of monitoring techniques: 'CDBD' (Confidence Distribution Batch Detection) and/or 'MD' (Margin Density).

  • n_bins – Number of bins to be used.

  • N – Number of bootstrap samples on which to estimate thresholds.

  • No_SD – Threshold setting to be used.

Returns

Output indicating whether drift is observed.
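For the 'MD' technique, one common reading of N and No_SD is: measure margin density (the share of scores close to the decision cutoff) on N bootstrap resamples of a reference batch, set limits at the mean plus or minus No_SD standard deviations, and flag drift when the current batch falls outside them. The sketch below is a generic, hypothetical illustration of that idea, not the library's implementation; the cutoff and margin values are assumptions, and n_bins (used by CDBD) is not modeled.

```python
import random
import statistics
from typing import Sequence, Tuple


def margin_density(scores: Sequence[float], cutoff: float = 0.5, margin: float = 0.1) -> float:
    """Fraction of scores lying within +/- margin of the decision cutoff."""
    return sum(abs(s - cutoff) <= margin for s in scores) / len(scores)


def bootstrap_limits(reference: Sequence[float], N: int = 5, No_SD: float = 2.0,
                     seed: int = 0) -> Tuple[float, float]:
    """Mean +/- No_SD standard deviations of margin density over N bootstrap resamples."""
    if N < 2:
        raise ValueError("need at least two bootstrap samples to estimate a spread")
    rng = random.Random(seed)
    stats = [margin_density(rng.choices(reference, k=len(reference))) for _ in range(N)]
    mu, sd = statistics.mean(stats), statistics.stdev(stats)
    return mu - No_SD * sd, mu + No_SD * sd


def drift_observed(reference: Sequence[float], current: Sequence[float],
                   N: int = 5, No_SD: float = 2.0) -> bool:
    """Flag drift when the current margin density falls outside the bootstrap limits."""
    low, high = bootstrap_limits(reference, N, No_SD)
    return not (low <= margin_density(current) <= high)


reference = [i / 1000 for i in range(1000)]
print(drift_observed(reference, [0.5] * 100))  # True: every score sits on the cutoff
```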

predict(X=None, key_column='ENTITY_ID', model_group_name=None, model_group_scenario_name=None, frequency=None, lookback=None, fic_mis_date=None, batch_run_id=None, threshold=None, return_score=False, debug=False, write_db_output=True, get_percentiles=True, get_deciles=True, btl_sample_count=0, n_top_contrib=None, unknown_data_score=-999)

Tests scoring interactively by connecting to a production-like schema before scheduling it as a batch process in real production. The same sandbox can also be used for scoring; in this case, the sandbox schema should have the scoring-related input and output tables. All run-time parameters expected during the scoring batch should be set in a Studio paragraph for testing purposes.

Parameters
  • X – Stage 2 transformed new data as pandas data frame. default is None

  • key_column – Identity column

  • model_group_name – Name of the deployed model group.

  • model_group_scenario_name – Name of the deployed model group scenario. Always None for AMLES

  • fic_mis_date – AAI FIC MIS Date used in the batch execution.

  • batch_run_id – AAI Batch Run ID for the execution

  • threshold – Threshold to generate events for ECM. Default 0.7.

  • return_score – Boolean flag. If set to True, the scoring result is returned to the caller as a pandas data frame. Default is False, which is the real production use case.

  • debug – Boolean (True/False). If set to True, debug mode is on.

  • write_db_output – Boolean (True/False). If set to True, writes to the output tables SM_EVENT_SCORE_DETAILS and SM_EVENT_SCORE.

  • get_percentiles – Boolean (True/False). If set to True, percentile scores are calculated and returned.

  • get_deciles – Boolean (True/False). If set to True, decile buckets are calculated and returned.

  • btl_sample_count – Number of random BTL (below-the-threshold) samples to add to the ATL results. If an integer, that many samples are drawn from the BTL population and appended to the ATL results (Ex: 10). If a fraction, the sample count is that fraction of the ATL population size, and that many samples are drawn from the BTL population (Ex: 0.2).

  • unknown_data_score – For missing categories, default score to be set. Default is -999

  • n_top_contrib – Int. Number of top contributing features. Ex: 5

Returns

Returns output scores as pandas data frame.

Examples:
>>> from datetime import date
>>> score_pdf_list = sm.predict(X = Stage_2_OSOT_pdf,
>>>                 key_column = 'ENTITY_ID',
>>>                 model_group_name = 'SG_RETAIL_MM',
>>>                 model_group_scenario_name = 'RMF_CUSTOMER',
>>>                 fic_mis_date = date.today(),
>>>                 batch_run_id = 'RRF_ICC_BATCH_123',
>>>                 threshold = 0.5,
>>>                 return_score = True,
>>>                 debug = True,
>>>                 write_db_output=True,
>>>                 btl_sample_count = 10)
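The threshold and btl_sample_count rules of predict can be sketched in plain Python. This is a hypothetical illustration, not the library's code: scores are assumed to be a dict keyed by ENTITY_ID, a score at or above the threshold is assumed to count as ATL, and the BTL sample is capped at the BTL population size.

```python
import random
from typing import Dict, List, Tuple, Union


def btl_sample_size(btl_sample_count: Union[int, float], n_atl: int) -> int:
    """An integer is an absolute count; a fraction is a share of the ATL count."""
    if isinstance(btl_sample_count, int):
        return btl_sample_count
    if 0 < btl_sample_count < 1:
        return round(btl_sample_count * n_atl)
    raise ValueError("btl_sample_count must be an int or a fraction between 0 and 1")


def split_and_sample(
    scores: Dict[str, float],
    threshold: float = 0.7,
    btl_sample_count: Union[int, float] = 0,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Return (ATL entity ids, randomly sampled BTL entity ids)."""
    atl = sorted(k for k, s in scores.items() if s >= threshold)
    btl = sorted(k for k, s in scores.items() if s < threshold)
    # Never ask for more BTL samples than the BTL population holds.
    k = min(btl_sample_size(btl_sample_count, len(atl)), len(btl))
    return atl, random.Random(seed).sample(btl, k)


scores = {"A": 0.9, "B": 0.75, "C": 0.2, "D": 0.1, "E": 0.5}
print(split_and_sample(scores, threshold=0.7, btl_sample_count=2)[0])  # ['A', 'B']
```

With btl_sample_count=0.5 and two ATL entities, one BTL entity would be drawn; with btl_sample_count=10, all three BTL entities would be returned because only three exist.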
sar_extraction(mode='FILE', if_exists='OVERWRITE', ecm_datastore_name=None, processing_batch=None, from_date=None, to_date=None)
save_base_feat_params(model_group_name=None, model_name=None, focus=None, frequency=None, lookback=None, filter=None, threshold_cut_off=0.7)

Inserts batch parameter details into the table ML4AML_SM_BASE_FEAT_PARAMS for reuse during training and production.

Parameters
  • model_group_name – model group name which was created in ADMIN notebook

  • model_name – model name which describes the scenario. Ex: RMF, RMF_LRT

  • focus – entity type or segment name. Ex: CUSTOMER, ACCOUNT

  • frequency – Run frequency of the scenarios.

  • lookback – Lookback period for a scenario.

  • filter – Filter parameters to be applied to the tables.

  • threshold_cut_off – scoring threshold cut off value. Ex: 0.9

Returns

None

Examples:
>>> sm.save_base_feat_params(model_group_name = 'GROUP1', model_name='RMF', focus='CUSTOMER',
>>> frequency=7, lookback=30,
>>> filter='PRIMARY_CUST_FL:Y~INCLUDE_B2B_TRNFR_FL:Y~INCLUDE_TRUSTED_TRANS_FL:Y~INCL_RLTD_PARTIES:Y~RPTNG_CURR_FL:N~MIN_HRG_RISK_LVL:10~INCL_SEC_PARTY_FL:Y~EFFCTV_RISK_CUTOFF_LVL:10~ACTVTY_RISK_CUTOFF_LVL:10~INCLD_ACCT_HLDR_TYP_CD:CR~MANTAS_BUSINESS_ACCT_TYPES:RBK|RBR~FUNC_CURR_FL:Y~INCL_WIRE_TRXN_PRDCT_TYPE_LST:EFT-ACH|EFT-TREASURY|EFT-FEDWIRE|EFT-SWIFT|EFT-OTHER|EST~INCL_MI_TRXN_PRDCT_TYPE_LST:CASH-EQ-CASHIER-CHECK|CASH-EQ-CERT-CHECK|CASH-EQ-MONEY-ORDER|CASH-EQ-TRAVELERS-CHECK|CASH-EQ-OTHER|CASH-LETTER|CHECK|PAPER-OTHER|CHECK-ACH~INCL_CASH_TRXN_PRDCT_TYPE_LST:DEBIT-CARD|SVC|CREDIT-CARD|CURRENCY|PHYS~INCL_BO_TRXN_PRDCT_TYPE_LST:JOURNAL~LRF_DIGITS:4~MIN_TRANS_AMT:10~MAX_TRANS_AMT:100000000~MIN_INDIVIDUAL_TRANS_AMT:10')
set_threshold_cut_off(threshold_cut_off=None)

Sets the threshold cut-off for scoring in the table ML4AML_SM_BASE_FEAT_PARAMS.

Parameters

threshold_cut_off – threshold cut off value

Returns

None

Examples:
>>> sm.set_threshold_cut_off(threshold_cut_off=0.8)
show_data_volume(model_group_name=None, model_group_scenario_name=None, focus=None, lookback=None, frequency=None)

Shows the month-wise data volume count for the given scenarios, aggregated on the scenario run date.

Parameters
  • model_group_name – (optional) model group name. Ex: SG_RETAIL_MM

  • model_group_scenario_name – (optional) model group scenario name. Combination of model_name + '_' + focus. Ex: RMF_CUSTOMER

  • focus – (optional) entity type. Ex: CUSTOMER or ACCOUNT

  • lookback – (optional) lookback period. If None, it will be taken from table ML4AML_SM_BASE_FEAT_PARAMS

  • frequency – (optional) run frequency. If None, it will be taken from table ML4AML_SM_BASE_FEAT_PARAMS

Returns

DataFrame showing the data count for each month

Examples:
>>> sm.show_data_volume()
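To make "month-wise, aggregated on the scenario run date" concrete: each record is bucketed by the YYYYMM month of its run date and counted. The sketch below assumes the run dates are available as a plain list of dates; `monthly_volume` is a hypothetical helper, not the API's implementation.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable


def monthly_volume(run_dates: Iterable[date]) -> Dict[int, int]:
    """Count records per YYYYMM month of their scenario run date."""
    counts = Counter(d.year * 100 + d.month for d in run_dates)
    return dict(sorted(counts.items()))  # chronological order


print(monthly_volume([date(2016, 5, 2), date(2016, 5, 9), date(2016, 6, 6)]))
# {201605: 2, 201606: 1}
```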
show_red_flag_features(scenario_id_ls=None)

Shows all available out-of-the-box red flag features configured for a scenario.

Parameters

scenario_id_ls – List of scenario IDs.

Returns

dataframe showing available red flags

Example:
>>> sm.show_red_flag_features(scenario_id_ls='116000079')
show_scenarios()

Shows the available scenarios along with their unique scenario IDs.

Parameters

None

Returns

dataframe

Example:
>>> sm.show_scenarios()

Module contents