ofs_aif.sanctions package¶
Submodules¶
ofs_aif.sanctions.birth_year_similarity module¶
- class birthyearSimilarity(year1, year2=1970)¶
Bases: object
- ages()¶
- digitSimilarity(digit_sim=<10 x 10 identity table>)¶
The default digit_sim is a 10 x 10 table indexed by the digits 0 to 9 on both axes, with 1.0 on the diagonal and 0.0 everywhere else.
- exactMatch(flip_year=False)¶
- timeDistance()¶
- year_length = 365.2422¶
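The reference lists only signatures for this class, so the following is a minimal sketch of how a digit-wise year comparison with an identity table might work. The function names, the averaging rule, and the reading of year_length as days per year are all assumptions for illustration, not the ofs_aif implementation.

```python
# Hypothetical sketch; none of these helpers are part of ofs_aif.

def identity_digit_matrix():
    """Return a 10 x 10 table in which each digit is similar only to itself."""
    return [[1.0 if i == j else 0.0 for j in range(10)] for i in range(10)]


def digit_similarity(year1, year2=1970, digit_sim=None):
    """Average the per-position digit similarity of two four-digit years."""
    if digit_sim is None:
        digit_sim = identity_digit_matrix()
    a, b = f"{year1:04d}", f"{year2:04d}"
    scores = [digit_sim[int(x)][int(y)] for x, y in zip(a, b)]
    return sum(scores) / len(scores)


def time_distance_days(year1, year2=1970, year_length=365.2422):
    """Approximate gap between two years in days (assumed meaning of year_length)."""
    return abs(year1 - year2) * year_length
```

With the identity table, a transposed year such as 1984 versus 1948 matches in two of four positions. A custom table could give partial credit to digits that are easily confused.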
ofs_aif.sanctions.edq module¶
ofs_aif.sanctions.event_scoring module¶
- class eventScoring(connect_with_default_workspace=True)¶
Bases: supervised
This class, eventScoring, is a special use case of supervised learning for anti-money laundering, applied to sanctions event scoring (AMLSES).
- create_definition(save_with_new_version=False, cleanup_results=False, version=None)¶
Creates a unique definition for a given notebook using the Model Group and, optionally, the Model Group Scenario.
- Parameters:
model_group_scenario_name – Name of the Model Group, as created in the AIF-Admin notebook.
save_with_new_version – Boolean flag (True/False). When True, a history of models and outputs is kept for the definition, and any version can be selected later.
cleanup_results – Boolean flag (True/False). When True, deletes all outputs from previous executions.
version – When multiple versions of the definition exist, selects the required version. The default, None, selects the MAX version of the definition.
- Returns:
Returns a success message on completion and an error message on failure.
- Examples:
>>> aif.create_definition( model_group_name = "CORPORATE AND INSTITUTIONAL",
>>>                        model_group_scenario_name = "SHELL",
>>>                        save_with_new_version = False,
>>>                        cleanup_results = False,
>>>                        version = None )
Definition creation successful...
True
- create_evented_data(date_range=None, osot_date_range=None)¶
This API prepares sanctions evented data from the table ml4aml_sanctions_events and stores it in the AIF class members self.B_DF (in-time) and self.B_DF_OSOT (out-time, OSOT). DATA_SOURCE and BUSINESS_CENTRE together form the unique filter used to select data from the table.
- Parameters:
date_range – From and To dates for the OSIT (model build) data set, in YYYYMMDD format or as YYYYMMDD in a numeric data type.
Example: date_range = [20150101, 20151231]
osot_date_range – From and To dates for the OSOT validation data set, in YYYYMMDD format as a numeric data type.
Example: osot_date_range = [20160101, 20160331]
- Returns:
The OSIT and OSOT data are stored in class members for later reference.
- Example:
>>> aif.create_evented_data( date_range=[20150101,20151231], osot_date_range=[20160101,20160331])
Data preparation ( Sanctions events ) successful...
True
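Because both ranges are passed as [from, to] lists of YYYYMMDD values, it can help to check them before calling the API. The helpers below are a hypothetical pre-check written for illustration. They are not part of ofs_aif, and the rule that the From date must not be later than the To date is an assumption.

```python
from datetime import datetime


def parse_yyyymmdd(value):
    """Convert a YYYYMMDD integer or string to a date; raises ValueError if invalid."""
    return datetime.strptime(str(value), "%Y%m%d").date()


def validate_date_range(date_range):
    """Return (start, end) dates for a [from, to] list given in YYYYMMDD form."""
    if len(date_range) != 2:
        raise ValueError("date_range must be [from_date, to_date]")
    start, end = (parse_yyyymmdd(v) for v in date_range)
    if start > end:
        raise ValueError("from_date must not be later than to_date")
    return start, end


# Example: the in-time range used in the documentation above.
osit_start, osit_end = validate_date_range([20150101, 20151231])
```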
- create_modeling_dataset(X=None, osot=False)¶
This API converts new sanctions data into modelling data by applying all the transformations recorded during the training process for unsupervised learning.
- Parameters:
X – Sanctions input data as pandas data frame.
osot – Boolean flag indicating the data set type (in-time or out-time (OSOT)). Always set to False for prediction.
- Returns:
The stage 2 sanctions data that is created is saved inside the class object.
- Example:
>>> aif.create_modeling_dataset( X )
- get_event_score_summary(jurisdiction=None, business_domain=None, fic_mis_date=None)¶
- get_evented_data(osot=False)¶
Get sanctions-based events in-time (osit) or out-time (osot) data as a pandas data frame.
- Parameters:
osot – Boolean flag indicating the data set type (in-time or out-time (OSOT)).
False: (default) in-time data set (model build data set).
True: OSOT data set.
- Returns:
osit/osot data as pandas data frame.
- Example:
>>> B_OSIT_PDF = self.get_evented_data();
Data dimension : 41544 x 8
>>> B_OSOT_PDF = self.get_evented_data(osot = True);
OSOT dataset is None
- import_model_template(jurisdiction=None, business_domain=None, overwrite=False)¶
This API creates the objectives in Compliance Studio for the given jurisdiction and business domain, and imports the model drafts into the respective objectives.
- Parameters:
jurisdiction – Jurisdiction for an event segment
business_domain – Business domain for an event segment
overwrite – If True, existing model templates are overwritten.
- Returns:
On successful execution, imports the model template notebooks into the respective objectives/folders.
- Examples:
>>> aif.import_model_template(jurisdiction = 'North America', business_domain = "United States of America", overwrite = False)
- predict(X=None, date_range=None, key_column='EVENT_ID', fic_mis_date=None, batch_run_id=None, threshold=0.7, return_score=False, debug=False)¶
Tests scoring interactively by connecting to a production-like schema before scheduling it as a batch process in real production. The same sandbox can also be used for scoring, in which case the sandbox schema must contain the scoring-related input and output tables. All runtime parameters expected by the scoring batch should be set in the Studio paragraph when testing.
- Parameters:
X – Stage 2 transformed new data as a pandas data frame. Default is None.
key_column – Identity column
date_range – Scoring date range as python list
fic_mis_date – AAI FIC MIS Date used in the batch execution.
batch_run_id – AAI Batch Run ID for the execution
threshold – Threshold for generating events for ECM. Default is 0.7.
return_score – Boolean flag. If True, the scoring result is returned to the caller as a pandas data frame. The default, False, is the real production use case.
debug – Boolean (True/False). If True, debug mode is on.
- Returns:
Returns output scores as pandas data frame.
- Examples:
>>> score_pdf_list = self.predict(X = Stage_2_OSOT_pdf,
>>>                               key_column = 'ENTITY_ID',
>>>                               date_range = ['',''],
>>>                               fic_mis_date = date.today(),
>>>                               batch_run_id = 'RRF_ICC_BATCH_123',
>>>                               threshold = 0.5,
>>>                               return_score = True,
>>>                               debug = True )
Returns output scores as pandas data frame
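The threshold parameter decides which scored records become events. The sketch below illustrates that idea on plain Python records rather than a pandas data frame. The SCORE field name, the flag_events helper, and the choice of "greater than or equal" as the comparison are assumptions, since the reference does not state how predict applies the threshold.

```python
# Hypothetical illustration of threshold-based event selection; not the ofs_aif logic.

def flag_events(scores, key_column="EVENT_ID", threshold=0.7):
    """Return the keys of records whose score meets or exceeds the threshold."""
    return [row[key_column] for row in scores if row["SCORE"] >= threshold]


# Made-up scores for three events.
scores = [
    {"EVENT_ID": "E1", "SCORE": 0.91},
    {"EVENT_ID": "E2", "SCORE": 0.42},
    {"EVENT_ID": "E3", "SCORE": 0.70},
]

flagged_default = flag_events(scores)                # uses the 0.7 default
flagged_strict = flag_events(scores, threshold=0.9)  # stricter cut-off
```

Lowering the threshold flags more records for review, while raising it flags fewer.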
- set_edq_url(edq_url)¶
- set_edq_user_password(edq_user_password)¶
- set_event_segments(jurisdiction=None, business_domain=None)¶
- show_event_segments()¶
View the available segments for sanctions-based events. Displays all unique combinations of DATA_SOURCE & BUSINESS_CENTRE in ML4AML_SANCTIONS_EVENTS.
- Returns:
A pandas data frame showing the possible event segments.
- Example:
>>> aif.show_event_segments()
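The "unique combinations" described above amount to a distinct selection over two columns. As a rough sketch, assuming rows are available as plain dictionaries (the sample values below are invented and the event_segments helper is hypothetical), the same result can be computed like this:

```python
# Hypothetical sketch of listing distinct (DATA_SOURCE, BUSINESS_CENTRE) pairs.

def event_segments(rows):
    """Return the sorted unique (DATA_SOURCE, BUSINESS_CENTRE) pairs found in rows."""
    return sorted({(r["DATA_SOURCE"], r["BUSINESS_CENTRE"]) for r in rows})


# Invented sample rows standing in for ML4AML_SANCTIONS_EVENTS.
rows = [
    {"EVENT_ID": 1, "DATA_SOURCE": "SRC_A", "BUSINESS_CENTRE": "BC_1"},
    {"EVENT_ID": 2, "DATA_SOURCE": "SRC_A", "BUSINESS_CENTRE": "BC_1"},
    {"EVENT_ID": 3, "DATA_SOURCE": "SRC_B", "BUSINESS_CENTRE": "BC_2"},
]

segments = event_segments(rows)
```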
- update_edq_events(event_score_df=None, parallel=True)¶
- update_event_score(jurisdiction=None, business_domain=None, fic_mis_date=None)¶
ofs_aif.sanctions.string_similarity module¶
- class stringSimilarity(string1, string2, r_map={})¶
Bases: object
- editDistance(swap=False, rm_vowels=False, rm_repeated=False)¶
- histogramSimilarity(rm_vowels=False, rm_repeated=False)¶
- longestCommonSubstr(swap=False, rm_vowels=False, rm_repeated=False)¶
- phoneticEditDistance(swap=False, rm_vowels=False, rm_repeated=False)¶
- vowels = ['a', 'e', 'i', 'o', 'u']¶
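The reference gives no description of these methods, so the following is only a sketch of how an edit distance with rm_vowels and rm_repeated options could behave. It uses a standard Levenshtein distance. The preprocessing order, the lowercasing step, and the function names are assumptions, and the swap option is not modelled.

```python
import re

VOWELS = ["a", "e", "i", "o", "u"]


def preprocess(text, rm_vowels=False, rm_repeated=False):
    """Lowercase the text, then optionally drop vowels and collapse repeated characters."""
    text = text.lower()
    if rm_vowels:
        text = "".join(ch for ch in text if ch not in VOWELS)
    if rm_repeated:
        text = re.sub(r"(.)\1+", r"\1", text)
    return text


def edit_distance(string1, string2, rm_vowels=False, rm_repeated=False):
    """Levenshtein distance between the two preprocessed strings."""
    a = preprocess(string1, rm_vowels, rm_repeated)
    b = preprocess(string2, rm_vowels, rm_repeated)
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```

Under these assumptions, spelling variants of a transliterated name, such as "Hussein" and "Hussain", differ by one edit as written and by none once vowels are removed.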
ofs_aif.sanctions.transformation module¶
- class matchSimilarity¶
Bases: BaseEstimator, TransformerMixin
For each type of time-series variable, calculates 3 jump values:
NORM: (current_month - avg(prev_12_months)) / avg(prev_12_months)
LM: (current_month - last_month) / last_month
SMLY: (current_month - same_month_last_yr) / same_month_last_yr
Applies the user-configured "%over" thresholds (default is 200%). Denotes a violation as "1", no violation as "0", and not enough data for calculation as "_", then concatenates the results into a 3-digit bitmap.
- Args:
threshold_percentage ([list]): Cut-off percentage for denoting a violation. Multiple jump variables for the same type of base variable can be created by passing different thresholds as a list.
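The three jump values and the bitmap encoding described above can be sketched for a single monthly series as follows. This is an illustrative reading, not the library's code: it assumes the series is ordered oldest to newest, that a violation means the jump exceeds the threshold percentage, and that a zero or missing baseline counts as "not enough data". The function names are hypothetical.

```python
# Hypothetical sketch of the NORM / LM / SMLY jump bitmap for one monthly series.

def jump_flag(current, baseline, threshold_percentage=200):
    """Return "1" for a violation, "0" for none, "_" when the jump cannot be computed."""
    if current is None or baseline is None or baseline == 0:
        return "_"
    jump = (current - baseline) / baseline
    return "1" if jump * 100 > threshold_percentage else "0"


def jump_bitmap(series, threshold_percentage=200):
    """Build the 3-character bitmap for the most recent value of a monthly series."""
    current = series[-1]
    has_year = len(series) >= 13
    # NORM baseline: average of the 12 months before the current one.
    norm_base = sum(series[-13:-1]) / 12 if has_year else None
    # LM baseline: the previous month.
    lm_base = series[-2] if len(series) >= 2 else None
    # SMLY baseline: the same month one year earlier.
    smly_base = series[-13] if has_year else None
    return "".join(
        jump_flag(current, base, threshold_percentage)
        for base in (norm_base, lm_base, smly_base)
    )
```

For example, thirteen months at 100 followed by a month at 400 is a 300% jump against every baseline, so all three positions are flagged. A two-month series has only the last-month baseline available.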
- aggregateInfo(x, old_field, new_field)¶
- editDistance(x, rm_vowels, rm_repeated)¶
- exactMatch(x, flip_year)¶
- fit(X, key_var='ENTITY_ID', target_var='SAR_FLG', feature_include=None, feature_exclude=None)¶
Fit Jump bitmap transformer
- Args:
X ([DataFrame]): Input time-series dataframe.
key_var (str, optional): Key variable for grouping time-series data. Defaults to “ENTITY_ID”.
target_var (str, optional): Variable indicating the target.
feature_include ([list], optional): List of features to be included in bitmap computation. Defaults to None.
feature_exclude ([list], optional): List of features to be excluded from bitmap computation. Defaults to None.
- Returns:
Self: Fitted transformer
- genderMatch(x)¶
- histogramSimilarity(x, rm_vowels, rm_repeated)¶
- longestCommonSubstr(x, rm_vowels, rm_repeated)¶
- name_mapping()¶
- occupationAge(x)¶
- timeDistance(x)¶
- transform(X)¶
Transform time-series features within X to 3-char jump bitmaps
- Args:
X ([DataFrame]): Input time-series dataframe
- Returns:
[DataFrame] – Contains jump bitmap features for all the time-series features